Monday, July 29, 2019

Kogito, ergo Rules — Part 2: An All-Encompassing Execution Model for Rules

This is the second post of a series of updates on the Kogito initiative and our efforts to bring Drools to the cloud. In this post we delve into the details of rule units and show you why we are excited about them.

An All-Encompassing Execution Model for Rules

If you’ve been carefully scrutinising the Drools manual looking for new features at every recent release, you may have noticed that the term rule unit has been sitting there for a while, as an extremely experimental feature. In short, a rule unit is both a module for rules and a unit of execution; we do not call them modules to avoid confusion with JVM modules. In Kogito, we are revisiting and expanding upon our original prototype.
A rule unit collects a set of rules together with the description of the working memory those rules act upon. The description of the working memory is written as a regular Java class, with DataSource fields. Each data source represents a typed partition of the working memory, and different types of data sources exist, with different features. For instance, the following example uses an append-only data source, called a data stream.
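
As a rough illustration of such a class, here is a minimal plain-Java sketch. It does not use the Kogito API: `DataStream`, `Event`, `Alert` and `MonitoringService` are hypothetical stand-ins defined in the snippet itself, and the stream is reduced to "append and read".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical append-only data source (NOT the Kogito type):
// values can be appended and read, never updated or removed.
final class DataStream<T> {
    private final List<T> values = new ArrayList<>();

    void append(T value) {
        values.add(value);
    }

    // Read-only copy of everything appended so far.
    List<T> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(values));
    }
}

// Hypothetical fact types.
final class Event {
    final String type;
    final int value;

    Event(String type, int value) {
        this.type = type;
        this.value = value;
    }
}

final class Alert {
    final String message;

    Alert(String message) {
        this.message = message;
    }
}

// Hypothetical rule unit description: each field is a typed partition
// of the working memory that the unit's rules can see.
final class MonitoringService {
    final DataStream<Event> events = new DataStream<>();
    final DataStream<Alert> alerts = new DataStream<>();
}
```

The point of the sketch is only the shape: the unit is an ordinary class, and its fields say which typed data the rules may read and write.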

Rules of a given rule unit are collected in DRL files that carry the unit declaration.

Each rule in a unit has visibility over all the data sources that have been declared in the corresponding class. In fact, the class and the collection of DRL files of a unit form a whole: you can think of it as a single class whose fields are globals scoped to the current unit and whose methods are rules. Indeed, the use of fields supersedes the use of DRL globals.
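
The analogy can be made concrete with a small plain-Java sketch. This is only an illustration of the mental model: `TemperatureUnit`, its fields and the `tooHot` method are hypothetical, and in a real rule unit the "methods" would be DRL rules that the engine, not the caller, decides to fire.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "fields are unit-scoped globals, methods are rules".
final class TemperatureUnit {
    // "Fields": data visible to every rule of the unit.
    final List<Integer> readings = new ArrayList<>();
    final List<String> alerts = new ArrayList<>();

    // "Method as rule": the condition plays the role of `when`,
    // the body of the `if` plays the role of `then`.
    void tooHot() {
        for (int reading : readings) {
            if (reading > 30) {
                alerts.add("Too hot: " + reading);
            }
        }
    }
}
```
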
A rule unit is submitted for execution to a scheduler. Rule units may decide to yield their execution to other rule units, effectively putting them into execution. For instance:
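
A minimal plain-Java sketch of the idea, assuming a simple FIFO scheduler: `Unit`, `Scheduler`, `ValidationUnit` and `NotificationUnit` are hypothetical names made up for this illustration and are not part of Drools or Kogito.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical unit abstraction: while running, a unit may hand over to another unit.
interface Unit {
    void run(Scheduler scheduler);
}

// Hypothetical FIFO scheduler: units are executed in submission order.
final class Scheduler {
    private final Deque<Unit> pending = new ArrayDeque<>();
    final List<String> trace = new ArrayList<>();

    void submit(Unit unit) {
        pending.addLast(unit);
    }

    void runAll() {
        while (!pending.isEmpty()) {
            pending.removeFirst().run(this);
        }
    }
}

final class ValidationUnit implements Unit {
    @Override
    public void run(Scheduler scheduler) {
        scheduler.trace.add("validation");
        // Yield: put another unit into execution once this one is done.
        scheduler.submit(new NotificationUnit());
    }
}

final class NotificationUnit implements Unit {
    @Override
    public void run(Scheduler scheduler) {
        scheduler.trace.add("notification");
    }
}
```

Submitting only `ValidationUnit` and calling `runAll()` ends up running both units, because the first one schedules the second.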

But rule units may also be put in a long-running state. In this case, other rule units may run concurrently; because DataSources can be shared across units, units can coordinate by exchanging messages.
Consider the following example:
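
A minimal plain-Java sketch of two concurrently running "units" coordinating through a shared channel. It uses only `java.util.concurrent`; the class `SharedChannelDemo`, the `STOP` marker and the message names are assumptions made for this illustration, with a `BlockingQueue` standing in for a shared data source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SharedChannelDemo {
    static final String STOP = "STOP"; // hypothetical end-of-stream marker

    static List<String> run() {
        // The shared "data source": both units hold a reference to it.
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>();

        // Long-running unit: keeps reacting to messages until told to stop.
        Thread consumer = new Thread(() -> {
            try {
                for (String msg = channel.take(); !msg.equals(STOP); msg = channel.take()) {
                    handled.add("handled " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Second unit: publishes messages on the shared channel.
        Thread producer = new Thread(() -> {
            channel.add("event-1");
            channel.add("event-2");
            channel.add(STOP);
        });

        consumer.start();
        producer.start();
        try {
            producer.join();
            consumer.join(); // after join, the consumer's writes to `handled` are visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled;
    }
}
```

The two threads never call each other directly; the only coupling is the shared channel, which is the property the paragraph above relies on.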

In a way, rule units behave like “actors” exchanging messages. Distinctively, however, rule units allow for much more complex chains of execution, which are typical of rule-based reasoning. For instance, consider this example from Akka's manual:

As you can see, pattern matches in Akka are strictly over single messages. This is unsurprising, because actors process one message at a time. In a rule engine, we are allowed to write several rules that react to the entire state of the working memory at execution time: this departs significantly from a pure actor model design, but at the same time gives a great deal of flexibility in the way you may write the business logic of your application.
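
To make the contrast concrete, here is a tiny plain-Java sketch of a condition over the whole state rather than over one message. `JoinRule` and the fact names are hypothetical; a real engine would evaluate such a condition incrementally instead of scanning a list.

```java
import java.util.List;

final class JoinRule {
    // Hypothetical rule over the entire working memory: it holds only when
    // both facts are present, regardless of the order in which they arrived.
    static boolean smokeAndHeat(List<String> workingMemory) {
        return workingMemory.contains("smoke") && workingMemory.contains("heat");
    }
}
```

A handler that sees one message at a time would need to keep its own bookkeeping to express the same condition.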

Data Sources

It is worth spending a few words on data sources as well. The data source construct can be seen as both a partition of and an abstraction over the traditional working memory. Different kinds of data sources will be available. Full-featured data stores may support adding, removing and updating values, allowing for more traditional operations over the working memory. The more constrained append-only data streams would be easier to integrate with external data sources and data sinks, such as Camel connectors. Such constraints would also be valuable for enabling more advanced use cases, such as parallel, thread-safe execution and persisted shared channels (e.g. Kafka) across the nodes of an OpenShift cluster, realizing a fully distributed rule engine.
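
The difference between the two kinds of data source can be sketched as two interfaces. This is a plain-Java illustration under assumed names (`AppendOnlyStream`, `MutableStore`, `InMemoryStore`); it is not the Kogito API, only a way to show which operations each kind would expose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only source: the only mutation is adding a value.
interface AppendOnlyStream<T> {
    void append(T value);

    List<T> values();
}

// Hypothetical full-featured store: it additionally supports update and remove.
interface MutableStore<T> extends AppendOnlyStream<T> {
    void update(int index, T value);

    void remove(int index);
}

final class InMemoryStore<T> implements MutableStore<T> {
    private final List<T> items = new ArrayList<>();

    @Override
    public void append(T value) {
        items.add(value);
    }

    @Override
    public void update(int index, T value) {
        items.set(index, value);
    }

    @Override
    public void remove(int index) {
        items.remove(index);
    }

    // Defensive copy, so callers cannot mutate the store through the result.
    @Override
    public List<T> values() {
        return new ArrayList<>(items);
    }
}
```

Handing a component only the `AppendOnlyStream` view is one way to express the constraint in the type system: that component can publish values but can neither rewrite nor delete them.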
 

Kogito: ergo Cloud

The parallel and distributed use cases are intriguing, but we need to get there with baby steps. However, this does not mean that the first steps won't be as exciting in their own way.

For Kogito we want to stress the cloud-native, stateless use case, where control flow is externalized using processes and where, with the power of Quarkus, we can compile the application into super-fast native binaries. This is why, in the next few weeks, we will complete and release rule units for automated REST service implementation.

In this use case, the typed, Java-based declaration of a rule unit is automatically mapped to the signature of a REST endpoint. POSTing to the endpoint implies instantiating the unit, inserting data into the data sources, firing the rules, and returning the response payload. The response is computed using a user-provided query. For instance, consider this example:
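
The request lifecycle described above can be sketched in plain Java as follows. Nothing here is generated by or taken from Kogito: `MonitoringUnit`, `MonitoringEndpoint`, the hard-coded threshold and the message text are all assumptions made for the illustration, and the "rule" and the "query" are ordinary methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unit: one input partition and one output partition.
final class MonitoringUnit {
    final List<Integer> temperatures = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    // Stand-in for rule evaluation: a single hard-coded "rule".
    void fireRules() {
        for (int t : temperatures) {
            if (t > 30) {
                warnings.add("Temperature is too high: " + t);
            }
        }
    }

    // Stand-in for the user-provided query that computes the response.
    List<String> queryWarnings() {
        return new ArrayList<>(warnings);
    }
}

final class MonitoringEndpoint {
    // What a POST handler would do for each request; no state survives the call.
    static List<String> post(List<Integer> payload) {
        MonitoringUnit unit = new MonitoringUnit(); // 1. instantiate the unit
        unit.temperatures.addAll(payload);          // 2. insert data into the data sources
        unit.fireRules();                           // 3. fire the rules
        return unit.queryWarnings();                // 4. return the query result
    }
}
```

Because a fresh unit is created per call, two requests cannot observe each other's data, which is what makes the use case stateless.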

Users may post events using the auto-generated /monitoring-service endpoint.

The reply will be the result of the query. In our case:

Cloudy with a Chance of Rules

We have presented our vision for the next generation of our rule engine in Kogito and beyond. The stateless use case is only the first step towards what we think will be a truly innovative take on rule engines. In the following months we will work on delivering better support for scheduling and deploying units in parallel (local) and distributed (on OpenShift), so stay tuned for more. In the meantime, we do want to hear from you about the direction we are taking.

The future of Drools is cloudy… and bright!



8 comments:

  1. Edoardo, great post. It's nice to see how the Drools model is evolving and to realize that it matches the abstractions we built around it in our services.

    We have been working around a "rule unit" abstraction for a couple of years now (we actually call it rule group, despite the conflict with Drools' own group), that groups datasources and rules. Most of our processing is stateless, which makes things easier to manage, but it is clear to us that this abstraction is the way to go (easier to model for both business and dev).

    Besides the stateless units (used for credit policy), we also have a couple of stateful rules and datasources used for fraud prevention. Those are, for sure, our pain point for high availability today. We couldn't find a way to share events between different rule units in Drools 6 and 7, but the "append only" examples given in this post look very promising. Do you believe this is the way to go for stateful scenarios? We will look to implement something similar in the Drools version we are using today.

    About stateless and OpenShift I can't say very much, as it differs a lot from our setup. We use long-running Spring Boot services on AWS.

    A small erratum: I believe the source file linked after "Rule units may decide to yield their execution to other rule units, effectively putting them into execution. For instance:" is wrong. It should be IncomingEvent.drl.java, no?

    Replies
    1. Hi Filipe, that's great to hear!

      If you'd like to share more about your use case it could be useful to drive our effort further. Append-only data sources (with immutable data) are most probably a good way to share values across different units, and it is probably the safest for stateful scenarios. We are still investigating the best way to realize and provide these features, so we would really love feedback from our end user base. The AWS use case is definitely something we want to keep in mind.

      Thanks for the erratum, it should be fixed now.

  2. Today we keep 2 different KBs with different purposes.

    KB 1 - Credit Policy: uses stateless sessions and runs rules that are associated with one and only one loan application.

    We have different rules that are activated at different moments of the loan application workflow depending on a bunch of factors (eg: a "rule unit" B, and its datasources, will only be triggered if "rule unit" A has not rejected the same customer).

    Everything works great, except the differentiation of which rules are associated with which datasources. As we can't use agenda-groups (stateless session), we end up giving all rules from the same "unit" the same prefix so we can filter on it (eg: GA_RULE1, GA_RULE2). From your post I understand that's something rule units would solve, but it would be nice if they also worked with stateless sessions.

    KB 2 - Fraud Prevention: uses a stateful session to store all events and relate them in time-windowed series. Works great and saved us a bunch of money already! :)

    The problem here is high availability. Today this service is a SPOF, as we can't partition the events or anything like that: we need all events for every loan application.

    When this service is restarted all events are retrieved from a durable source (today, a database) and reinserted into memory.

    Your post gave me the idea of using a database or another datasource to store all events (as we already do today) and act as the single source of events. We use Apache Camel for a lot of integrations and this is probably another point where it would fit well.

    ----

    About AWS: my main concern today is about running stateful sessions in HA without much trouble. Good support for AWS stores would be great, in this case.

    Also, not right now, but it would be nice to easily run AWS Lambda to evaluate Drools rules. I know most of your work targets OpenShift, but it would be nice to have AWS Lambda on the radar while thinking about extensibility.

    ----

    That's all for now. If there is any other detail that could help, please let me know. We have been working with Drools for a couple of years now and should implement jBPM in the near future, so it's really in our interest to be involved and help where possible.

    Replies
    1. Hey Filipe, that's very interesting to us. You might want to subscribe to the Kogito mailing list to give more details: https://groups.google.com/forum/#!forum/kogito-development

    2. Hi Filipe
      Have you tried using JBoss Data Grid to store distributed data in memory? It is supported on OpenShift https://access.redhat.com/documentation/en-us/red_hat_data_grid/7.1/html-single/data_grid_for_o

  3. Hi, no, we have not. As we run on AWS, OpenShift is not our default choice, but we could use AWS ElastiCache (Redis or Memcached) for the same purpose, for sure.

    We are implementing a sync strategy using a database (it fits our latency requirements right now), but the ideal solution would be something native to Drools, like: "Drools, use this storage instead of memory for this Kie Session" and everything would just work.

  4. Great post!
    This works pretty well, especially for stateless cases.
    Is there a way to mix stateful kie-sessions (in STREAM mode) with rule units? With Quarkus we have a default stateless mode that executes the whole cycle, especially for query-generated endpoints (load unitExecutor / Add facts / ExecuteQuery).
    Thank you
    Keep rocking
