Tuesday, June 20, 2006

Real World Rule Engines

Here is an excellent article, introduction reproduced below, from our very own mailing list mentor Geoffrey Wiseman:
http://www.infoq.com/articles/Rule-Engines


For many developers, rule engines are buzzwords, or black boxes on an architectural diagram: something to be feared or admired from afar, but not understood. Coming to terms with this is one of the catch-22s of technology:

  • It's difficult to know when to use a technology or how to apply it well until you've had some first-hand, real-world experience.

  • The most common way to gain that experience is to use an unknown technology in a real project.

  • Getting first-hand experience using a new technology in a production environment is an invaluable experience for future work but can be a major risk for the work at hand.



Over the course of this article, I'll be sharing my practical experience with rule engines and with Drools in particular to support in-market solutions for financial services, in order to help you understand where rule engines are useful and how to apply them best to the problems you face.

Why Should I Care?


Some of you will have already considered using a rule engine and will be looking for practical advice on how to use it well: patterns and anti-patterns, best practices and rat-holes.

Others haven't considered using a rule engine, and aren't sure how this is applicable to the work you're doing, or have considered rule engines and discarded the idea. Rule engines can be a powerful way to externalize business logic, empower business users, and solve complicated problems wherein large numbers of fine-grained business rules and facts interact.

If you've ever taken a series of conditional statements, tried to evaluate the combinations, and found yourself writing deep nested logic to solve a problem, these are just the sorts of entanglements that a rule engine can help you unravel.

Some of our more complicated financial services work, when rephrased in a rule approach, began to look markedly more comprehensible. Each step in converting procedural conditional logic to Drools business rules seemed to expose both more simplicity and more power at once.

Finally, if you're not convinced by the above, consider this: rule engines are a tool, another way to approach software development. Tools have their strengths and weaknesses, and even if you aren't making immediate use of this one, it's helpful to understand the tradeoffs so that you can assess and communicate applicability in the future.


Friday, June 02, 2006

Rule Execution Flow with a Production Rule System

Sometimes workflow is nothing but a decision tree: a series of questions with yes/no answers that determine a final answer. This can be modelled far better with a Production Rule System, and is already on the Drools road map.

For the other situations we can use a specialised implementation of Agenda Groups to model "stages" in rule engine execution. Agenda Groups are currently stacked, like Jess and CLIPS modules. But imagine instead if you could model linear Agenda Group execution. This is something I have been thinking about for a while, to allow powerful and flexible modelling of processes in a Production Rule System. A successful implementation has clear advantages over two separate engines, as there is an impedance mismatch between the two. While there is little issue using a rule engine with workflow, using workflow to control linear execution of a rule engine will be very suboptimal, so we must seek a single optimal solution for performance-sensitive applications.
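To make the contrast between stacked and linear execution concrete, here is a minimal Python sketch. It is only an illustration: the function names and the dictionary of activations are hypothetical, and it assumes that "stacked" means the most recently focused group fires first, while "linear" means groups fire in the order the stages are declared.

```python
def fire_stacked(focus_order, activations):
    """Fire groups in stack order: the most recently focused group goes first."""
    stack = list(focus_order)
    fired = []
    while stack:
        group = stack.pop()  # take the group on top of the stack
        fired.extend(activations.get(group, []))
    return fired


def fire_linear(stage_order, activations):
    """Fire groups strictly in the order the stages were declared."""
    fired = []
    for group in stage_order:
        fired.extend(activations.get(group, []))
    return fired


# Hypothetical activated rules, keyed by the group they belong to.
activations = {
    "validate": ["check order"],
    "price": ["apply tax"],
    "ship": ["book courier"],
}
order = ["validate", "price", "ship"]

stacked = fire_stacked(order, activations)
linear = fire_linear(order, activations)
```

With the same three groups, the stacked version fires "ship" first and "validate" last, whereas the linear version follows the process order.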

Let’s start by calling these special Agenda Groups "nodes", to indicate they are part of a linear graph execution process.

Start rules don't need to be in a node; the resulting target nodes will detach and evaluate once the start rule has finished:

rule "start rule"
target-node "<transition>" "<name>"
when
eval(true)
then
// assert some data
end


The start rule and the nodes can specify multiple target nodes and additional constraints for those target nodes, which are explained later. The start rule can fire on initialisation, using eval(true), or it could have some other constraints that fire the start rule at any time during the working memory lifetime. A Rule Base can have any number of start rules, allowing multiple workflows to be defined and executed.

The start rule dictates the next valid target-nodes - only activated rules in these nodes can fire as a result of the current assertions. While the activated rules in other nodes will not be able to fire, standard rules and Agenda Groups will react, activate and fire as normal to changes in data.

A node rule looks like a normal rule, except it declares the node it’s in. As mentioned previously a node can contain multiple rules, but only the rules with full matches to the LHS will be eligible for firing:

rule "rule name"
node "<name>"
when
<LHS>
then
// assert some data
end


There is an additional node structure, with which the rules are associated and which specifies the resulting targets:

node "node name"
target-node "<transition>" "<name>"
end


Target nodes are only allowed to evaluate their activated rules once the previous start rule has finished or the previous node is empty because it has fired all its rules. Once a node is ready to be evaluated, we "detach" it and then spin it off into its own thread for rule firing; all resulting working memory actions will be "queued" and asserted at safe points, so Rete is still a single process. Once a node is detached the contained rules can no longer be cancelled, they must all fire, and no further rules can be added. All our data structures are serialisable, so suspension/persistence is simply a matter of calling a command to persist the detached node off to somewhere.
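The execution model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Drools implementation: the `Rule`, `Node` and `run` names are hypothetical, working memory is modelled as a plain set of strings, and each node is evaluated in turn on a single thread rather than being spun off into its own thread. It shows the three ideas from the text: a node is frozen ("detached") before firing, its assertions are queued, and the queue is applied at a safe point before the target nodes run.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[frozenset], bool]  # the LHS
    action: Callable[[frozenset], list]     # the RHS: returns facts to assert


@dataclass
class Node:
    name: str
    rules: list
    targets: list = field(default_factory=list)  # names of target nodes


def run(start_targets, nodes, facts):
    """Evaluate nodes in order, beginning with the start rule's target nodes.

    Assumes the node graph is acyclic; a cycle would loop forever.
    """
    working_memory = set(facts)
    pending = deque(start_targets)
    fired = []
    while pending:
        node = nodes[pending.popleft()]
        # "Detach": freeze the fully matched rules; from here on none can be
        # cancelled or added, and they all fire.
        snapshot = frozenset(working_memory)
        detached = [rule for rule in node.rules if rule.condition(snapshot)]
        queued = []
        for rule in detached:
            queued.extend(rule.action(snapshot))  # queue, do not assert yet
            fired.append(f"{node.name}/{rule.name}")
        # Safe point: apply the queued assertions, then move on to the targets.
        working_memory.update(queued)
        pending.extend(node.targets)
    return fired, working_memory


# Hypothetical two-stage process: validate an order, then price it.
nodes = {
    "validate": Node(
        "validate",
        [Rule("has order", lambda f: "order" in f, lambda f: ["validated"])],
        targets=["price"],
    ),
    "price": Node(
        "price",
        [
            Rule("price validated", lambda f: "validated" in f, lambda f: ["priced"]),
            Rule("discount", lambda f: "vip" in f, lambda f: ["discounted"]),
        ],
    ),
}
fired, memory = run(["validate"], nodes, {"order"})
```

In this run the "discount" rule never fires because its LHS is not fully matched when the "price" node is detached.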

As well as a rule specifying the LHS constraints for it to activate, the previous node can specify additional constraints. A rule can be in multiple nodes, so if two incoming nodes specify additional constraints they are exclusive to each other: the additional constraints of the non-current incoming node will have no effect:

node "node name"
target-node "<transition>" "<name>" when
<additional constraints>
end
end
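One way to picture these transition constraints is as an optional guard attached to each transition. The Python sketch below is hypothetical (the `Transition` class and `eligible` function are illustrative names, not part of Drools) and assumes a rule may fire only when both its own LHS and the guard of the transition that was actually taken are satisfied. Guards on other incoming transitions are never consulted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transition:
    name: str
    target: str
    guard: Optional[Callable[[set], bool]] = None  # the additional constraints


def eligible(rule_lhs, incoming, facts):
    """A rule may fire if its LHS matches and the taken transition's guard holds."""
    guard_ok = incoming.guard is None or incoming.guard(facts)
    return rule_lhs(facts) and guard_ok


# Two hypothetical incoming transitions into the same "ship" node.
from_paid = Transition("paid", "ship", guard=lambda facts: "address" in facts)
from_manual = Transition("manual", "ship", guard=lambda facts: "approved" in facts)


def ship_rule_lhs(facts):
    return "order" in facts


facts = {"order", "address"}
via_paid = eligible(ship_rule_lhs, from_paid, facts)
via_manual = eligible(ship_rule_lhs, from_manual, facts)
```

Arriving via "paid" the rule is eligible, because only that transition's guard is checked; arriving via "manual" it is not, since "approved" has not been asserted.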


Further to this a node can specify multiple targets, each with its own optional additional constraints. Sample formats are shown below:

node "node name"
target-node "<transition>" "<name>"

target-node "<transition>" "<name>" when
end

target-nodes "<transition>" "<name>"
"<transition>" "<name>"
"<transition>" "<name>"

target-nodes "<transition>" "<name>"
"<transition>" "<name>"
"<transition>" "<name>" when
end
end


Further to this we need additional controls to implement "join nodes" and to allow reasoning to work with both the transition name and the node name.

This highlights the basics for linearly controlled execution of rules within a Production Rule System. It also means we can model any BPM process, as BPM is now a simplified subset, but allow it to be done in a highly scalable way that integrates into very demanding tasks. Further to this we can still have standard Agenda Groups and rules that fire as a result of data changes. The result is far more powerful than the simple subset that most workflow solutions provide.


Thursday, June 01, 2006

What is a Rule Engine

Drools is a Rule Engine but it is more correctly classified as a Production Rule System. The term "Production Rule" originates from formal grammar, where it is described as "an abstract structure that describes a formal language precisely, i.e., a set of rules that mathematically delineates a (usually infinite) set of finite-length strings over a (usually finite) alphabet". Production Rules is a Rule Based approach to implementing an Expert System and is considered "applied artificial intelligence".

The term Rule Engine is quite ambiguous in that it can be any system that uses rules, in any form, that can be applied to data to produce outcomes, which includes simple systems like form validation and dynamic expression engines. "How to Build a Business Rules Engine" (2004) by Malcolm Chisholm exemplifies this ambiguity. The book is actually about how to build and alter a database schema to hold validation rules, and then shows how to generate VB code from those rules to validate data entry. While a very valid and useful topic for some, it caused quite a surprise to this author, unaware at the time of the subtle differences between Rule Engines, who was hoping to find some hidden secrets to help improve the Drools engine. jBPM uses expressions and delegates in its Decision nodes, which control the transitions in a Workflow. At each node it evaluates a rule that dictates the transition to undertake; this is also a Rule Engine. While a Production Rule System is a kind of Rule Engine and also an Expert System, the validation and expression evaluation Rule Engines mentioned previously are not Expert Systems.

A Production Rule System is Turing complete, with a focus on knowledge representation to express propositional and first order logic in a concise, non-ambiguous and declarative manner. The brain of a Production Rule System is an Inference Engine that is able to scale to a large number of rules and facts; the engine is able to schedule many rules that are eligible for execution at the same time through the use of a "conflict resolution" strategy. There are two methods of execution for Rule-Based Systems, Forward Chaining and Backward Chaining; systems that implement both are called Hybrid Production Rule Systems. Understanding these two modes of operation is key to understanding why a Production Rule System is different.
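As a rough illustration of what a "conflict resolution" strategy does, the Python sketch below orders several simultaneously eligible rules before firing them. It is a toy model, not the Drools agenda: the `Agenda` class is hypothetical, and the strategy shown (highest priority first, ties broken in favour of the most recent activation) is just one assumed choice among many that real engines offer.

```python
import heapq
from itertools import count


class Agenda:
    """Holds eligible rules and decides the order in which they fire."""

    def __init__(self):
        self._heap = []
        self._sequence = count()

    def activate(self, rule_name, priority=0):
        # heapq is a min-heap, so both keys are negated: higher priority
        # first, and among equal priorities the most recent activation first.
        recency = next(self._sequence)
        heapq.heappush(self._heap, (-priority, -recency, rule_name))

    def fire_all(self):
        order = []
        while self._heap:
            _, _, rule_name = heapq.heappop(self._heap)
            order.append(rule_name)
        return order


agenda = Agenda()
agenda.activate("log order")
agenda.activate("reject fraud", priority=10)
agenda.activate("apply discount")
firing_order = agenda.fire_all()
```

All three rules are eligible at once; the strategy alone determines that "reject fraud" fires first, followed by "apply discount" and then "log order".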

Forward Chaining is 'data-driven' and thus reactionary: facts are asserted into the working memory, which results in rules firing. We start with a fact, it propagates, and we end with multiple eligible Rules which are scheduled for execution. Drools is a forward chaining engine. Backward Chaining is 'goal-driven': we start with a conclusion which the engine tries to satisfy. If it can't, it searches for conclusions, 'sub goals', that help satisfy an unknown part of the current goal. It continues this process until either the initial conclusion is proven or there are no more sub goals. Prolog is an example of a Backward Chaining engine; Drools will be adding support for Backward Chaining in its next major release.
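The difference between the two modes is easier to see side by side. The following Python sketch is a deliberately naive illustration over propositional rules; the rule set and function names are made up for the example, and neither function reflects how Drools or Prolog is implemented. Forward chaining starts from the facts and keeps firing rules until nothing new can be derived, while backward chaining starts from a goal and recursively tries to prove the premises of any rule that concludes it.

```python
# Each rule is (premises, conclusion); all names are illustrative.
RULES = [
    ({"rain"}, "wet ground"),
    ({"wet ground", "cold"}, "ice"),
    ({"ice"}, "slippery"),
]


def forward_chain(facts, rules):
    """Data-driven: keep applying rules until no new fact is produced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Goal-driven: prove the goal directly or via sub goals."""
    if goal in facts:
        return True
    if goal in seen:  # already trying to prove this goal: avoid looping
        return False
    seen = seen | {goal}
    return any(
        all(backward_chain(premise, facts, rules, seen) for premise in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


derived = forward_chain({"rain", "cold"}, RULES)
provable = backward_chain("slippery", {"rain", "cold"}, RULES)
unprovable = backward_chain("slippery", {"rain"}, RULES)
```

Given "rain" and "cold", forward chaining derives every reachable conclusion, including "slippery". Backward chaining only explores the sub goals needed for the one conclusion it was asked about, and fails when "cold" is missing because no rule concludes it.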

The Rete algorithm by Charles Forgy is a popular approach to Forward Chaining; Leaps is another approach. Drools has implementations for both Rete and Leaps. The Drools Rete implementation is called ReteOO, signifying that Drools has an enhanced and optimised implementation of the Rete algorithm for Object Oriented systems. Other Rete based engines also have marketing terms for their proprietary enhancements to Rete, like RetePlus and Rete III. It is important to understand that names like Rete III are purely marketing where, unlike the original published Rete Algorithm, no details of implementation are published; thus asking a question like "Does Drools implement Rete III?" is nonsensical. The most common enhancements are covered in "Production Matching for Large Learning Systems (Rete/UL)" (1995) by Robert B. Doorenbos.

Business Rule Management Systems build value on top of a Rule Engine, providing systems for rule management, deployment, collaboration, analysis and end user tools for business users. Further to this, the Business Rules Approach is a fast evolving and popular methodology helping to formalise the role of Rule Engines in the enterprise.

For more information read the following two chapters from the manual:
Introduction and Background
Knowledge Representation
