Friday, January 22, 2010

Drools Inference and Truth Maintenance for good rule design and maintenance

Back in November I wrote a blog post, "What is inference and how does it facilitate good rule design and maintenance", on how inference can be useful for rule authoring.

The summary of this was:
  • De-couple knowledge responsibilities
  • Encapsulate knowledge
  • Provide semantic abstractions for those encapsulations

For my talk at the Lille JUG I extended this example to include truth maintenance, to demonstrate self-maintaining systems.

The previous example issued ID cards to over-18s; in this example we issue bus passes, either a child or an adult pass.
rule "Issue Child Bus Pass" when
    $p : Person( age < 16 )
then
    insert( new ChildBusPass( $p ) );
end

rule "Issue Adult Bus Pass" when
    $p : Person( age >= 16 )
then
    insert( new AdultBusPass( $p ) );
end

As before, the above example is considered monolithic and leaky, and it provides poor separation of concerns.

As before, we can use inference to make the application more robust and to separate the concerns. Notice that this time we don't just insert the inferred object; we insert it logically, using "insertLogical":
rule "Infer Child" when
    $p : Person( age < 16 )
then
    insertLogical( new IsChild( $p ) );
end

rule "Infer Adult" when
    $p : Person( age >= 16 )
then
    insertLogical( new IsAdult( $p ) );
end
Logical insertion is part of the Drools Truth Maintenance System (TMS). A logically inserted fact depends on the truth of the rule's "when" clause: when the rule's conditions are no longer true, the fact is automatically retracted. This works particularly well here because the two rules are mutually exclusive. So with the above rules, if the person is under 16 an IsChild fact is inserted; once the person is 16 or over, the IsChild fact is automatically retracted and an IsAdult fact is inserted.
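To make the justification idea concrete, here is a toy model in plain Java. It is not how the Drools TMS is implemented: the class LogicalTms and its methods are hypothetical, and each rule is reduced to a predicate over the person's age. A logically inserted fact stays in working memory only while at least one currently true rule justifies it, so changing the age from 15 to 16 swaps IsChild for IsAdult without any explicit retract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.IntPredicate;

// Toy justification-based truth maintenance (hypothetical, not Drools internals).
final class LogicalTms {
    private static final class Rule {
        final String name;
        final IntPredicate when; // stand-in for the rule's "when" clause over Person.age
        final String fact;       // the fact this rule inserts logically
        Rule(String name, IntPredicate when, String fact) {
            this.name = name; this.when = when; this.fact = fact;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    // fact -> names of the rules currently justifying it
    private final Map<String, Set<String>> support = new TreeMap<>();

    void addRule(String name, IntPredicate when, String fact) {
        rules.add(new Rule(name, when, fact));
    }

    // Re-evaluate every rule after the Person is inserted or its age is modified.
    void update(int age) {
        for (Rule r : rules) {
            Set<String> justifiedBy = support.computeIfAbsent(r.fact, k -> new TreeSet<>());
            if (r.when.test(age)) {
                justifiedBy.add(r.name);    // rule is true: it justifies the fact
            } else {
                justifiedBy.remove(r.name); // rule is false: its justification is withdrawn
            }
            if (justifiedBy.isEmpty()) {
                support.remove(r.fact);     // no justification left: automatic retraction
            }
        }
    }

    Set<String> facts() {
        return support.keySet();
    }

    public static void main(String[] args) {
        LogicalTms tms = new LogicalTms();
        tms.addRule("Infer Child", age -> age < 16, "IsChild");
        tms.addRule("Infer Adult", age -> age >= 16, "IsAdult");
        tms.update(15);
        System.out.println(tms.facts()); // [IsChild]
        tms.update(16);
        System.out.println(tms.facts()); // [IsAdult]
    }
}
```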

We can now bring back the rules that issue the passes. These can also insert logically, as the TMS supports chaining logical insertions, giving a cascading set of retractions.
rule "Issue Child Bus Pass" when
    $p : Person( )
    IsChild( person == $p )
then
    insertLogical( new ChildBusPass( $p ) );
end

rule "Issue Adult Bus Pass" when
    $p : Person( )
    IsAdult( person == $p )
then
    insertLogical( new AdultBusPass( $p ) );
end
Now when the person turns from 15 to 16, not only is the IsChild fact automatically retracted, so is the person's ChildBusPass fact. For bonus points we can combine this with the 'not' conditional element to handle notifications, in this case a request to return the pass. So when the TMS automatically retracts the ChildBusPass object, this rule fires and sends the person a request:
rule "Return ChildBusPass Request" when
    $p : Person( )
    not( ChildBusPass( person == $p ) )
then
    requestChildBusPass( $p );
end
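The cascade and the notification can be sketched the same way. This is again a toy, assuming each logical fact has exactly one supporting fact; CascadingTms, its insertLogical and retract methods, and the retract listener that stands in for the 'not' rule are all hypothetical. Withdrawing IsChild takes ChildBusPass with it, and the listener reacts to the pass disappearing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Toy cascading retraction (hypothetical; assumes a single justification per fact).
final class CascadingTms {
    private final Set<String> facts = new LinkedHashSet<>();
    private final Map<String, List<String>> dependents = new HashMap<>(); // support -> facts it justifies
    private final List<Consumer<String>> retractListeners = new ArrayList<>();

    void insert(String fact) { facts.add(fact); }

    // Logically insert 'fact', justified by the already present 'support' fact.
    void insertLogical(String fact, String support) {
        facts.add(fact);
        dependents.computeIfAbsent(support, k -> new ArrayList<>()).add(fact);
    }

    void onRetract(Consumer<String> listener) { retractListeners.add(listener); }

    // Retracting a fact also retracts, recursively, every fact it was justifying.
    void retract(String fact) {
        if (!facts.remove(fact)) return;
        for (String child : dependents.getOrDefault(fact, List.of())) {
            retract(child);
        }
        dependents.remove(fact);
        retractListeners.forEach(listener -> listener.accept(fact));
    }

    boolean contains(String fact) { return facts.contains(fact); }

    public static void main(String[] args) {
        CascadingTms wm = new CascadingTms();
        List<String> requests = new ArrayList<>();
        // Stand-in for the "not( ChildBusPass(...) )" rule: react when the pass disappears.
        wm.onRetract(f -> { if (f.equals("ChildBusPass")) requests.add("return ChildBusPass"); });
        wm.insert("Person");
        wm.insertLogical("IsChild", "Person");
        wm.insertLogical("ChildBusPass", "IsChild");
        // Simulate the TMS withdrawing IsChild once the person turns 16.
        wm.retract("IsChild");
        System.out.println(wm.contains("ChildBusPass") + " " + requests); // false [return ChildBusPass]
    }
}
```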

Wednesday, January 20, 2010

Drools: Towards computerizing intensive care sedation guidelines: design of a rule-based architecture for automated execution of clinical guidelines

"The aim is to close the gap in communication between the IT and the medical domain. This leads to a less time-consuming and error-prone development phase and a shorter clinical evaluation phase.

Methods: A framework is proposed that semi-automatically translates a clinical guideline, expressed as an XML-based flow chart, into a Drools Rule Flow by employing semantic technologies such as ontologies and SWRL."

Drools JUG Lille 21st of Jan

I'm going to be at the Ch’ti JUG tomorrow, the 21st of January, giving a Drools talk:
http://chtijug.org/rendez-vous-le-21-janvier-drools-avec-cylande-et-luniversite-de-lille-1/

Update: Geoffrey will do a Drools Planner talk too.

Monday, January 18, 2010

Drools - OSGi Ready!

I've spent some time getting Drools OSGi-ready, which was harder than I expected, especially as OSGi is new to me. Last night I finally got OSGi Declarative Services working with Drools. Combined with our Spring work, http://blog.athico.com/2009/12/drools-spring-improvements.html, this comes at a good time given the recent Spring DM announcement, http://blog.springsource.com/2010/01/12/dm-server-project-moves-to-eclipse-org.

For those who don't know, OSGi is a dynamic module system for declarative services. So what does that mean? Each jar in OSGi is called a bundle and has its own ClassLoader. Each bundle specifies the packages it exports (makes publicly available) and the packages it imports (external dependencies). OSGi uses this information to wire the ClassLoaders of different bundles together. The key distinction is that you don't specify which bundle you depend on, nor do you have a single monolithic classpath; instead you specify the packages you import, and their versions, and OSGi attempts to satisfy them from the available bundles.

It also supports side-by-side versioning, so you can have multiple versions of a bundle installed and it will wire up the correct one. Further to this, bundles can register services for other bundles to use. These services need initialisation, which can cause ordering problems: how do you make sure you don't consume a service before it's registered?
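Here is a toy model of that wiring, to show the difference between depending on a package and depending on a bundle. It is a sketch under simplifying assumptions, not the OSGi resolver: PackageResolver and its methods are hypothetical, versions are just major.minor, and an import is a half-open range in the spirit of OSGi's version="[1.0,2.0)". With two versions of the same library installed side by side, each importer is wired to the highest export that falls within its range.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Toy package wiring (hypothetical; not the framework's real resolution algorithm).
final class PackageResolver {
    // Simplified major.minor version; real OSGi versions are major.minor.micro.qualifier.
    static final class Version implements Comparable<Version> {
        final int major, minor;
        Version(int major, int minor) { this.major = major; this.minor = minor; }
        @Override public int compareTo(Version o) {
            return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
        }
    }

    private static final class Export {
        final String bundle, pkg;
        final Version version;
        Export(String bundle, String pkg, Version version) {
            this.bundle = bundle; this.pkg = pkg; this.version = version;
        }
    }

    private final List<Export> exports = new ArrayList<>();

    // Roughly what a bundle's Export-Package header declares.
    void export(String bundle, String pkg, Version version) {
        exports.add(new Export(bundle, pkg, version));
    }

    // Roughly an Import-Package entry with range [floor, ceiling): the importer names a
    // package, never a bundle, and the highest matching export wins.
    Optional<String> resolve(String pkg, Version floor, Version ceiling) {
        return exports.stream()
                .filter(e -> e.pkg.equals(pkg))
                .filter(e -> e.version.compareTo(floor) >= 0 && e.version.compareTo(ceiling) < 0)
                .max(Comparator.comparing(e -> e.version))
                .map(e -> e.bundle);
    }

    public static void main(String[] args) {
        PackageResolver r = new PackageResolver();
        // Two versions of the same library installed side by side.
        r.export("foo-1.2.jar", "org.example.foo", new Version(1, 2));
        r.export("foo-2.0.jar", "org.example.foo", new Version(2, 0));
        System.out.println(r.resolve("org.example.foo", new Version(1, 0), new Version(2, 0))); // Optional[foo-1.2.jar]
        System.out.println(r.resolve("org.example.foo", new Version(2, 0), new Version(3, 0))); // Optional[foo-2.0.jar]
    }
}
```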

OSGi has a number of features to help with service composition and ordering. The two main ones are the programmatic ServiceTracker and the XML-based Declarative Services (DS). There are also other projects that help with this, such as Spring DM, iPOJO and Gravity.

Each of the Drools factories is now also available as a FactoryService interface. You can either have OSGi inject these into a pojo or retrieve them from OSGi yourself; I'll cover injection here. The example below injects the KnowledgeBuilderFactoryService, KnowledgeBaseFactoryService and ResourceFactoryService into the TestComponent pojo.
<scr:component xmlns:scr="http://www.osgi.org/xmlns/scr/v1.1.0">
    <implementation class="testosgi.TestComponent"/>

    <reference bind="setKnowledgeBaseFactoryService"
               unbind="unsetKnowledgeBaseFactoryService"
               interface="org.drools.KnowledgeBaseFactoryService"
    />

    <reference bind="setResourceFactoryService"
               unbind="unsetResourceFactoryService"
               interface="org.drools.io.ResourceFactoryService"
    />

    <reference bind="setKnowledgeBuilderFactoryService"
               unbind="unsetKnowledgeBuilderFactoryService"
               interface="org.drools.builder.KnowledgeBuilderFactoryService"
               target="(org.drools.compiler.DecisionTableProvider=true)" />
</scr:component>
The TestComponent will only be activated when all of the referenced services are available and have been injected into the pojo. You'll also notice the "target" attribute on the KnowledgeBuilderFactoryService reference. This is needed because OSGi DS has no built-in way to declare which optional services must be present to satisfy your component. As a workaround, I made any Drools service that has optional services set a property if/when the optional service is available. Filters can then be applied, via the target attribute, to make sure the service is in the desired state before it is consumed. And that is pretty much it :)
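The activation condition can be modelled in a few lines. This is a hypothetical sketch, not the DS runtime: ToyComponent and ToyRegistry stand in for the component description and the service registry, and the target filter only supports a single "(key=value)" equality rather than full LDAP filter syntax. The interface names and the filter string are taken from the XML above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of DS reference satisfaction (hypothetical, not the OSGi DS implementation).
final class ToyComponent {
    // interface name -> target filter, or null when the reference has no target
    private final Map<String, String> references = new LinkedHashMap<>();

    ToyComponent reference(String iface, String target) {
        references.put(iface, target);
        return this;
    }

    // The component may only be activated once every reference can be bound.
    boolean isSatisfied(ToyRegistry registry) {
        return references.entrySet().stream()
                .allMatch(ref -> registry.canBind(ref.getKey(), ref.getValue()));
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        ToyComponent component = new ToyComponent()
                .reference("org.drools.KnowledgeBaseFactoryService", null)
                .reference("org.drools.io.ResourceFactoryService", null)
                .reference("org.drools.builder.KnowledgeBuilderFactoryService",
                           "(org.drools.compiler.DecisionTableProvider=true)");
        registry.register("org.drools.KnowledgeBaseFactoryService");
        registry.register("org.drools.io.ResourceFactoryService");
        Map<String, String> builderProps =
                registry.register("org.drools.builder.KnowledgeBuilderFactoryService");
        System.out.println(component.isSatisfied(registry)); // false: target filter not matched yet
        builderProps.put("org.drools.compiler.DecisionTableProvider", "true");
        System.out.println(component.isSatisfied(registry)); // true
    }
}

// Registered services per interface, each with a live, mutable property map.
final class ToyRegistry {
    private final Map<String, List<Map<String, String>>> services = new HashMap<>();

    Map<String, String> register(String iface) {
        Map<String, String> props = new HashMap<>();
        services.computeIfAbsent(iface, k -> new ArrayList<>()).add(props);
        return props;
    }

    // Only "(key=value)" equality targets are understood by this toy.
    boolean canBind(String iface, String target) {
        for (Map<String, String> props : services.getOrDefault(iface, List.of())) {
            if (target == null) return true;
            String expr = target.startsWith("(") && target.endsWith(")")
                    ? target.substring(1, target.length() - 1) : target;
            String[] kv = expr.split("=", 2);
            if (kv.length == 2 && kv[1].equals(props.get(kv[0]))) return true;
        }
        return false;
    }
}
```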

Getting there wasn't so easy. The first step was automating the build and packaging. To automate the build I used Peter Kriens' BND tool. I found that BND would only handle the Maven transitive dependencies by embedding them, so I did this first. This built a single Drools jar with all the Drools jars and their dependencies inside it. This straight away triggered ClassLoader issues, forcing me to rework how the Drools ClassLoader framework is configured. The issue was that Drools uses the ClassLoader you provide when compiling DRLs, which means any runtime class loading is resolved against that user-provided ClassLoader. Because of the way OSGi works, if the user provided a ClassLoader from their own bundle, that ClassLoader would not be able to see classes internal to Drools itself. This meant the Drools bundles gave me ClassNotFoundExceptions for classes in their own bundle. The answer was a CompositeClassLoader that combines the user-provided ClassLoader with the ClassLoaders of the Drools bundles.
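A minimal version of the composite idea, assuming it simply tries each delegate in order, could look like this. It is my sketch, not the actual Drools CompositeClassLoader source; in the OSGi case the delegate list would hold the user's ClassLoader plus the ClassLoader of some class from the Drools bundle.

```java
import java.util.List;

// Sketch of a composite ClassLoader (an assumption about the approach, not the Drools source).
final class CompositeClassLoader extends ClassLoader {
    private final List<ClassLoader> delegates;

    CompositeClassLoader(List<ClassLoader> delegates) {
        super(null); // parent = bootstrap loader, so java.* classes still resolve as usual
        this.delegates = List.copyOf(delegates);
    }

    // Called by loadClass when the bootstrap parent can't find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (ClassLoader delegate : delegates) {
            try {
                return delegate.loadClass(name);
            } catch (ClassNotFoundException notVisibleHere) {
                // this delegate can't see the class; try the next one
            }
        }
        throw new ClassNotFoundException(name);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader mine = CompositeClassLoader.class.getClassLoader();
        ClassLoader composite = new CompositeClassLoader(List.of(mine));
        // The composite finds this class through its delegate: prints true
        System.out.println(composite.loadClass(CompositeClassLoader.class.getName()) == CompositeClassLoader.class);
    }
}
```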

With that working, the next issue was the monolithic bundle we now had. I first tried to separate out drools-api, to give real API separation. This then triggered "split packages", one of those things you wish the OSGi people would shout from the rooftops to anyone hoping to be OSGi compatible in the future, as I had them all over the place. A split package is where the same package namespace is used in different jars: drools-api and drools-core both have classes in the "org.drools" namespace. There is very little documentation on resolving split packages, and the solution proposed in BND didn't seem to do anything for me. I read that I could use a "mandatory" setting on my exports, which should make my imports work, but I couldn't get that working either. Instead I moved from relying on "Import-Package" to "DynamicImport-Package: *" and "Require-Bundle", tying the Drools implementation bundles to the API bundle and re-exporting its interfaces. This seemed to do the job, although the latter two are frowned upon. "Require-Bundle" couples your bundle to a specific bundle and version, which means you aren't making the most of the more declarative nature of OSGi, and "DynamicImport-Package: *" just sucks everything in, which apparently can lead to inefficiencies in OSGi, something called a "fan out". I have to admit this gets too low level for me, so if anyone wants to shed more light on this in the comments, please do and I'll paste it into the end of this blog.
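To see where split packages come from, it helps to index jars by the packages they contain. SplitPackages.find below is a hypothetical helper, not part of BND or OSGi, and the package lists in the example are illustrative rather than the full contents of the real jars.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: reports packages that appear in more than one jar.
final class SplitPackages {
    static Map<String, Set<String>> find(Map<String, Set<String>> packagesByJar) {
        Map<String, Set<String>> jarsByPackage = new TreeMap<>();
        packagesByJar.forEach((jar, packages) ->
                packages.forEach(pkg -> jarsByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(jar)));
        jarsByPackage.values().removeIf(jars -> jars.size() < 2); // keep only split packages
        return jarsByPackage;
    }

    public static void main(String[] args) {
        // Illustrative package lists only.
        Map<String, Set<String>> jars = Map.of(
                "drools-api", Set.of("org.drools", "org.drools.builder"),
                "drools-core", Set.of("org.drools", "org.drools.common"));
        System.out.println(find(jars)); // {org.drools=[drools-api, drools-core]}
    }
}
```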

The next step was to split my monolithic Drools bundle back into the original jars, without embedding their dependencies. Variations on the following configuration seemed to work for the Drools modules:
<configuration>
    <manifestLocation>META-INF</manifestLocation>
    <instructions>
        <_removeheaders>Ignore-Package</_removeheaders>
        <Require-Bundle>org.drools.api;visibility:=reexport;bundle-version="${pom.version}"</Require-Bundle>
        <Import-Package>!org.drools.*, *</Import-Package>
        <Export-Package>org.drools.*</Export-Package>
        <DynamicImport-Package>*</DynamicImport-Package>
        <Bundle-Activator>org.drools.osgi.core.Activator</Bundle-Activator>
    </instructions>
</configuration>
Because many of the Drools dependencies are not OSGi-ready, I turned to the Spring repository, which repackages many projects with OSGi-ready manifests in a Maven-consumable repository.

The Bundle-Activator instruction specifies the class that is called when each bundle is started in OSGi. The Activator registers services and, where optional services need to be tracked, configures a ServiceTracker that updates the properties the target attribute can filter on. This is the programmatic way to set up services in OSGi, as opposed to DS.
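The property-toggling part of the Activator can be sketched without an OSGi container. In real code this would be an org.osgi.util.tracker.ServiceTracker whose customizer calls ServiceRegistration.setProperties; here DecisionTableTracker is a hypothetical stand-in whose addingService and removedService methods only mirror the names of the ServiceTrackerCustomizer callbacks, and the service properties are a plain Map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an Activator's ServiceTracker that advertises an optional service.
final class DecisionTableTracker {
    static final String KEY = "org.drools.compiler.DecisionTableProvider";
    private final Map<String, String> builderServiceProps; // properties of the builder service
    private int providers; // decision-table providers currently available

    DecisionTableTracker(Map<String, String> builderServiceProps) {
        this.builderServiceProps = builderServiceProps;
    }

    // Mirrors ServiceTrackerCustomizer.addingService: a provider appeared.
    void addingService() { providers++; publish(); }

    // Mirrors ServiceTrackerCustomizer.removedService: a provider went away.
    void removedService() { if (providers > 0) providers--; publish(); }

    // In real OSGi code this would be a ServiceRegistration.setProperties(...) call.
    private void publish() {
        if (providers > 0) builderServiceProps.put(KEY, "true");
        else builderServiceProps.remove(KEY);
    }

    // Equality-only check in the spirit of "(org.drools.compiler.DecisionTableProvider=true)".
    static boolean targetMatches(Map<String, String> props) {
        return "true".equals(props.get(KEY));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        DecisionTableTracker tracker = new DecisionTableTracker(props);
        System.out.println(targetMatches(props)); // false
        tracker.addingService();
        System.out.println(targetMatches(props)); // true
        tracker.removedService();
        System.out.println(targetMatches(props)); // false
    }
}
```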

I'm now in the process of OSGi-ifying the other Drools modules and trying to make it more robust. Thanks to Peter Kriens and the people on the #eclipse and #osgi irc channels for their patience with my questions.

Monday, January 04, 2010

Rete and "True Modify"

The first commit for "true modify" is in, although we'll need to think of a better name for it. The branch doesn't fully compile yet, so you can't build it with Maven, but enough of it compiles to run examples that use only joins (no 'not' or 'exists' nodes), such as Manners and Waltz. We've also been extending the Rete testing harness, http://blog.athico.com/2009/11/rete-dsl-testing-harness.html, to test these nodes more thoroughly. I'll blog about the algorithm in more detail later:
http://fisheye.jboss.org/browse/JBossRules/branches/true_modify_20100104

The crux of it is that a modify is no longer a stateless retract+assert. I use the term stateless here because all state is lost in the retract and all state is recreated in the assert, so it's not easy to know, in a stateful manner, what changed between the two.

As an example of the workarounds we have had to use to determine those state changes, take the event model for activations. Drools has always provided activation normalisation, to make the events appear correct and for truth maintenance. When a modify happens we put all the activations cancelled during the retract into a map, and then remove from that map every activation that is created again during the assert. This way we know which activations were really cancelled, which stayed the same and which were added. While this creates a system that users can understand more easily, it adds considerable overhead (about 10%) and complexity; to my knowledge Drools is the only production rule system that does this.
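The normalisation bookkeeping can be sketched as follows. It is a simplified model with hypothetical method names, activations as plain strings, and a set where the text mentions a map, since only membership matters here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of activation normalisation for modify = retract + assert.
final class ActivationNormaliser {
    private final Set<String> pendingCancels = new LinkedHashSet<>(); // cancelled during the retract half
    private final List<String> newlyCreated = new ArrayList<>();
    private final List<String> kept = new ArrayList<>();

    // Retract half: hold the cancellation back instead of reporting it straight away.
    void onCancel(String activation) { pendingCancels.add(activation); }

    // Assert half: re-creating an activation that was just cancelled means it never really changed.
    void onCreate(String activation) {
        if (pendingCancels.remove(activation)) kept.add(activation);
        else newlyCreated.add(activation);
    }

    // Whatever is still pending after the assert half was genuinely cancelled.
    List<String> reallyCancelled() { return new ArrayList<>(pendingCancels); }
    List<String> reallyCreated() { return newlyCreated; }
    List<String> unchanged() { return kept; }

    public static void main(String[] args) {
        ActivationNormaliser n = new ActivationNormaliser();
        // Before the modify rules A and B are activated; afterwards A and C are.
        n.onCancel("A");
        n.onCancel("B");
        n.onCreate("A");
        n.onCreate("C");
        System.out.println(n.reallyCancelled() + " " + n.unchanged() + " " + n.reallyCreated()); // [B] [A] [C]
    }
}
```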

The new algorithm does not do two propagations, a retract + assert; instead it does a single modify propagation. This propagation applies the constraint, then determines what to do and how to continue:
  • false before, true now = continue as assert
  • true before, false now = continue as retract
  • true before, true now = continue as modify
  • false before, false now = do nothing
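The four cases map directly onto a small decision function. This is an illustrative sketch with hypothetical names; the point is that matchedBefore is only knowable because the tuple keeps its children across the modify, which a retract + assert would have discarded.

```java
// The four outcomes of a single modify propagation at one node (illustrative names).
final class ModifyPropagation {
    enum Action { ASSERT, RETRACT, MODIFY, NONE }

    // matchedBefore: an existing child tuple records the earlier match.
    // matchesNow: the constraint evaluated against the modified data.
    static Action decide(boolean matchedBefore, boolean matchesNow) {
        if (matchedBefore) return matchesNow ? Action.MODIFY : Action.RETRACT;
        return matchesNow ? Action.ASSERT : Action.NONE;
    }

    public static void main(String[] args) {
        for (boolean before : new boolean[] {false, true}) {
            for (boolean now : new boolean[] {false, true}) {
                System.out.println(before + " before, " + now + " now -> " + decide(before, now));
            }
        }
    }
}
```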

Traditional symmetrical Rete implementations, such as the one in Drools 4, cannot do this (http://blog.athico.com/2008/10/symmetrical-and-asymmetrical-rete.html), because there is not enough state in the network to avoid the retract + assert. Drools 5.0 implements asymmetrical Rete for tree-based removal, as mentioned in Doorenbos's papers and based on the work of Gary Riley in CLIPS. In this algorithm every Tuple (Jess calls them Tokens and CLIPS calls them PartialMatches) knows which Tuples it was joined to and all its resulting children; likewise each child knows its parents. The Drools 5.0 implementation alone was not enough to move straight to true modify, and the data structures had to be changed, mostly around deterministic iteration. For performance, Drools 5.0 and CLIPS only keep a reference to the head of a list, iterating from the head and adding to the head. For true modify, the opposite node iterations must be in the same order as the child tuple iterations. To achieve that we keep a reference to both the head and the tail, adding to the tail and iterating from the head. Along with a few other additions, this means we can now implement modify methods, as illustrated by JoinNode's modifyLeftTuple:

public void modifyLeftTuple(final LeftTuple leftTuple,
                            final PropagationContext context,
                            final InternalWorkingMemory workingMemory) {
    final BetaMemory memory = (BetaMemory) workingMemory.getNodeMemory( this );

    // Remove and re-add so the tuple is in the right bucket and at the end of it;
    // this is needed for indexing and deterministic iteration
    memory.getLeftTupleMemory().remove( leftTuple );
    memory.getLeftTupleMemory().add( leftTuple );

    this.constraints.updateFromTuple( memory.getContext(),
                                      workingMemory,
                                      leftTuple );
    LeftTuple childLeftTuple = leftTuple.firstChild;

    RightTupleMemory rightMemory = memory.getRightTupleMemory();

    RightTuple rightTuple = rightMemory.getFirst( leftTuple );

    // first check our index (for indexed nodes only) hasn't changed and we are returning the same bucket
    if ( childLeftTuple != null && rightMemory.isIndexed() && rightTuple != rightMemory.getFirst( childLeftTuple.getRightParent() ) ) {
        // our index has changed, so delete all the previous propagations
        this.sink.propagateRetractLeftTuple( leftTuple,
                                             context,
                                             workingMemory );

        childLeftTuple = null; // null so the next check will attempt matches for the new bucket
    }

    // we can't do anything if RightTupleMemory is empty
    if ( rightTuple != null ) {
        if ( childLeftTuple == null ) {
            // either we are indexed and changed buckets, or
            // we had no children before but there is a bucket to potentially match, so try as a normal assert
            for ( ; rightTuple != null; rightTuple = (RightTuple) rightTuple.getNext() ) {
                final InternalFactHandle handle = rightTuple.getFactHandle();
                if ( this.constraints.isAllowedCachedLeft( memory.getContext(),
                                                           handle ) ) {
                    this.sink.propagateAssertLeftTuple( leftTuple,
                                                        rightTuple,
                                                        context,
                                                        workingMemory,
                                                        this.tupleMemoryEnabled );
                }
            }
        } else {
            // in the same bucket, so iterate and compare
            for ( ; rightTuple != null; rightTuple = (RightTuple) rightTuple.getNext() ) {
                final InternalFactHandle handle = rightTuple.getFactHandle();

                if ( this.constraints.isAllowedCachedLeft( memory.getContext(),
                                                           handle ) ) {
                    if ( childLeftTuple == null || childLeftTuple.getRightParent() != rightTuple ) {
                        // false before, true now: no existing child for this right tuple, so assert
                        this.sink.propagateAssertLeftTuple( leftTuple,
                                                            rightTuple,
                                                            context,
                                                            workingMemory,
                                                            this.tupleMemoryEnabled );
                    } else {
                        // true before, true now: modify the existing child
                        // preserve the current LeftTuple, as we need to iterate to the next before re-adding
                        LeftTuple temp = childLeftTuple;
                        childLeftTuple = this.sink.propagateModifyChildLeftTuple( childLeftTuple,
                                                                                  rightTuple,
                                                                                  context,
                                                                                  workingMemory,
                                                                                  this.tupleMemoryEnabled );
                        // we must re-add this to ensure deterministic iteration
                        temp.reAddLeft();
                    }
                } else if ( childLeftTuple != null && childLeftTuple.getRightParent() == rightTuple ) {
                    // true before, false now: retract the existing child
                    childLeftTuple = this.sink.propagateRetractChildLeftTuple( childLeftTuple,
                                                                               rightTuple,
                                                                               context,
                                                                               workingMemory );
                }
                // else do nothing, was false before and false now.
            }
        }
    }

    this.constraints.resetTuple( memory.getContext() );
}
The first thing you'll notice is that this is about a third more code than retract and assert together, with additional logic tests to determine the before and after states. Combined with a small overhead added to the data structures, this means we aren't actually reducing the number of executed statements, but increasing it. What you'll notice, though, is that if there is no state change (true before and true now), then unlike retract+assert it avoids creating a Tuple. So for large conflict sets, where none or only a small proportion of the set changes, we get much less, or even no, object creation, and thus less load on the GC. In the past, large systems with millions of facts using gigabytes of memory have had GC problems, where Drools created objects faster than the standard GC could keep up with, causing OutOfMemoryErrors. The answer then was to tune the GC to run more aggressively and more often. For those systems true modify should hopefully be a real advantage. Waltz and Manners are small applications and are only marginally faster, most likely only because of the removal of activation normalisation.
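The head/tail bookkeeping that makes the iteration deterministic can be sketched as a plain doubly linked list. This is a hypothetical model, not the Drools tuple memory classes: add appends at the tail, iteration starts at the head, and reAdd unlinks an existing node and appends it again, in the spirit of the reAddLeft() call in modifyLeftTuple. Note that a kept child is moved, not recreated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical doubly linked list with head and tail references (not Drools' classes).
final class TupleList<E> {
    static final class Node<V> {
        final V value;
        Node<V> prev, next;
        Node(V value) { this.value = value; }
    }

    private Node<E> head, tail;

    // Append at the tail, so iteration from the head follows insertion order.
    Node<E> add(E value) {
        Node<E> n = new Node<>(value);
        linkLast(n);
        return n;
    }

    void remove(Node<E> n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
    }

    // Move an existing node to the tail without allocating a new one.
    void reAdd(Node<E> n) {
        remove(n);
        linkLast(n);
    }

    private void linkLast(Node<E> n) {
        n.prev = tail;
        if (tail != null) tail.next = n; else head = n;
        tail = n;
    }

    List<E> toList() {
        List<E> out = new ArrayList<>();
        for (Node<E> n = head; n != null; n = n.next) out.add(n.value);
        return out;
    }

    public static void main(String[] args) {
        TupleList<String> children = new TupleList<>();
        TupleList.Node<String> a = children.add("a");
        children.add("b");
        children.add("c");
        children.reAdd(a); // "a" survived a modify: moved to the tail, not recreated
        System.out.println(children.toList()); // [b, c, a]
    }
}
```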

But is it only large systems that will benefit? Not at all. Now that we have stateful modifications, lots of new opportunities for optimisation open up. The biggest initial gain will come from the more functional programming aspects of Drools. When you use 'from' to nest and chain conditional elements and patterns, you are using Drools in a functional way. An accumulate is like a left fold: it iterates over a set of data and produces a derived object, which we then filter with patterns or other conditional elements.

$p : Person( location == "london" )
$total : Number( ) from accumulate( CashFlow( person == $p, type == "DEBIT", $v : value ),
                                    sum( $v ) )
If the CashFlows are all inserted first, the accumulation is triggered by the insertion of a Person. What happens if we change a field on the Person, but not the location? In the functional world, changes to field values are known as side effects. With the traditional Rete approach we have no way of knowing whether a side effect can impact the result of the function; worse, the results are thrown away during the retract. With a modify we still have the result, so we can determine whether the modified object or field can impact the accumulation; if it can't, we use the result as is, with no need to recalculate. Some applications chain many accumulations in a single rule, and there the performance savings can be orders of magnitude. Drools Planner (formerly Drools Solver) is one such application that will make big gains from this.
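A toy model of reusing the accumulate result across a modify follows. DebitAccumulate and its methods are hypothetical; personFieldsRead stands for the Person fields that the accumulate's inner pattern reads, which for the rule above is empty, since CashFlow( person == $p ) only uses the identity of $p.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model: keep an accumulate result across a modify when it can't be affected.
final class DebitAccumulate {
    private final List<Double> debitValues;     // values of the person's DEBIT CashFlows
    private final Set<String> personFieldsRead; // Person fields the accumulate's pattern reads
    private Double cachedSum;                    // null = no result currently held
    int recomputations;

    DebitAccumulate(List<Double> debitValues, Set<String> personFieldsRead) {
        this.debitValues = debitValues;
        this.personFieldsRead = personFieldsRead;
    }

    double sum() {
        if (cachedSum == null) {
            double total = 0;
            for (double v : debitValues) total += v;
            cachedSum = total;
            recomputations++;
        }
        return cachedSum;
    }

    // Traditional modify: the retract throws the result away, so the assert recomputes it.
    double retractThenAssert() {
        cachedSum = null;
        return sum();
    }

    // True modify: recompute only if a field the accumulate reads has changed.
    double modify(Set<String> changedFields) {
        if (!Collections.disjoint(changedFields, personFieldsRead)) cachedSum = null;
        return sum();
    }

    public static void main(String[] args) {
        DebitAccumulate acc = new DebitAccumulate(List.of(100.0, 50.5), Set.of());
        acc.sum();                 // first evaluation
        acc.modify(Set.of("age")); // result reused
        acc.retractThenAssert();   // result recomputed from scratch
        System.out.println(acc.recomputations); // 2
    }
}
```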

Range constraints, such as age > $v, are normally indexed with BTrees. If every modify were a retract+assert, this would send the BTree rebalancing into overdrive, negating any benefit. Now that we can avoid unnecessary BTree manipulation, range indexing becomes a possibility for Join Nodes.
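As a rough illustration, java.util.TreeMap (a red-black tree, standing in here for a BTree) can serve as a range index on age. AgeRangeIndex is hypothetical; the point is that a modify which leaves the indexed value unchanged never touches the index, whereas a retract + assert would always remove and re-insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical range index on age; TreeMap stands in for a BTree.
final class AgeRangeIndex {
    private final TreeMap<Integer, Set<String>> byAge = new TreeMap<>();
    int indexUpdates; // how many times the index itself had to change

    void insert(String fact, int age) {
        byAge.computeIfAbsent(age, k -> new TreeSet<>()).add(fact);
        indexUpdates++;
    }

    void remove(String fact, int age) {
        Set<String> facts = byAge.get(age);
        if (facts != null && facts.remove(fact)) {
            indexUpdates++;
            if (facts.isEmpty()) byAge.remove(age);
        }
    }

    // A true modify skips the index entirely when the indexed value did not change.
    void modify(String fact, int oldAge, int newAge) {
        if (oldAge == newAge) return;
        remove(fact, oldAge);
        insert(fact, newAge);
    }

    // Facts satisfying "age > v": one ordered range lookup instead of a full scan.
    List<String> olderThan(int v) {
        List<String> out = new ArrayList<>();
        byAge.tailMap(v, false).values().forEach(out::addAll);
        return out;
    }

    public static void main(String[] args) {
        AgeRangeIndex index = new AgeRangeIndex();
        index.insert("alice", 30);
        index.insert("bob", 12);
        index.insert("carol", 45);
        index.modify("alice", 30, 30); // some other field changed: index untouched
        index.modify("bob", 12, 19);   // age changed: one removal plus one insertion
        System.out.println(index.olderThan(18) + " after " + index.indexUpdates + " updates"); // [bob, alice, carol] after 5 updates
    }
}
```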

We can analyse a rule and determine, for each object change at the point it enters the rule, which later nodes depend on that object. We can then either skip unnecessary checks and just propagate to the next node, or avoid propagation altogether. All of this helps reduce the amount of work done during the matching phase.
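One way to picture that analysis is a per-node table of the fields each node reads. PropagationPlanner is purely hypothetical and only looks at node constraints (a real analysis would also need to consider what the consequence reads): a node whose fields are untouched passes the existing result through, and if no node reads a changed field, the modify need not be propagated at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical field-dependency planner for a modify entering a rule.
final class PropagationPlanner {
    enum Step { REEVALUATE, PASS_THROUGH }

    // fieldsReadByNode.get(i): fields of the modified object that node i's constraints read.
    // Returns Optional.empty() when no node reads a changed field, i.e. skip propagation.
    static Optional<List<Step>> plan(List<Set<String>> fieldsReadByNode, Set<String> changed) {
        List<Step> steps = new ArrayList<>();
        boolean anyAffected = false;
        for (Set<String> reads : fieldsReadByNode) {
            boolean affected = !Collections.disjoint(reads, changed);
            steps.add(affected ? Step.REEVALUATE : Step.PASS_THROUGH);
            anyAffected |= affected;
        }
        return anyAffected ? Optional.of(steps) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Set<String>> rule = List.of(
                Set.of("location"), // Person( location == "london" )
                Set.<String>of(),   // join with CashFlow( person == $p ): identity only
                Set.of("age"));     // hypothetical later constraint on $p's age
        System.out.println(plan(rule, Set.of("age")));  // Optional[[PASS_THROUGH, PASS_THROUGH, REEVALUATE]]
        System.out.println(plan(rule, Set.of("name"))); // Optional.empty
    }
}
```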

I'm not aware of any literature or other engines that implement this Rete enhancement, so if you know of any, please point me in the right direction.

A bit of historical information on Drools: in the early 2.0 beta releases, and in the Drools 3.0 final release, there were attempts at "true modify". The approach in Drools 3.0 meant that each Tuple used Maps and Sets to keep references to its matches and children. It worked, but performance did not scale as memory use went through the roof, which is why Drools 4.0 returned to a more traditional symmetrical Rete implementation. It's been a long-sought goal that we haven't managed to get right in the past, but we feel we are finally there. So I should qualify my request above: I'm interested in other implementations or literature that do this in a scalable way :)