Wednesday, February 20, 2008

Shadow Facts - What you always wanted to know, but were afraid to ask

Shadow Facts are (a necessary) evil. I'm sorry to tell you that. Their purposes in life are to increase memory usage, add method call indirections and cause all types of hassle for us poor rule engine users.

Well, having said that, I must also say that they have the side effect of keeping the Rule Engine Session (a.k.a. Working Memory) in a consistent state in the presence of non-tracked fact attribute changes. But that is minor, of course. Did I mention I hate Shadow Facts?

This post is a follow-up to a discussion we were having on the Drools mailing list and tries to bring some light to the shadow (facts).

What problem are Shadow Facts trying to address?

Rules Engines usually reason over facts (in the form of data structures) that are defined, created and managed by their own API. This is what happens with the regular Jess/CLIPS templates, just to mention one example.

Such facts do not require any form of "shadow facts", because all the handling is done through the engine API and, as such, it is completely traceable and controlled by the engine. There is no "unsafe" change.

Well, that was beautiful, but someone realized that it is a pain to integrate such engines with the common Java programs out there. It would be much simpler to directly use the application POJOs as facts in the rules engine. That would avoid a lot of bridging and copy operations between the application and the engine "worlds".

I'm sorry to say I don't know who first introduced the concept, or the whole history behind it. I'm just mentioning Jess as an example, but many other engines support that.

The idea of using POJOs as facts without the need to explicitly copy attribute values from POJOs to templates and vice-versa is wonderful, but it raises a problem: a POJO attribute may be changed by a simple method call. There is no need to use a traceable engine API to do the job and so, the engine would completely lose control over the reasoning process. Even worse, since the application may have references to the objects asserted as facts, the application may change the fact attributes while they are in the working memory and the engine would not know about it, causing all types of inconsistencies.

In Drools syntax, imagine a rule like:
rule "my simple rule"
when
    Person( likes == "cheese" )
then
    // give cheese to the person
end

Pretend that in a stateful session, after asserting a Person fact that likes "cheese" into the working memory (consequently activating the rule above, but before firing it), the Person changes his mind and now likes "chocolate". The application changes the likes attribute value, but the engine doesn't know about it, since it has no way to trace such changes. Consequently the engine would fire the rule anyway, which is clearly inconsistent. And this is just one simple example of the wide range of inconsistencies that may happen in such a scenario.
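To make the scenario concrete, here is a minimal plain-Java sketch of it. It does not use the Drools API at all: NaiveEngine, CheeseFan and StaleActivationDemo are hypothetical names invented for this illustration, and the "engine" is just a toy that evaluates the constraint at insert time and remembers the match, with no shadow facts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO used as a fact; only the "likes" attribute matters here.
class CheeseFan {
    private String likes;

    CheeseFan(String likes) { this.likes = likes; }

    String getLikes() { return likes; }

    void setLikes(String likes) { this.likes = likes; }
}

// Toy stand-in for an engine WITHOUT shadow facts (an assumption made for
// illustration, not how any real engine is written): the constraint is
// checked once at insert time and the match is kept as an "activation".
class NaiveEngine {
    private final List<CheeseFan> agenda = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    void insert(CheeseFan fact) {
        // Equivalent of: Person( likes == "cheese" )
        if ("cheese".equals(fact.getLikes())) {
            agenda.add(fact);
        }
    }

    void fireAllRules() {
        // The consequence runs for every remembered match; the constraint
        // is NOT evaluated again, so untracked changes go unnoticed.
        for (CheeseFan fact : agenda) {
            log.add("gave cheese to someone who likes " + fact.getLikes());
        }
        agenda.clear();
    }
}

class StaleActivationDemo {
    static List<String> run() {
        NaiveEngine engine = new NaiveEngine();
        CheeseFan bob = new CheeseFan("cheese");
        engine.insert(bob);          // rule is activated
        bob.setLikes("chocolate");   // untracked change made by the application
        engine.fireAllRules();       // rule fires anyway
        return engine.log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The single log entry shows the rule firing for a fact that no longer matches its own condition.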

Shadow Facts to the rescue

So, how can we allow the engines to reason over POJOs without running into the problems described above? This is critical for Drools, which uses POJOs as its primary fact type, but it is also a problem other engines (like Jess) have to deal with.

Shadow Facts are the most common way of solving this problem.

Shadow Facts are a "copy" of the fact attributes. As simple as that. So when you insert a POJO fact into the working memory, the engine will transparently copy the fact attributes to an internal structure and will reason over that structure instead of reasoning over the POJO directly. This way, if any non-traceable change happens to the POJO's attributes, the engine will be shielded from it and remain consistent. If the application wants the change to become "visible" to the engine, it just needs to notify the engine by calling the update() action. The engine will make sure the attribute changes become visible only at a safe point.
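The copy-then-update idea can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: Customer and ShadowingMemory are hypothetical names, a single attribute is copied, the update() method here takes the fact itself rather than a fact handle, and the "safe point" scheduling is not modeled.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical POJO fact.
class Customer {
    private String likes;

    Customer(String likes) { this.likes = likes; }

    String getLikes() { return likes; }

    void setLikes(String likes) { this.likes = likes; }
}

// Toy working memory that reasons over copied attribute values instead of
// reading the POJO directly. Facts are keyed by identity, not equals().
class ShadowingMemory {
    private final Map<Customer, String> shadowLikes = new IdentityHashMap<>();

    // Copy the attribute when the fact enters the working memory.
    void insert(Customer fact) {
        shadowLikes.put(fact, fact.getLikes());
    }

    // The application notifies the memory; only now is the copy refreshed.
    void update(Customer fact) {
        shadowLikes.put(fact, fact.getLikes());
    }

    // What the "engine" believes the attribute value to be.
    String seenLikes(Customer fact) {
        return shadowLikes.get(fact);
    }
}

class ShadowCopyDemo {
    public static void main(String[] args) {
        ShadowingMemory memory = new ShadowingMemory();
        Customer bob = new Customer("cheese");
        memory.insert(bob);

        bob.setLikes("chocolate");                  // untracked change
        System.out.println(memory.seenLikes(bob));  // still the copied value

        memory.update(bob);                         // explicit notification
        System.out.println(memory.seenLikes(bob));  // refreshed value
    }
}
```

Between the untracked change and the update() call, the memory keeps reasoning over the old copied value, which is the shielding described above.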

This is the mechanism used by most engines I'm aware of, including Jess and Drools, although the implementations are certainly different.

How does Drools implement Shadow Facts?

As mentioned before, the primary fact type for Drools is the POJO. It means we not only reason over POJOs, but we also use POJOs internally to represent the facts. To not break this paradigm, and to allow the engine to transparently work with both facts that have shadow facts enabled and facts that have them disabled, we implemented Shadow Facts as lazy proxies.

So, when an application asserts a Person instance into the working memory, the engine dynamically generates, as bytecode, a PersonShadowProxy class that extends the Person class and keeps a lazy proxy of the required attributes. It also keeps a "delegate" attribute that references the actual asserted fact (instance). The engine only sees PersonShadowProxy, but anywhere we expose the fact to the user, only the Person instance is exposed. This keeps the working memory consistent and safe from external changes. It obviously creates overhead, though, and causes problems with final classes, which cannot be extended but may still have mutable attributes. It is also not simple to deal with the Collections Framework, because Maps and Collections do not follow the JavaBeans spec but must also be shadowed.
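A hand-written approximation of such a proxy might look like the sketch below. The real class is generated as bytecode at runtime, so this is only an illustration of the shape described above: Guest, GuestShadowProxy and the resetProxy() method are hypothetical names chosen for this example, not the generated code.

```java
// Hypothetical POJO fact; note that it must not be final to be proxied.
class Guest {
    private String likes;

    Guest(String likes) { this.likes = likes; }

    public String getLikes() { return likes; }

    public void setLikes(String likes) { this.likes = likes; }
}

// Hand-written stand-in for a generated shadow proxy: it extends the fact
// class, holds a delegate, and caches each attribute on first read.
class GuestShadowProxy extends Guest {
    private final Guest delegate;
    private String cachedLikes;
    private boolean likesCached;

    GuestShadowProxy(Guest delegate) {
        super(null); // the inherited field is unused; reads go through the cache
        this.delegate = delegate;
    }

    @Override
    public String getLikes() {
        // Lazy: the value is copied from the delegate only when first needed.
        if (!likesCached) {
            cachedLikes = delegate.getLikes();
            likesCached = true;
        }
        return cachedLikes;
    }

    // Drops the cached value so that the next read hits the delegate again;
    // something like this would run when the engine is told about a change.
    void resetProxy() {
        cachedLikes = null;
        likesCached = false;
    }

    // The original instance, which is what would be exposed to the user.
    Guest getDelegate() { return delegate; }
}

class ShadowProxyDemo {
    public static void main(String[] args) {
        Guest bob = new Guest("cheese");
        GuestShadowProxy proxy = new GuestShadowProxy(bob);

        System.out.println(proxy.getLikes()); // reads and caches the value
        bob.setLikes("chocolate");            // untracked change on the POJO
        System.out.println(proxy.getLikes()); // proxy still returns the cached value

        proxy.resetProxy();                   // engine has been notified
        System.out.println(proxy.getLikes()); // now reads the new value
    }
}
```

Declaring Guest as final would make the extends clause illegal, which is the limitation with final classes mentioned above.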

Is there a way to disable Shadow Facts and still keep the working memory consistent?

The good news is: yes. It is possible to do that if you follow a set of best practices.

1. Immutable classes are safe: if you have immutable classes that you use as facts you don't need to do anything for them.

2. In your rules, only change fact attributes inside modify() blocks: both dialects supported by Drools (MVEL and Java) have the modify block construct. Make sure all fact attribute changes are made inside modify() blocks and you are safe. You can find more info on the Java modify block syntax here. An MVEL example is here.

3. In your application, use modifyRetract() and modifyInsert() around any attribute changes for your facts: this way, the engine becomes aware that attributes will be changed and can prepare itself for them.

// create session
StatefulSession session = ruleBase.newStatefulSession();

// get facts
Person person = new Person( "Bob", 30 );
person.setLikes( "cheese" );

// insert facts
FactHandle handle = session.insert( person );

// do application stuff and/or fire rules
session.fireAllRules();

// wants to change attributes?
session.modifyRetract( handle ); // call modifyRetract() before doing changes
person.setAge( 31 );
person.setLikes( "chocolate" );
session.modifyInsert( handle, person ); // call modifyInsert() after the changes
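For the first practice in the list above, an immutable fact class could be sketched as follows. Preference and its withLikes() method are hypothetical names used only for this illustration. Since such an object can never change after construction, nothing the application does can alter a fact that is already in the working memory; a "change" is a brand new instance.

```java
// Hypothetical immutable fact: final class, final fields, no setters.
final class Preference {
    private final String name;
    private final String likes;

    Preference(String name, String likes) {
        this.name = name;
        this.likes = likes;
    }

    String getName() { return name; }

    String getLikes() { return likes; }

    // Returns a new instance instead of mutating this one.
    Preference withLikes(String newLikes) {
        return new Preference(name, newLikes);
    }
}

class ImmutableFactDemo {
    public static void main(String[] args) {
        Preference before = new Preference("Bob", "cheese");
        Preference after = before.withLikes("chocolate");

        System.out.println(before.getLikes()); // the original is unchanged
        System.out.println(after.getLikes());  // the new instance carries the change
    }
}
```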

If you can make sure that all changes made to fact attributes happen in the ways described above, you can go ahead and completely disable Shadow Facts. You can do that by setting a system property:

java -Ddrools.shadowproxy=false ...

Or by using the API:

RuleBaseConfiguration conf = new RuleBaseConfiguration();
conf.setShadowProxy( false );
RuleBase rulebase = RuleBaseFactory.newRuleBase( conf );

Is the Drools team thinking about improving that?

Yes. Mark is working on a new implementation for the next major release that completely removes the need for Shadow Facts in common use cases. Since this blog post is already huge, we will talk about this subject in another blog post.

If you have any questions, just talk to us on the mailing list or on the IRC channel.

Happy Drooling,
Edson