Wednesday, February 20, 2008

Shadow Facts - What you always wanted to know, but were afraid to ask

Shadow Facts are (a necessary) evil. I'm sorry to tell you that. Their purposes in life are to increase memory usage, add method call indirections and cause all types of hassle for us poor rule engine users.

Well, having said that, I must also say that they have the side effect of keeping the Rule Engine Session (a.k.a. Working Memory) in a consistent state in the presence of untracked fact attribute changes. But that is minor, of course. Did I mention I hate Shadow Facts?

This post is a follow-up to a discussion we were having on the Drools mailing list and tries to bring some light to the shadow (facts).

What problem are Shadow Facts trying to address?

Rules Engines usually reason over facts (in the form of data structures) that are defined, created and managed by their own API. This is what happens with the regular Jess/CLIPS templates, just to mention one example.

Such facts do not require any form of "shadow facts", because all the handling is done through the engine API and, as such, is completely traceable and controlled by the engine. There is no "unsafe" change.

Well, that was beautiful, but someone realized that it is a pain to integrate such engines with the common Java programs out there. It would be much simpler to directly use the application POJOs as facts in the rules engine. That would avoid a lot of bridging and copy operations between the application and the engine "worlds".

I'm sorry to say I don't know who first introduced the concept, or the whole history behind it. I'm just mentioning Jess as an example, but many other engines support that.

The idea of using POJOs as facts without the need to explicitly copy attribute values from POJOs to templates and vice-versa is wonderful, but it raises a problem: a POJO attribute may be changed by a simple method call. There is no need to use a traceable engine API to do the job and so, the engine would completely lose control over the reasoning process. Even worse, since the application may have references to the objects asserted as facts, the application may change the fact attributes while they are in the working memory and the engine would not know about it, causing all types of inconsistencies.

In Drools syntax, imagine a rule like:
rule "my simple rule"
when
Person( likes == "cheese" )
then
// give cheese to the person
end

Pretend that in a stateful session, after asserting a Person fact that likes "cheese" into the working memory (consequently activating the rule above, but before firing it), the person changes their mind and now likes "chocolate". The application changes the likes attribute value, but the engine doesn't know about it, since it has no way to trace such changes. Consequently, the engine would fire the rule anyway, which is clearly inconsistent. And this is just one simple example of the wide range of inconsistencies that may happen in such a scenario.
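
The scenario can be reproduced with a few lines of plain Java. This is only a toy sketch: ToyEngine is a made-up stand-in that checks the constraint once at insert time, and it is not how Drools is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class UntrackedChangeDemo {
    // Plain POJO used as a fact; the setter notifies nobody.
    static class Person {
        private String likes;
        Person(String likes) { this.likes = likes; }
        String getLikes() { return likes; }
        void setLikes(String likes) { this.likes = likes; }
    }

    // Hypothetical toy engine: evaluates likes == "cheese" at insert time only.
    static class ToyEngine {
        private final List<Person> agenda = new ArrayList<>();

        void insert(Person p) {
            if ("cheese".equals(p.getLikes())) {
                agenda.add(p); // rule activated, not fired yet
            }
        }

        List<String> fireAllRules() {
            List<String> log = new ArrayList<>();
            for (Person p : agenda) {
                log.add("giving cheese to someone who likes " + p.getLikes());
            }
            agenda.clear();
            return log;
        }
    }

    static List<String> run() {
        ToyEngine engine = new ToyEngine();
        Person person = new Person("cheese");
        engine.insert(person);
        person.setLikes("chocolate"); // untracked change, the engine is not told
        return engine.fireAllRules();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The activation created for a cheese lover still fires, even though the fact now says chocolate.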

Shadow Facts to the rescue

So, how can we allow engines to reason over POJOs without running into the problems described above? This is critical for Drools, which uses POJOs as its primary fact type, but it is also a problem other engines (like Jess) have to deal with.

Shadow Facts are the most common way of solving this problem.

Shadow Facts are a "copy" of the fact attributes. As simple as that. So when you insert a POJO fact into the working memory, the engine will transparently copy the fact attributes to an internal structure and will reason over that structure instead of reasoning over the POJO directly. This way, if any non-traceable change happens to the POJO's attributes, the engine will be shielded from it and remain consistent. If the application wants the change to become "visible" to the engine, it just needs to notify the engine by calling the update() action. The engine will make sure the attribute changes become visible only at a safe point.

This is the mechanism used by most engines I'm aware of, including Jess and Drools, although the implementations certainly differ.

How does Drools implement Shadow Facts?

As mentioned before, the primary fact type for Drools is the POJO. It means we not only reason over POJOs, but we use POJOs internally to represent the facts. To not break this paradigm and to allow the engine to transparently work with facts that have shadow facts enabled and facts that have shadow facts disabled, we implemented Shadow Facts as lazy proxies.

So, when an application asserts a Person instance into the working memory, the engine will dynamically generate the bytecode for a PersonShadowProxy class that extends the Person class and keeps a lazy proxy of the required attributes. It also keeps a "delegate" attribute pointing to the actual asserted fact (instance). The engine only sees PersonShadowProxy, but anywhere we expose the fact to the user, only the Person instance is exposed. This keeps the working memory consistent and safe from external changes. It obviously creates overhead, though, and causes problems with final classes, which cannot be extended but may still have mutable attributes. It is also not simple to deal with the Collections Framework, because Maps and Collections do not follow the JavaBeans spec, but must also be shadowed.
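
To give a feel for what such a proxy does, here is a hand-written sketch of the idea. The real class is generated as bytecode at runtime; the field names and the resetCache() method below are assumptions made for illustration only.

```java
class ShadowProxyDemo {
    static class Person {
        private String likes;
        Person(String likes) { this.likes = likes; }
        public String getLikes() { return likes; }
        public void setLikes(String likes) { this.likes = likes; }
    }

    // Hand-written stand-in for the generated proxy class.
    static class PersonShadowProxy extends Person {
        private final Person delegate; // the actual asserted fact
        private String likesCache;     // lazily filled copy of the attribute
        private boolean likesCached;

        PersonShadowProxy(Person delegate) {
            super(null); // the proxy's own inherited field is never used
            this.delegate = delegate;
        }

        @Override
        public String getLikes() {
            if (!likesCached) { // first read: copy the value from the delegate
                likesCache = delegate.getLikes();
                likesCached = true;
            }
            return likesCache;
        }

        // Hypothetical hook, called at a safe point once the engine is notified.
        void resetCache() {
            likesCached = false;
            likesCache = null;
        }

        Person getDelegate() { return delegate; }
    }

    public static void main(String[] args) {
        Person person = new Person("cheese");
        PersonShadowProxy proxy = new PersonShadowProxy(person);
        System.out.println(proxy.getLikes()); // engine reads and caches the value
        person.setLikes("chocolate");         // untracked change on the POJO
        System.out.println(proxy.getLikes()); // engine still sees the cached value
        proxy.resetCache();                   // application notified the engine
        System.out.println(proxy.getLikes()); // new value becomes visible
    }
}
```

Note that because the copy is lazy, the cached value is whatever the delegate held the first time the attribute was read.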

Is there a way to disable Shadow Facts and still keep the working memory consistent?

The good news is: yes. It is possible to do that if you follow a set of best practices.

1. Immutable classes are safe: if you have immutable classes that you use as facts you don't need to do anything for them.

2. In your rules, only change fact attributes inside modify() blocks: both dialects supported by Drools (MVEL and Java) have the modify block construct. Make sure all fact attribute changes are made inside modify() blocks and you are safe. You can find more info on the Java modify block syntax here. An MVEL example is here.

3. In your application, use modifyRetract() and modifyInsert() around any attribute changes for your facts: this way, the engine becomes aware that attributes will be changed and can prepare itself for them.

// create session
StatefulSession session = ruleBase.newStatefulSession();

// get facts
Person person = new Person( "Bob", 30 );
person.setLikes( "cheese" );

// insert facts
FactHandle handle = session.insert( person );

// do application stuff and/or fire rules
session.fireAllRules();

// wants to change attributes?
session.modifyRetract( handle ); // call modifyRetract() before doing changes
person.setAge( 31 );
person.setLikes( "chocolate" );
session.modifyInsert( handle, person ); // call modifyInsert() after the changes

If you can make sure that all changes made to fact attributes happen in the ways described above, you can go ahead and completely disable Shadow Facts. You can do that by setting a system property:

java -Ddrools.shadowproxy=false ...

Or by using the API:

RuleBaseConfiguration conf = new RuleBaseConfiguration();
conf.setShadowProxy( false );
RuleBase rulebase = RuleBaseFactory.newRuleBase( conf );

Is the Drools team thinking about improving that?

Yes. Mark is working on a new implementation for the next major release that completely removes the need for Shadow Facts in common use cases. Since this blog post is already huge, we will talk about this subject in another blog post.

If you have any questions, just talk to us on the mailing list or in the IRC channel.

Happy Drooling,
Edson

16 comments:

  1. Shouldn't this be the default?

    Why do rule engines allow the changing of fact attributes? Webster defines a fact as a "thing done." You can change your opinion, but you can't change facts. Thus you shouldn't be able to change fact attributes.

  2. If you have immutable objects you can turn off shadow proxies and not have to deal with them. Trying to keep your data models immutable is nice, but not always possible :(

    I'll post my email to the user mailing list, which gives a bit more information on the subject:

    Drools 4.x assert/retracts, like Jess, are symmetrical. When you assert the data, it uses the constraints to determine the cross product joins, which control the propagation throughout the network. When you do a retract, it uses those same constraints to determine the cross product joins, so it can propagate through the graph and remove itself from the node memories that the assert propagation reached before.

    Now go back to standard Java: when you have a HashMap with a key/value pair, if you put an object into the map and then change a field on the key that affects its hashCode()/equals() methods, you will never be able to retrieve the object. It is the same for us: the working memory has lots and lots of hash maps.
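
The HashMap analogy above is easy to check in isolation. The sketch below uses a made-up Person key whose hashCode() and equals() depend on a mutable field; it illustrates the Java behavior only, not the engine's node memories.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutatedKeyDemo {
    static class Person {
        private String likes;
        Person(String likes) { this.likes = likes; }
        void setLikes(String likes) { this.likes = likes; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Person && Objects.equals(likes, ((Person) o).likes);
        }

        @Override
        public int hashCode() { return Objects.hashCode(likes); }
    }

    // Returns { found before the change, found after the change, entry still stored }.
    static boolean[] run() {
        Map<Person, String> memory = new HashMap<>();
        Person bob = new Person("cheese");
        memory.put(bob, "node memory entry");

        boolean foundBefore = memory.containsKey(bob);
        bob.setLikes("chocolate"); // the hash code changes while bob is a key
        boolean foundAfter = memory.containsKey(bob);

        return new boolean[] { foundBefore, foundAfter, memory.size() == 1 };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("found before: " + r[0]
                + ", found after: " + r[1]
                + ", entry still stored: " + r[2]);
    }
}
```

The entry is still in the map, but a lookup with the very same object no longer finds it, which is the situation a symmetrical retract would be in.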

    Now what happens when you modify a bean? A modify is actually a retract+assert, so when you call update(...) after the field has changed, how can the retract correctly determine all the nodes where the object is remembered? It can't, as the information needed to determine this is no longer there. To get around this we have to shadow a bean, i.e. make a copy of all the bean's fields on insertion. When you call update() the engine still sees the old field values, even though they might have been updated on the source object. This allows the retract to always behave symmetrically to the assert. On the assert part of the update() we refresh the shadow proxy with the new field values.

    With Maps, and thus nested objects, shadowing becomes very hard, as we would have to shadow the entire map and then shadow every object in that map, which is far from ideal. This is also why with 4.0 you must be extremely careful with nested objects, which we recommend be immutable while the parent object lives in the network; shadowing an entire object graph is simply not practical.

    In Drools 3.0 we did not need to shadow, as we had a more complicated algorithm that maintained more references, allowing for asymmetrical behaviour. However, I couldn't get the algorithm to perform with Jess-like performance and it used a lot more memory. I dropped Drools 4.0 back to how Jess does it, with shadow facts and symmetrical assert/retract. For Drools 5.0 I have rewritten the algorithm and have managed to get asymmetrical behaviour with actual performance increases, not losses. This is now in a branch and I hope to merge it into trunk soon.

    After this, the only reason someone would want shadow proxies is to protect an object that is currently in the working memory from field modifications that leave the network integrity invalid, i.e. if another thread changes a field on an object in the working memory and the working memory is not correctly notified, we have an object cached in a state that it no longer represents. This will make all join attempts invalid until the engine is notified and the object is re-propagated throughout the network. As 5.0 will no longer require shadow proxies by default, we will probably remove the asm proxy implementation we have and instead just have an interface, leaving the implementation to the user, probably using AOP. This will simplify our engine and help performance a little, as we offload the solution to the problem to those that need it.

  3. There is a way to not use shadow facts or cache the attribute values, even for cases where the objects are mutable.

    If you use byte code modification to inject code into all the set methods of the object so that all changes trigger a modify, you can do without the shadow. I thought of the idea back in 2003/2004 when I was thinking about Laddad's book.

    The other reason for shadow facts is that they boost performance. You'll see the same kind of improvement in JRules and MS BRE. To achieve the same performance without using shadow facts, you have to do what OPS5 did: compile the nodes to access the attributes directly with the getXXX method.

    The downside with that approach is the compilation time is much higher and dynamically adding/removing rules takes a huge hit. It's similar to waiting for JSP pages to compile.

    If you combine both AOP and direct method access at all the nodes, you can completely bypass shadow facts, without having to worry about inconsistent behavior.

    For those interested, the early versions of OPS5 didn't have the distinction between shadow fact and object, since it was all just facts. It was later that Forgy experimented with directly accessing the attributes.

    In CLIPS, if you use deffacts directly, there is no shadow, since the facts are not accessible outside of CLIPS.

    ART introduced shadow facts, if I am not mistaken. ART was written in C++, so it supports object oriented programming. In Jamocha, I give users the option of asserting an object with or without shadow, which allows mixed mode.

    For me, direct method access and byte code modification is too big of a trade-off. If the application needs absolute speed, I think it is much better to use static analysis or optimization algorithms instead. If the engine gives the user the flexibility of asserting with or without a shadow, users can choose for themselves. Having shadows turned on/off globally also runs into limitations. I've had cases where half the data was static, but the other half was mutable.

    It's useful to be able to call a setXXX method and have it automatically notify the engine and trigger a modify. That's one feature I wish JESS had, but so far it still doesn't have it. There's an old paper on OPS5 that talks about compiling the nodes to access attributes directly.
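
A setter that notifies listeners does not need bytecode tricks to try out. The sketch below uses the standard java.beans.PropertyChangeSupport class; the listener here just records events and stands in for whatever would tell an engine to do a modify, which is an assumption for illustration.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

class NotifyingBeanDemo {
    static class Person {
        private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
        private String likes;

        void addPropertyChangeListener(PropertyChangeListener listener) {
            changes.addPropertyChangeListener(listener);
        }

        String getLikes() { return likes; }

        void setLikes(String likes) {
            String old = this.likes;
            this.likes = likes;
            // No event is fired when old and new are equal and non-null.
            changes.firePropertyChange("likes", old, likes);
        }
    }

    static List<String> run() {
        List<String> seen = new ArrayList<>();
        Person person = new Person();
        // Stand-in for the hook that would notify a rule engine.
        person.addPropertyChangeListener(
                event -> seen.add(event.getOldValue() + " -> " + event.getNewValue()));

        person.setLikes("cheese");    // fires: null -> cheese
        person.setLikes("cheese");    // same value, nothing fires
        person.setLikes("chocolate"); // fires: cheese -> chocolate
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Setting the same value twice produces no second event, so redundant set calls do not cause extra work for whoever is listening.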

  4. There's actually an old post going back to 2004 on the JESS mailing list that mentions the byte code technique, but I don't have a link to it. On a related note, I used Castor to compile schemas to classes with Java beans support for this reason. If all users follow good rule development practices, the need for shadow facts is reduced. With direct method access at the node level, performance and memory usage should improve by 20-30%.

  5. yet more examples that mutation is of the Devil !

  6. Microsoft's BRE suffers from the inconsistency issue. In fact, Charles Young filed a bug with Microsoft related to how they handle modify. Instead of doing a retract, alter object and assert, they do something else. That results in incorrect behavior, so the fact doesn't get removed properly.

    JRules 6 already uses byte code modification to access the attributes directly. I don't know exactly how they do it, but that feature has been in JRules since 6.0 if I'm not mistaken.

  7. Our engine has always been model independent, via the Extractor interface. That means it is up to the builder to determine how to read fields; both reflection and bytecode would be valid, the engine simply does not care. We generate direct bytecode getters with asm to ensure maximum performance, and we have done this since Drools 3.0. Drools 4.0.x introduced primitive support to avoid Object wrapping, and provides full coercion on all of the get methods, i.e. if you call getObject( object ) instead of getInt( object ) on an Extractor for a primitive, it will wrap the value and return an Integer.

    So if you had Person( name == "bill" ), that will generate and cache an Extractor instance along these lines:

    class PersonNameExtractor implements Extractor {
        // Reads the field through a direct, typed getter call
        public Object getValue(Object object) {
            return ((Person) object).getName();
        }
    }

    This allows how we read the model to be independent of whether a user has turned on shadow proxies or not. Shadow proxies can be turned on/off globally or for specific instances.

    Drools 5.0 will do away with the need for shadow proxies at least with regards to being able to retract an object when the field(s) have changed externally to the engine. Drools 4.0, like Jess, is symmetrical in that it must recalculate the joins to perform the retract and thus the need for the preserved data. Drools 5.0 will be asymmetrical and will not need to re-calculate the joins and thus is not impacted by any possible field changes. We tried this with Drools 3.0 and had performance problems, those are now addressed for 5.0 - and with performance gains :) So that will simplify most people's use cases.

  8. I should explain what I mean by direct access. It's from past research by Dr. Forgy. Rather than have extractors or helpers, the nodes directly access the attribute. That means the node doesn't need to use any macros or helper classes. In a generic implementation, the alpha node might have

    assertFact(Object fact)

    Using direct access, the assert method signature varies depending on the node.

    assertFact(Customer fact)
    assertFact(Address fact)

    For the joins, the node still needs to do a type cast, but it would avoid using a macro to access the value. If I find the old paper, I'll post on jamocha's SVN.

    If I'm not mistaken, JRules does runtime bytecode generation to optimize performance. Drools does it with the builder, so that means it's a compile time optimization, right? At least that's my guess based on the JRules documentation.

    In JESS and Jamocha, there isn't any type cast, since it's all deffacts internally. The cost is up front, when the Java object is converted to a deffact.

    In the case of JESS, the Fact object isn't an interface, so that makes it hard for Ernest to support non-shadowed facts.
    http://herzberg.ca.sandia.gov/jess/docs/71/api/jess/Fact.html

    In terms of doing modification in place versus doing a retract/assert cycle, most engines don't modify in place. If you've gotten it to work in place, that's cool. It would be interesting for everyone to see what kind of gains it produces versus the added code complexity. The benefit of retract/assert is that the design and code are much simpler. Doing modification in place tends to result in more complex code, so in the long run it may have unintended side effects. When one considers NOT, Exist, Collect, Accumulate, Temporal and Aggregate nodes, it may not be optimal in all cases to modify in place.

    I think one area of research which many people haven't focused on is RETE topology. I think there are still ways to improve performance by optimizing the topology. From past experience fixing bugs in Jamocha, improving the topology tends to bring much greater gains than optimizing modify. I know that Ernest has improved modify performance over the years, but the gains tend to be small.

  9. Thanks much, Edson! This was a great post-- very informative.

    Rick

  10. I just double checked from an old JRules paper. Here is what it says.

    3.7 Rete Tuning
    The Just-In-Time (JIT) byte code compilation feature for Rete mode can be used to optimize the evaluation of conditions of rules. When JIT is enabled JRules will call methods in the conditions of rules using generated byte code, rather than using the Java reflection APIs. The time to create an IlrContext from an IlrRuleset is increased when the JIT is enabled, however this is a one time cost for pooled IlrContext instances, and should not be of concern. The JIT can be activated whenever the rule engine has the security permissions to create a custom ClassLoader.

    This is just me guessing here, so it is likely to be wrong. From that description, JIT is a one time cost up front at rule compilation time. Since it mentions a custom classloader, it sounds like it's doing something similar to Drools extractors.

    For some reason I misinterpreted JRules JIT and thought it was like Java JIT, where code is modified at runtime. Silly me, not all JIT means the same, especially when the term is used liberally.

    On the topic of modify, there's another alternative. In a normal bean that implements Java beans property listeners, the set method notifies the listeners and passes the old and new value. An alternative to that is to do something like this.

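    public void setName(String name) {
        // Rete stands for a hypothetical static engine facade
        Rete.retract(this);    // remove the fact while it still holds the old value
        this.name = name;
        Rete.assertFact(this); // "assert" is a reserved word in Java, hence assertFact
    }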

    This is also an old idea. I forgot where I came across it, but it's been floating around since 2004 as well. Of course, this is different from modifying in place.

  11. It's not a modify in place, or what I call a "true modify". It's still doing a retract+assert, but the retract does not need to recalculate the joins: there are enough references that it can now just iterate down the references, clearing out the memory. So the retract is protected from any possible field changes on the object being retracted. This makes the retract+assert process asymmetrical instead of symmetrical.
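
The difference between the two kinds of retract can be sketched with a toy indexed memory. Everything below (NodeMemory, Handle, the bucket layout) is a simplified assumption for illustration and not the actual Drools 5 data structures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RetractDemo {
    static class Person {
        private String likes;
        Person(String likes) { this.likes = likes; }
        String getLikes() { return likes; }
        void setLikes(String likes) { this.likes = likes; }
    }

    // Toy node memory, indexed by the value of "likes" at insert time.
    static class NodeMemory {
        private final Map<String, List<Person>> buckets = new HashMap<>();

        // Returns the bucket the fact landed in, so a handle can remember it.
        List<Person> insert(Person p) {
            List<Person> bucket = buckets.computeIfAbsent(p.getLikes(), k -> new ArrayList<>());
            bucket.add(p);
            return bucket;
        }

        // Symmetrical retract: recompute the index key from the current field value.
        boolean retractByRecomputing(Person p) {
            List<Person> bucket = buckets.get(p.getLikes());
            return bucket != null && bucket.remove(p);
        }

        int size() {
            return buckets.values().stream().mapToInt(List::size).sum();
        }
    }

    // Asymmetrical retract: follow the reference stored at insert time.
    static class Handle {
        private final Person fact;
        private final List<Person> bucket;
        Handle(Person fact, List<Person> bucket) { this.fact = fact; this.bucket = bucket; }
        boolean retract() { return bucket.remove(fact); }
    }

    public static void main(String[] args) {
        NodeMemory memory = new NodeMemory();
        Person person = new Person("cheese");
        Handle handle = new Handle(person, memory.insert(person));

        person.setLikes("chocolate"); // field changes outside the engine

        System.out.println("recomputing finds it: " + memory.retractByRecomputing(person));
        System.out.println("handle finds it: " + handle.retract());
        System.out.println("entries left: " + memory.size());
    }
}
```

The recomputing retract looks in the wrong bucket after the field change, while the handle-based retract does not read the field at all.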

  12. Engine notifications on a per field change is a bad idea, as it massively increases the amount of work you need to do. You need that done as part of a block modification, as we allow in MVEL.
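
A rough way to see the cost difference is to count notifications. ToySession below is a made-up stand-in in which every update() represents one retract+assert cycle; the modify() helper is a hypothetical block-style API, not the MVEL construct itself.

```java
import java.util.function.Consumer;

class BlockModifyDemo {
    static class Person {
        int age = 30;
        String likes = "cheese";
    }

    static class ToySession {
        int notifications;

        // Each call stands for one full retract+assert cycle in a real engine.
        void update(Person p) { notifications++; }

        // Hypothetical block modify: apply every change first, then notify once.
        void modify(Person p, Consumer<Person> changes) {
            changes.accept(p);
            update(p);
        }
    }

    static int perField() {
        ToySession session = new ToySession();
        Person p = new Person();
        p.age = 31;
        session.update(p); // notification after the first field
        p.likes = "chocolate";
        session.update(p); // and again after the second field
        return session.notifications;
    }

    static int inBlock() {
        ToySession session = new ToySession();
        Person p = new Person();
        session.modify(p, person -> {
            person.age = 31;
            person.likes = "chocolate";
        });
        return session.notifications;
    }

    public static void main(String[] args) {
        System.out.println("per field: " + perField() + ", in block: " + inBlock());
    }
}
```

With two changed attributes the per-field style triggers two cycles and the block style one; the gap grows with the number of attributes changed together.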

    One idea, which the micro container uses, is to rely on annotations to define the transaction boundaries. It will then determine which objects have changed within that boundary, typically a method, and do the notifications at the end of the method. Because of our new asymmetrical algorithm we can still do the retract successfully after the field has changed.

  13. Handling modify in a block is much faster for most cases, but there is a niche case where modifying directly is better. For most of the applications I've worked on, modify usually involved multiple attributes.

    In the case where the modify touches only 1 attribute, it's faster to just do the modify. The other thing is that sometimes people do stupid things like calling the set method with the same value. JESS does the comparison before doing the modify, to avoid doing useless work.

    The code that castor generates compares the old and new value before modifying. For some really simple use cases, the alternate approach of doing a modify directly in the set method "may" be faster, but honestly I don't think it is worth it.

    You'd end up with an implementation that is slow for most cases and only faster for a niche case. I mention it just for discussion purposes and definitely wouldn't recommend it.

    A lot of different techniques have been tried over the years. On a semi-related note, Dr. Forgy tried using non-blocking hashmaps with parallel RETE several years back.

  14. GemStone OODB used a modified version of the VM (in the old days for Smalltalk, later for Java) that automatically marked objects as "dirty" when field values got changed. Their approach was so much better from the performance and transparency perspectives that it's a pity Sun did not make it a standard VM feature. The beneficiaries would be all persistence and UI frameworks, and rule engines would get some boost as well.

  15. What happened with Shadow Facts and V5?

  16. We have a new Rete algorithm that no longer needs them while working with Java classes. Do a search on asymmetrical Rete.
