Sunday, May 18, 2008

Efficient Binary Protocol Marshalling for Drools Stateful Sessions

New code has been added, in trunk for a highly efficient binary protocol marshalling for stateful sessions. While Drools 4 supported standard serialisation of Working Memories, it used the default serialisation, this is both very slow and memory bloating. It also has the problem that it just serialises the user objects with it.

The new marshalling implementation allows for a strategy approach when handling the marshalling of user objects. You register different strategies for different name spaces. Currently we have two strategies, Serialised and Identity. Serialised just writes the user object to the stream via standard readObject/writeObject methods. Identity assigns the user object an id, from an int counter, and places it in a map and writes the id to the stream. Reading in uses that map to lookup the user object. Identity strategy is thus stateful, the write method can only be called once but read can be called repeatedly. Other strategies will be added, such as hibernate place holder support. Users can of course write there own.

So what does this give?
  • Session pause/resume, ideal for moving a session to a new server or restarting an existing server.
  • General snap shotting, backup. You could do a poor man's high availability by snap shotting the session at the end of each transaction. Later we will be extending this binary protocol to support journalling and also replication via clustering.
  • Pre-loaded session templates. If you repeatedly create stateful sessions with the same base set of data and the insertion time is annoying, now you can just populate the session once and write out to a byte[] and use that as a template for further session creations.
  • Long term persistence, needed for our process integration work.

You can see the MarshallingTest to look at the integration tests in place at the moment.

In integrationtests' SerializationHelper you can see the helper methods there demonstrating how it all works.

The DefaultMarshalling class uses the Identity strategy if it is not passed a strategy factory. The Identity strategy is used throughout the tests.

The read/write marshalling is maintained in two files InputMarshaller and OutputMarshaller, this will allow better handling of backwards compatiblity in the future.

It's still very alphaware, but please do start trying it. I need to tidy up the debug output and will do that soon, I'll probably make each read/write method take a String which will be written to the log when debug is on. It currently sorts a number of map/set data structures, this is for debugging so that testing can round trip for data loss, I will make sure that it can be disabled for production. The following types don't yet marshall, not sure how to handle date/time stuff at the moment:
Temporal Sessions do not marshall
Scheduled rules, duration attribution do not marhsall

Implementations have not been added for accumulate, collect, from and rightinputadapter.

We are just hooking up the process marshalling into this at the moment.



  1. Are there any plans to include JTA support? I started dreaming about having a fullblown JTA support out-of-the-box making it possible to have Drools as JDO back-end.

  2. Back in 2002 and 2003 I did something similar for JESS. Since JESS didn't have a way to save the working memory, I added the functionality. When I tested it, I found it to be impractical for large working memories. For small working memory and sessions with a little bit of data it worked fine. Little in this case was bout 30-40k rows of data.

    As a mechanism for fail over or disaster recovery, it wasn't practical. For those curious, I tried 2 ways. The first was object input/output stream, which is slow. The second was to store the facts as plain data and just re-assert it. Since JESS parser is fast, it was faster to just re-assert the data. The reason for this is simple, when you think about it. If you serialize the working memory out, you have to store the raw data and the working memory indexes. Having to read in both is IO limited. With a decent 1.4Ghz CPU it's faster to just pattern match again. The other technique I thought of, which was to do delta save/restore of the working memory. That approach turns out to be too complicated and not worth it. I started to prototype it and quickly realize that idea sucks.

    After a lot of testing, we made the application stateless. A better approach is to use something like coherence and replicate the working memory across 2 nodes. this way, restoring a session is much faster and much more fault tolerant.

    I've mentioned this before. Oracle did this with JESS and coherence in 2006/2007.