Monday, February 22, 2010

Collection-oriented Match for massively parallel Drools 6

With multi-core machines becoming ever cheaper, the desire to push Drools into parallel processing is increasing. We've already added rulebase partitioning, which helps throughput for CEP-type problems, but it doesn't address parallel matching itself.

I've followed ParaOPS5, but wasn't comfortable that the design would deliver speed improvements broad enough to justify the complexity it brings. In ParaOPS5, each partial match propagated to a node for evaluation is submitted to a queue as a "task". This is very fine grained, and because the node, or potentially an index within the node, is a locking point, there is a lot of waiting around for very small units of work.
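
To make that concrete, here is a minimal sketch of the fine-grained style, with the class and method names invented purely for illustration (this is not ParaOPS5 code): every partial match arriving at a join node becomes its own task, and every task must take the node's lock before it can touch the node memory, so the scheduling and locking cost is paid per match.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only: one task per partial match, contention on the node lock.
class FineGrainedJoinNode {
    private final List<Object> rightMemory = new ArrayList<>();
    private final ExecutorService queue = Executors.newFixedThreadPool(4);

    void propagate(Object partialMatch) {
        queue.submit(() -> {                     // each match is its own tiny "task"
            synchronized (rightMemory) {         // the node is a locking point
                for (Object fact : rightMemory) {
                    // evaluate the join constraints, then submit each new
                    // partial match to the child node as yet another task
                }
            }
        });
    }
}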

A while back I stumbled across the paper "Collection-Oriented Match" by Anurag Acharya and Milind Tambe. It is well written and relatively easy to understand. It proposes that instead of propagating each partial match as soon as it is created, the work stays in the node: the node produces all of its partial matches, stores them in a collection, and it is this collection that is then propagated. The propagated collection is submitted to the queue as a single "task". This allows for larger units of work, as more is done in the node itself. The approach is not without its problems, particularly around left indexing, as partial matches in the same propagated collection could fall into different indexes for the node.
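
A rough sketch of the collection-oriented idea, again with invented names rather than real Drools classes: the node joins a whole collection of incoming partial matches against its memory in one pass, and only the resulting collection is handed downstream as a single task.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the collection, not the individual match, is what propagates.
class CollectionOrientedJoinNode {
    private final List<Object> rightMemory = new ArrayList<>();

    // One task: extend every incoming partial match against the node memory,
    // collect the results, and propagate the whole collection to the child node.
    List<List<Object>> evaluate(List<List<Object>> incomingMatches) {
        List<List<Object>> results = new ArrayList<>();
        for (List<Object> partialMatch : incomingMatches) {
            for (Object fact : rightMemory) {
                if (constraintsAllow(partialMatch, fact)) {
                    List<Object> extended = new ArrayList<>(partialMatch);
                    extended.add(fact);
                    results.add(extended);
                }
            }
        }
        return results;   // submitted to the queue as one larger unit of work
    }

    private boolean constraintsAllow(List<Object> partialMatch, Object fact) {
        return true;      // placeholder for the node's join constraints
    }
}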

However, we feel this approach shows a lot of promise and have decided to explore it as the underlying algorithm for Drools 6. We hope to have a basic prototype working this summer, and then we'll have a clearer idea of the advantages and disadvantages.

3 comments:

  1. Forall on steroids? :)

    Once you do this I think you'll have a way to apply map/reduce to rete. After that the sky's the limit for scalability. The condition match is a map with a trivial reduce. The optional fragmentation in the consistency tests would allow you to chunk the work into manageable sizes, and then the consistency test itself is just as map/reducible as a condition test.
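
    A rough sketch of that framing, with names invented for illustration: the condition test is applied to each fact independently, which is the map, and gathering the surviving matches is the only reduce needed.

    import java.util.List;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    // Illustrative sketch only: condition matching as a map with a trivial reduce.
    class ConditionMatchAsMapReduce {
        static List<Object> matches(List<Object> facts, Predicate<Object> condition) {
            return facts.parallelStream()               // map: test each fact independently
                        .filter(condition)              // the condition test
                        .collect(Collectors.toList());  // trivial reduce: gather the matches
        }
    }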

  2. Are these collections of ground terms or constraints/intervals?
    How dense?
    Determining the atomicity or population of a constraint WRT ground terms (i.e. collection of objects) may be time consuming.
    So, a constraint interval with 'roughness' or 'compactness' may be better for split/merge election.

    Replies
    1. "Are these collections of ground terms or constraints/intervals?
      How dense?
      Determining the atomicity or population of a constraint WRT ground terms (i.e. collection of objects) may be time consuming.
      So, a constraint interval with 'roughness' or 'compactness' may be better for split/merge election."

      Our implementation is not the same as the above. We have elected to use a tree-graph representation for our tuples, which allows for fast removal, so we can avoid any split/merge issues. We simply build a set of matches, as a linked list, and propagate that. I'm not sure if what you describe above is still a relevant problem?

      Here is the join node. Notice it has an incoming set and a result set; the result set simply becomes the incoming set for the child node. Ignore the staged set for now, it's related to node-sharing boundaries:
      https://github.com/droolsjbpm/drools/blob/master/drools-core/src/main/java/org/drools/core/phreak/PhreakJoinNode.java
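
      A much-simplified sketch of that shape, with invented types (the real code is the PhreakJoinNode linked above): tuples form a tree by pointing at their parent, the node extends each incoming tuple against the right-hand memory, and the result set it builds is passed straight to the child node as its incoming set.

      import java.util.ArrayList;
      import java.util.List;

      // Illustrative sketch only; see the linked PhreakJoinNode for the real implementation.
      class Tuple {
          final Tuple parent;      // tuples form a tree graph, which makes removal cheap
          final Object fact;
          Tuple(Tuple parent, Object fact) { this.parent = parent; this.fact = fact; }
      }

      class SimpleJoinNode {
          final List<Object> rightMemory = new ArrayList<>();
          SimpleJoinNode child;

          void evaluate(List<Tuple> incoming) {
              List<Tuple> results = new ArrayList<>();
              for (Tuple left : incoming) {
                  for (Object fact : rightMemory) {
                      results.add(new Tuple(left, fact));   // extend the tuple tree
                  }
              }
              if (child != null) {
                  child.evaluate(results);   // the result set is the child's incoming set
              }
          }
      }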

      I'll try and have some more details on the blog soon.
