Wednesday, January 02, 2008

An Interview with Matthias Groch - Drools as a Research Platform

One of the goals of the Drools Team for the product (besides world domination ;) ) is to enable it as a research platform. We believe that there is still a lot to be researched and much knowledge to be created in the field, and providing a flexible and customizable engine to the community may both speed up the achievements of the researcher's goals as well as bring interesting new features for the product. Features that can also be further explored in new researches.

One of the researchers currently working with the Drools platform is Matthias Groch, a German student from Dresden University of Technology, under the supervision of Ph.D candidate Karen Walzer.

Matthias was kind to answer a few questions about him and the work he is doing at the moment and I think you all may find it an interesting read.

Enjoy.

Matthias Groch

  • Matthias, can you tell us a bit about yourself?
I'm 27 years old and study Computer Science at Dresden University of Technology in Germany (since October 2001). At the moment I'm writing my diploma thesis (i.e. the final project) about extending the Rete algorithm temporally and thus to support CEP. Besides university, I gained some practical experience in software modelling and development by working as student assistant at the Chair of Modelling and Simulation (projects for the printing machine manufacturer KBA and Airbus) and in external internships (IBM Germany, Integranova Spain). In 2004, I attended a semester abroad at Swinburne University of Technology in Melbourne, Australia, and from September 2006 to March 2007 I worked as intern at Integranova in Madrid, Spain. That's my last years in short.

  • What is your Thesis about? Why did you chose this subject? In what manner do you think it helps innovate the field?
At present, complex event processing (CEP) resounds throughout the land. Particularly in the manufacturing domain - partly due to the introduction of RFID technology in the last years - CEP is on the rise. A real-world manufacturing scenario, where all kinds of events have to be processed, was the trigger for my thesis. Together with Karen, who is my supervisor, we want to extend the Rete algorithm (which originally was developed only for static facts) in such a way that processing of "event facts" becomes feasible. This includes application of logical operators and time windows, which originates from the field of data stream management systems. In a second step, we intend to examine whether favourable characteristics from the original algorithm such as almost linear scalability can be preserved.
Originally, the plan was to extend rule-based systems based on the Rete algorithm for CEP. In the course of that, the theoretical foundations were supposed to be applied to an open-source rule-based system. Finally, the enhanced system was ought to be compared with existing CEP engines such as Coral8, StreamBase or Esper. Since this is a pretty comprehensive task and many issues need to be taken into consideration, the scope has been reduced and my thesis will "only" deal with (sliding) windows over incoming events.
So in the end, we will hopefully have found to which degree Rete is suitable for CEP and what adaptations have to be made. Moreover, I will have contributed to a rule engine which processes rules AND is able to do CEP - which would be quite innovative already ;). A long-term goal would be to beat - or at least compete with - established CEP systems.

  • What got you interested in Drools?
To be honest, up to the point when I began working on my thesis (which was in August), I've never heard of rule-based systems, not to mention Drools. Then, after having become familiar with rule-based systems in general, the search for a mature rule engine so that we could put our ideas into practice. Soon it was clear that we would use Drools; besides being open-source, it had the best documentation available (including lots of examples) plus comprehensive first-hand help. First of all, there's the great mailing list with precise feedback (within hours) from the core developers and the community. Moreover; there's the Drools blog which keeps interested people up-to-date about recent developments as well as plans and visions for the future; I soon became a big fan of it. Last but not least I appreciate the possibility to directly get in touch with the core developers in the Drools chat and thus get immediate response to my questions. All these features convinced me to pick Drools and facilitated the access to rule-based systems in general and Drools in particular.

  • You already contributed to the Drools project. What have you done so far?
It's not that much so far. Nevertheless, I try to support Edson with his ambitious plans to integrate CEP in Drools. I illustrated that for correlating complex events, point-based semantics are not expressive enough; with them, only three relations (after, before, equals) between two events can be defined. Instead, interval-based semantics (introduced by Allen) allow for events with a duration greater than zero; the number of possible relations increases to thirteen. After Edson had introduced a pluggable operator interface to Drools, I defined and implemented all the thirteen relations. Moreover, in order to test the implemented operators and to add support events with non-zero duration, I added an overloaded insertion method for (event) facts; in addition to the usual insert(), it is now possible to specify a duration as parameter. At the moment I'm doing some research on how to evaluate sliding windows efficiently. As a result of that, a corresponding implementation for Drools will follow.

  • What are your plans for after your Graduation?
I intend to be done with my thesis by April. Then, I would like to start working as software engineer. I would like to continue working in the field of rule-based systems. Since the two semesters abroad in Australia and Spain, respectively, have been such great experiences, I fancy working in a foreign country. Anyway, as long as it is an interesting project, I'm open to anything.

Matthias, thank you for your time and our wishes of a resounding success in your thesis.

Drools Team

5 comments:

  1. Matthias, you might want to read my entries on temporal pattern, temporal facts and temporal activations. Sounds like most of what I've already covered should address the things you are working on. If you want an invite to my blog, send me an email woolfel AT gmail DOT com.

    I've already implemented temporal facts, temporal activations and have a basic implementation of temporal pattern. I haven't finished temporal interval, but you can atleast read the description and the implementation algorithm. In terms of efficiency, I think the lazy evaluation design I describe in detail is going to be efficient. It doesn't use a timer and handles intervals in a lazy fashion.

    I also have a detailed description of temporal distance algorithm, which can be used to calculate the temporal distance of each object type in the ruleset. There's still a lot of work to be done in terms of extending RETE for temporal logic, but a lot of it is straight forward. hope that helps

    ReplyDelete
  2. One more comment for Matthias. Feel free to use anything you find useful in my blog. There's no need to cite my blog or me. In fact, if you use any ideas from my blog, I would prefer no citation.

    My goal is the advancement of business rule technology and RETE algorithm. I believe RETE can and should be extended to support temporal logic. I hope the entries save you some time in your research.

    ReplyDelete
  3. Hi Peter, thx for your comments. I already accepted your invitation to read your blog some months ago; since that time, I follow it. In fact, there are some really interesting posts. I will see whether there is something of use for me since I'm focusing on windows at the moment.

    Matthias

    ReplyDelete
  4. in the interview, you had some concerns about performance. generally speaking there's two ways to solve that issue for RETE implementations.

    The first is lazy evaluation. the second is queries. The challenge I see with Allen's semantics is that it may not detect inconsistencies. I re-read his paper last weekend for fun and I believe he states that in his paper. The other challenge I see with allen's semantics is the space trade-off. for trivial examples it should work, but for a complex ruleset, my gut tells me the cost will be impractical.

    After I re-read Allen's paper, I did some more researching and came across this abstract. I don't know if you've seen it before.

    http://www.islandone.org/MMSG/ttf/paragon.htm

    Here is an interesting paragraph from that paper.
    Early in Paragons development sets of algebraic mappings were found between any two intervals to produce a third interval (i.e. Plus Minus Cross-Product etc). With a set of relations and mappings it was assumed that a useful algebra could be integrated into Paragon. The Temporal Specification Interface (TSI) was developed to quickly specify complicated temporal expressions. Unfortunately a non-trivial algebralc group for time Intervals was not found; fortunately this lack of success didn t matter. During the development of TSI and the search for a temporal algebraic group the domain experts used Paragon concepts to simulate clocks and timers whenever they needed them. Upon closer examination It was found that Allen s relatlons could be implemented in Paragon at the Attribute-State-Event level so TSI was never Integrated into Paragon. In addition Paragon has the ability to do multi-interval comparisons which Ladkin showed to be infeasible when representing at the interval-relation level

    I haven't fully digested the paper, but one other statement was interesting to me.
    It turns out to be quite simple to set up very complicated timed conditional or asynchronous cycles (or non-cyclic temporal state changes) in the state transition graph of a Primitive

    If a ruleset doesn't contain cyclic temporal state changes, Allen's approach "may" work. For a large ruleset, the space requirement may still be high and could be impractical.

    Probably the most interesting part is using the approach described in the link to run simulation and optimize temporal problems. by that I mean this. Given the space requirements for allen's approach is high, it's best to divide the problem across several system to run a simulation. From the results, one should be able to detect which states are extraneous or inefficient. From that, the system can rewrite the rules. In a sense, it would become a temporal ruleset optimization engine.

    ReplyDelete
  5. I still use JBoss Drools in my master thesis on Grid Components on Service Component Architecture. I use it in order to handle cluster enviroment facts (throughtput,fault, etc.) for providing so software hot-swapping, ie the ability to replace or update a SCA component if some facts occur.
    Cheeers,
    Giorgio.

    ReplyDelete