Sunday, May 06, 2007

Event Stream Processing, Complex Event Processing and Rules Engines (Edson Tirelli)

(content originally posted at my personal blog)

For those wondering why I finally decided to create my own blog, it all started in my last class of Non-Classic Logics at USP. The teacher was explaining about one of the temporal logics that were formalized, known as Logic US (that stands for Until/Since).

At that moment, something clicked and I remembered Mark commenting about supporting CEP/ESP as an extension to the Rete engine we have implemented in JBoss Rules. It seemed a perfect match: supporting temporal logic would give us all we need to also reason over stream of events. I wish it was that simple.

Yesterday I spent the day looking for information about possible research that might have been done and what is the state of the art in the field. I confess, besides having a reasonable knowledge of Rete, I have no experience in ESP/CSP.

As you can imagine, there is a huge amount of information about it in the internet and I couldn't find/read it all, but so far, the following is what I was able to gather (trying a top-down approach).

Event Driven Architecture

First thing I tried to understand is the terminology, the use cases and state of the art in the ESP/CEP field, and I ended up reading this excellent post by Brenda Michelson, where she provides an overview of the terminology and how they relate to each other. I will not repeat here all what she says, but I found specially interesting the Event Driven Architecture she describes there. From it, I was able to define the boundaries of my current research.
What I'm interested is in how we can leverage the features of a rules engine, more specifically a Rete implementation, to handle CEP and ESP. In other words, from the reference architecture she mentions, I'm interested in the Event Processing Engine component.

ESP and CEP

ESP and CEP have quite a lot of information published. From David Luckham's excellent site, to university research projects, to products developed to do it, like Esper and StreamCruncher. A very prolific field.

Rules Engines

Besides all that information, I found just a few mentions about leveraging Rete to do it. James Taylor has three posts about it ([1], [2], [3]) and they are all 1+ years old.

I found a recent post (April, 2007) in a blog by Ashwin Jayaprakash where he addresses the issue. In his words:

The tenets of Event Processing

A facility to consume the state(s) of a System as discrete Events, a language or User Interface to specify the conditions/properties to look for in those Events from one System or across Systems (Co-relation) and provide actions to be performed when such situations arise.

It is no accident that Event Processing looks like a cross between Database Query processing and Rule Engines. There is no denying that the technology evolved from existing technologies. However the distinguishing characteristics of Event Processing technology are - the constructs it provides to specify Time based boundaries, fixed size moving Windows, Aggregation and Partitioning/Grouping Events - on a stream of incoming Events and a facility to continuously scan these Events for telltale signs, all this expressed as a concise Query.

Database Triggers which in turn execute SQL/Stored Procedures every time an Event/Row is inserted can provide rudimentary Event Processing functionality. Rule Engines can be used to discern information from Facts being asserted. But none of these technologies alone provide a concise and yet very expressive Domain Specific Language along with a platform, specifically built to constantly monitor/process live streams of Events.


I specially agree with Ashwin's statement that a Rules Engine as we know them, alone, can not provide a concise and yet expressive DSL to handle ESP and CEP. Although, I prefer to think (I guess in a similar fashion to what James Taylor says) that we can leverage a Rete Engine to address such requirements. We all know that Rete Engines excel at pattern matching tasks, so basically what is missing is the capability to handle streams of events where just some of them are really "interesting" for the processing (a "select"-kind feature) and even those selected events get old quickly and must be discarded (sliding window and temporal logic support) in a way to scale the solution. The expressiveness of the language I guess is a consequence of the addition of the mentioned features.
This may be an over simplification of the subject, but that is how I see it so far.

Current Approaches

The few current approaches I was able to read about since yesterday are extensions to the relational engines to handle temporal constraints and extensions to the SQL language to allow for querying such engines (I didn't read documentation about proprietary solutions like Coral8 and StreamBase yet).

  • StreamCruncher for instance implements and engine that is backed by a relational database and leverages it to handle ESP/CEP.
  • Esper on the other hand, seems to implement its own relational engine, but the language is also an extension of SQL.
While both approaches seems good to me, it itches me even more to understand how a Rules Engine approach would do in comparison with the existing ones.

Floyd Marinescu posted last year an article to InfoQ talking about the Esper release where he says that "The rules engines proved cumbersome and slow for this use case". He also quotes Alexandre Vasseur:

[Esper] is not to be confused with classical rules engines...The internals of Esper are made up of fairly complex algorithms primarily relying on state machines and delta networks in which only changes to data are communicated across object boundaries when required.

On the proprietary software side, I found a post about TIBCO, stating it uses "a modern compiled variant of the Rete algorithm for class-leading performance". It only mentions CEP though and I'm not sure it is used as a synonym for ESP or not.

In this comment, Daniel speaks about the CEP capabilities of ILog JRules, but not about ESP.

Given all the mentioned above, I guess that in order to believe, I need to see it myself.

Event Stream Processing Requirements

From my readings, I was able to draft a list of requirements, that in my opinion, a Rete engine would need to comply to be able to handle ESP and CEP tasks. They would be:

  • Support for events as a special type of facts: while facts are long living pieces of information, events have special temporal requirements and the engine must be able to work with both at the same time, inside the same working memory.
  • Support asynchronous multi-thread feeding: events may arrive at any time and from multiple sources (or channels), so the engine must support asynchronous, multi-thread feeding of events.
  • Support event selection: only a subset of the events going through the event stream are interesting for a given set of rules. The engine must be able to identify (select) such events and ignore the others.
  • Support events garbage collection: events grow old, quickly or slow, but they do grow old. The engine must be able to identify the events that are no longer needed and dispose them as a way of scaling. It is simply impossible to hold events indefinitely. Ideally, the identification and disposal would be automatically, using the results of rules compile time analytics.
  • Support temporal relative constraints: one way of defining the window of interest in a stream of events is do define patterns of events that start and close each window. Such windows would be defined through the use of temporal constraints with operators like "since" and "until". One example of a rule using such constraints would be: "Since the stock price went under US$ X, watch for any buyers of more than 1% of total stocks, until the day is over". The above rule will start to watch for any buyers when the stock price drops under a threshold, but once it happens, will continue watching them until the end of the day, even if the stock price goes over the threshold again.
  • Support temporal absolute constraints: the rules engine must implement a pseudo-clock, that may or may not (depending on the use case) be synchronized with the real clock and allow for constraints to be defined over the sliding time window relative to the pseudo-clock. Events outside the time window are old and should be discarded.
  • Support reasoning over absence of events: the engine must support the definition of rules that will fire over the absence of events in the expected time-frame (relative or absolute). Example: "If the temperature goes over the threshold and no contention measure is taken within 10 seconds, then sound the alarm".
  • Support to common mathematical accumulations of the events in the streams, like average, max, min, etc.

The whole event matching and correlation requirements are already provided by the rules engine. The main question though is:

Is it possible to implement all the above functional requirements as an extension to a Rete engine and still comply to the non-functional requirements, specially performance?

That is what I will try to understand over the next few weeks. Stay tunned and please, I do appreciate feedback, suggestions and critics.

22 comments:

  1. Rules engines are but one way to process events, either CEP or ESP related processing.

    For emerging discussions on a functional reference architecture for CEP, please visit:

    TIBCO's CEP Blog:

    http://tibcoblogs.com/cep

    ReplyDelete
  2. Edson,

    Thanks for calling out my post and good luck with your research. The comment from “BM” that appears above this, seemingly attributed to me via the link to my site, wasn't actually from me. Guess someone really wants you to look into Tibco!

    Brenda

    ReplyDelete
  3. Anonymous BM:

    Thank you for the link. I will read a bit more about the TIBCO approach to ESP/CEP.

    Brenda Michelson:

    Thank you for sharing your knowledge about the field. It was really helpful.

    Regards,
    Edson

    ReplyDelete
  4. Edson, you may want to spell my name correctly in your post: Alexandre Vasseur.

    On the if Rete fits CEP/ESP I think making time and causality a first class citizen is likely to deeply impact any Rete algorithm implementation. Let's left aside clustering and near real time performance requirements, or joins to relational database and continous joins. There has been extensive research in the CEP/ESP field and also around Rete and I'd tend to argue if researchers haven't come up to a common implementation so far, this is likely because some walls were hit.
    Both play a key role in an EDA but I don't believe a one size fits all there.

    ReplyDelete
  5. Alexandre,

    I apologize for the mistake. I just corrected your name.

    And yes, I see that ESP/CEP is indeed a prolific field, but I found almost no documentation in the convergence of ESP/CEP and Rete. It may be the case that I'm not looking at the right places or that as you mention, there is no "one size fits all" solution for this case.

    Whatever is the case, as a "curious" (beginner) student of the field, I would like to find more references or results from the analysis of the weakness of Rete for such use cases.

    I also want to take a closer look at Esper and the results it achieves.

    At this point, what bothers me is that I feel I can't argue neither in favor nor against using a modified Rete for ESP/CEP. :) I need to dig deeper and learn a lot more before I reach that stage. Any references you can point me to is more than welcome.

    Regards,
    Edson

    ReplyDelete
  6. I've been studying RETE for the last 5-6 years. From my experience and understanding, adding temporal support in RETE is fairly straight forward. In fact, on my blog I describe several ways of implementing it and the pros/cons. iLog JRules provides support for event processing. JESS provides runUntilHalt for event driven architecture. I think there's plenty of proof showing event process and RETE work well together.
    When one considers EVA (event condition action) rules, CEP isn't all that different. Having said that, RETE isn't a one size fits all. I can think of plenty of cases where RETE is not appropriate.

    Depending on the use case, one could adapt RETE to produce an optimized discrimination network for CEP.

    ReplyDelete
  7. thought I'd add a bit more details. There's basically 3 simple methods of customizing RETE for event processing.

    1. make RETE memory-less. in other words no memory for alpha or beta nodes. the real downside is that rules with joins will perform poorly. the only benefit is much lower memory usage.

    2. remove alpha memories and make the terminal node check if a fact has expired. there's a description of this on my blog.

    3. remove the agenda and fire all rules immediately. this is what JESS does with runUntilHalt.

    there are many reasons why there isn't a consensus on the "best" way to extend RETE for event processing. I've read several papers on RETE and event processing. My bias interpretation is that event processing varies significantly, therefore one should customize the discrimination algorithm for the use case.

    I don't think consensus is necessary or even useful. that's a straw man argument in my mind.

    ReplyDelete
  8. Hey Peter,

    Thanks for sharing your thoughts with us. For the ESP part, I think that your suggestion number 2 is specially interesting.

    Regards,
    Edson

    ReplyDelete
  9. Totally fascinated as to why someone would start with the idea that RETE is a good starting point for a CEP engine. Why not just start with a SQL query processor, or a Java App Server architecture and go from there?

    The reason the leading CEP vendors chose not to start with these other architectures is that event processing requires a new way of thinking that deals with flowing data; RETE is a rules-based approach designed to operate on static data sets - an insurance claim application, for example, where forward-chaining logic is used to validate that the form is filled out properly. Sure, you could start that way, but why?

    In the event processing blog , we have discussed these issues in depth, and cited a wide range of use cases that demand a different type of tool, different type of architecture, and a different type of development paradigm that is built on an event-driven approach to processing information; rather than a warmed-over approach from the architectures of yesteryears.

    - Mark Palmer, General Manager, Progress Apama.

    ReplyDelete
  10. Mark,

    First of all, thanks for your posts on The event processing blog. Great content in there and I will surely dig for more.

    About your question, it is the same as mine: "why someone would start with the idea that RETE is a good starting point for a CEP engine?" ;)

    Hopefully I will be able to answer it in a couple months time.

    ReplyDelete
  11. "The reason the leading CEP vendors chose not to start with these other architectures is that event processing requires a new way of thinking that deals with flowing data; RETE is a rules-based approach designed to operate on static data sets - an insurance claim application, for example, where forward-chaining logic is used to validate that the form is filled out properly. Sure, you could start that way, but why?"

    I'd like to address a myth in the statement. RETE is first and foremost a pattern matching algorithm. It's used in rule engines and expert systems, but there's nothing inherently rules about it. It's just a way to compile if/then logic into optimized relational joins. Second misconception is that RETE is not predicated on "static data". The whole point of expert systems is to address dynamic environments where new data, object types and rules are created at runtime. RETE came out of the need to address very dynamic situations where a system needs to adapt to changing events. I would suggest doing a search on ACMQueue to get a better picture of how RETE is used in research and in business environments.

    One of the major areas where RETE is used is for real-time trading systems, which has a constant stream of messages (aka transactions) going in and out of the system. Don't take my word for it, do the research and you'll see ART and CLIPS have been used in trading systems since the early 90's.

    RETE can be thought of in 2 parts. The first is compiling declarative logic into a discrimination network which optimizes the join. The second is the runtime, which includes memories and agenda. One can apply the compilation and adapt the runtime to fit the needs of the use case. I know that both GTE and ATT have used RETE engine do handle message routing since the early 90's. In fact, some researchers at GTE have published papers on knowledgebase validation related to RETE. I believe the paper is available on ACMQueue.

    I'm bias for RETE, but I think too many people criticize RETE without taking proper time to understand it. I would argue very few people have dedicated time to really understand how powerful the algorithm is.

    ReplyDelete
  12. Hello All,

    I will start following this blog since I have an avid interest in using using rules engines. I recently started reading up on CEP and I see where both dynamic (contemporary) and historical data fit as research avenues. When I think about combining CEP with a rules engine, I tend to look at it from the streaming perspective. David Luckham had some diagrams (in an article) that suggested that there is a (kind of/sort of) boundary between the CEP and rules.

    Anyhoo, this should be interesting to follow. Keep up the good work !!

    Rich Halsey
    Pensacola, FL
    USA

    ReplyDelete
  13. My comment about David Luckham's view is drawn from: http://www.complexevents.com/media/articles/cep-article-seven.pdf

    Hopefully, I have not mis-interpreted what he wrote.

    Rich Halsey
    Pensacola, FL
    USA

    ReplyDelete
  14. bernhardttom@yahoo.comThursday, May 17, 2007

    The papers and projects that influenced how Esper was build are on our website at http://esper.codehaus.org/tutorials/links/links.html

    Tom, founder Esper

    ReplyDelete
  15. I took a look at esper source code. Here's some random thoughts. From what I can tell of the pattern node classes, the design looks rather inefficient. For the kinds of systems I've worked on with hundreds of thousands of objects, the design of esper doesn't produce an efficient pattern matching network. I'm curious, has anyone posted performance numbers for esper with 500K objects and a constant stream of message?

    I've been studying pattern matching algorithm since 2000, and the design of esper's pattern matching will result in far more evaluations than necessary. especially with complex rules that have dozens of joins, aggregations and temporal objects.

    ReplyDelete
  16. Charles Forgy has his system handling a million facts a second, with data being helf around for windows of up to 5 seconds.

    ReplyDelete
  17. I've done real world tests with JESS and 1million facts, so I'm pretty sure RETE is more than capable of handling millions of facts :)

    I really doubt esper or any pattern matching algorithm like esper is going to ever scale properly for huge trading systems. but I'm totally bias.

    ReplyDelete
  18. Very interesting discussion.

    I was also looking for some information on ESP/CEP for an educational project. I looked at NxBRE and Drools, but I faced some issues:
    - how do I handle time windows (well at least with NxBRE I found a way, but the queries were quite dirty),
    - how do I update my facts (some new facts can be infered from a fact A through some rules, but if then A is modified, the new facts that were infered may need to be retracted)
    - also there's the problem of infinite recursing if you're not careful when writing your rules, or if several persons write some rules that interact with each others (Drools has a noloop attribute for this, but maybe it's not that easy...)

    Hopefully I will be able to answer it in a couple months time. So do I!

    ReplyDelete
  19. Edson, Thanks for your post and the links to the related blogs. We just evaluating different platforms and strategies to build a Event Driven plattform to handle many incoming request send by many ditributed devices and process them based on defined rules. Because you spend some time researching this area, what do you think about using JAIN/SLEE i.e. http://www.mobicents.org as the underlaing platform ?

    ReplyDelete
  20. Hi Edson,

    As a few folks have mentioned here, there has been a recent trend to "over hype" rules for CEP or ESP applications.

    Kindly refer to this recent post:

    Bending CEP for Rules -

    http://thecepblog.com/2007/07/27/blending-cep-for-rules-and-brms-folks/

    Yours faithfully, Tim

    ReplyDelete
  21. All of the requirements you have identified in the body of your blog have been available for many years in a product called G2, which includes a rules engine specialized for event-driven, real-time applications.

    ReplyDelete