Sunday, June 24, 2007

What is a "sequential" Rule Engine

It's very hard to find a public definition of what a "sequential" rule engine is, someone needs to update Wikipedia. My knowledge in this area is also limited so I've decided to write up what I know, in the hope that others can correct and add to this, so we can get a full public definition. Maybe the results can go up onto Wikipedia :)

Sequential engines are used in stateless execution environments; i.e. you cannot assert more data and call fireAllRules a second time. Modifying asserted data does not result in rule re-evaluation.

A method is generated per rule in Java code. The engine takes the asserted data and builds a list of the possible cross-product combinations for the provided data and rules.

It then iterates over the rules, calling each in turn with the possible cross-product matches from the list. The order is determined by salience and/or the order of the rules in the file. When all 'if' statements are true the rule fires straight away; there is no agenda.

If asserted data is not modified, the engine can cache test results, which improves performance. If data is changed, caching is not possible, and later rule evaluations may not fire for the modified values.
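To make that concrete, here is a minimal Java sketch of the evaluation loop described above. It is purely illustrative and uses hypothetical names; no engine discussed here exposes exactly this API.

import java.util.List;

// One generated method pair per rule; names are illustrative only.
interface SequentialRule {
    // True if all of the rule's 'if' tests pass for this tuple of facts.
    boolean matches(Object[] tuple);
    // Fires immediately on a match; there is no agenda.
    void fire(Object[] tuple);
}

class SequentialEngine {
    // Rules are pre-sorted by salience and/or file order at compile time.
    private final List<SequentialRule> orderedRules;

    SequentialEngine(List<SequentialRule> orderedRules) {
        this.orderedRules = orderedRules;
    }

    void execute(List<Object[]> crossProductTuples) {
        // Single pass: facts modified by a fired rule are NOT re-evaluated
        // against rules that have already been checked.
        for (SequentialRule rule : orderedRules) {
            for (Object[] tuple : crossProductTuples) {
                if (rule.matches(tuple)) {
                    rule.fire(tuple); // fires straight away
                }
            }
        }
    }
}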

It is the conflict set combined with data modifications that seems unclear to me: you modify data, but the already checked/fired rules will never be re-evaluated against the changed data. Ideally, it seems, sequential modes should not modify asserted data, and they would benefit from tooling to help author rules so that they do not conflict for a given set of data, i.e. so that only one rule fires.

I also don't see how sequential will be faster if caching has to be turned off due to data modifications and you have a large number of rules.

50 comments:

  1. There's a wide variety of sequential rule engines and sub classes. For example, here is a list off the top of my head.
    1. simple sequential rule engine, which does not support inferencing

    2. simple sequential rule engine which supports inferencing with new facts

    3. simple sequential rule engine which supports new and modified facts

    There are some common attributes of sequential rule engines. The first is they iterate over the rules one at a time. The sequence of the rules is typically set manually. Sequential engines normally do not do conflict resolution. Some sequential engines use caching to reduce the cost of calculating the cross product. The most common form of sequential engine caches nothing and simply evaluates all the data against each rule.

    There are sequential rule engines that are stateful. For example, some BPML engines support a simple form of state for long-running processes. In the service choreography space, there are engines that support simple states. Usually, they store the data in a database or in an in-memory cache.

    The sequential rule engine space is sort of a free for all. Most of them are not efficient and don't scale as data or rule count increases. Though, if you ask the commercial BPM vendors out there, they will claim they scale.

    ReplyDelete
  2. In contrast to the simple sequential rule engines that I've seen over the years, there are also things like the sequential mode provided by iLog and Fair Isaac.

    As far as I understand, the sequential mode offered by RETE rule engine vendors is much more sophisticated than simple sequential engines for workflow and BPM.

    Here is my understanding of sequential mode in JRules and Blaze. The first step is the rule compiler analyzes the rules and sorts them in the optimal order. This can be done through generating a RETE network and analyzing it. Another way is to use linear programming techniques. The most notable product from that space is CPLEX http://www.aimms.com/aimms/product/solvers/cplex.html. iLog currently owns CPLEX.

    After the compiler determines the optimal order, the rules are usually compiled to the execution format. At runtime, sequential mode starts with the first rule. If there's inferencing, the engine determines which rules need to be re-evaluated. AFAIK, JRules and Blaze sequential mode do not use an agenda.

    ReplyDelete
  3. In case anyone wants to learn more about CPLEX, it is an implementation of the Simplex algorithm (http://en.wikipedia.org/wiki/Simplex_algorithm).

    Here's another link for those into this kind of thing. http://www.cise.ufl.edu/~davis/Morgan/
    http://www.ampl.com/solvers.html

    ReplyDelete
  4. Mark - OMG PRR defines a semantics for BREs that defines a "sequential mode". Typically this is used for exclusive rulesets (whereby only 1 rule is fired per ruleset), and therefore rule chaining is neither a requirement nor, in its absence, a problem. The main semantic extension over simple script-type rule execution (ie 3GLs) is the treatment of ruleVariables (ie Rete variables or patterns), which expand out the rule list for each member (runtime binding). Clearly there is scope for a rule to change the membership of some list, and hence a case where some rule should be removed from the agenda. Again, such usages are really not compatible with the "simple" semantics intended with sequential execution.

    Note that BREs have found that (a) many rulesets defined by end users via a structured BRMS (eg orchestrated by a ruleflow) have exclusive rule semantics and (b) runtime execution can be very much faster than the cost of applying Rete.

    See the forthcoming PRR submission for more info (abstract up on James Taylor's EDM blog)...

    Cheers

    ReplyDelete
  5. Paul summarized nicely and I posted a rebuttal of some of the most egregious misinformation on rete v sequential execution here.
    I think we should all focus on the value of being able to declare and manage business logic effectively using rules and just recognize that different rules and rulesets can be most effectively executed using different semantics.
    JT

    ReplyDelete
  6. Mark, have you read Charles Young's comparison of Windows Workflow Foundation and the Microsoft Business Rule Engine?

    I have *no* idea if what the WF team has implemented is a typical sequential engine or whether it's good, great, bad or horrible in comparison to other engines but the article is well worth reading anyway.

    ReplyDelete
  7. I've tried to add another summary here, with the unanswered questions at the end:

    Sequential engines should ideally only result in a single rule fire. If one does result in multiple rule fires, it should not modify any classes or fields used in rule evaluations. If a sequential engine does modify classes and fields that are used in rule evaluations, what are the different types of behaviour for this? Do any sequential engines allow assertions?

    ReplyDelete
  8. Like many others, I must admit to using the term 'sequential' very loosely and imprecisely to describe just about any rule engine that is not a set-based production system (this latter category being almost synonymous with 'Rete engine').

    One common differentiator between different categories of rule engine must surely be in terms of declarative set-based processing vs. imperative processing. The sequential engines I have used have all been in the latter category. They are designed to conditionally and sequentially execute commands that operate directly on, and potentially change, discrete values within the programme state. Because that state is globally available to all rules, the order in which rules are evaluated and fired becomes very important. If one rule changes the state, any rule which has already been evaluated and which is affected by the change must be re-evaluated (forward chaining).
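    A rough illustration of that imperative style (not any specific product's mechanism; a real engine would typically track which rules a change affects rather than restarting blindly):

    import java.util.List;
    import java.util.Map;

    // Rules read and write a shared, global state; a state change forces
    // re-evaluation of rules that were already checked (forward chaining).
    interface ImperativeRule {
        boolean condition(Map<String, Object> state);
        boolean action(Map<String, Object> state); // returns true if the state changed
    }

    class ImperativeSequentialEngine {
        void run(List<ImperativeRule> rulesInOrder, Map<String, Object> state) {
            boolean changed = true;
            while (changed) {            // naive fixed-point loop
                changed = false;
                for (ImperativeRule rule : rulesInOrder) {
                    if (rule.condition(state) && rule.action(state)) {
                        changed = true;  // state mutated: earlier rules may now apply
                        break;           // restart from the top, in rule order
                    }
                }
            }
        }
    }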

    In set-based production systems, as we all know, things are rather different. Rule conditions are used to select subsets/products of the available data, and therefore don't refer directly to discrete state values/objects (they use variable substitution instead). Condition evaluation order is not important. Rule execution order remains important, but is governed declaratively through conflict resolution strategies and salience, rather than imperatively. Forward chaining occurs in a very different context, and is all about re-computing subsets and products when data is changed.

    Reading Mark's post re. 'stateless' execution environments, and Paul's comments about rule semantic exclusivity, I'm forming the impression that there are actually many more differentiators. The simplistic category of 'sequential' engine is looking more and more fragmented by the minute! James is right, though. The most important differentiator is how effective a given product is in enabling an organisation to meet its business goals and needs. Personally, I think Rete engines still have the edge, at least in principle, in terms of covering a greater number of bases. As always, though, we need ever-better tooling.

    ReplyDelete
  9. quote from woolfel's blog:

    Both iLog and Fair Isaac have found that in many cases users don't need to add rules dynamically at runtime. In those situations, it's actually better to run the rules sequentially. There are several important factors, though.

    1. the rules have to be ordered in the optimal sequence
    2. the engine must stop once a rule fires
    3. the rules shouldn't have more than 2-3 joins
    4. the rules shouldn't query for external data
    5. the rules should reason over 1-3 objects
    6. the ruleset should be less than 100 and 10% of the rules fire 90% of the time

    The most critical factor for sequential execution is rule sequence and rule count. If the ruleset is over 200, and 50% of the rules fire on average, RETE will be more efficient. This is because RETE will only fire the 1 rule that matches. If the rules are not in optimal order and the last rule fires 50% of the time, sequential approach will be much slower than RETE.

    ReplyDelete
  10. email from Stanislav Shor:

    The applicability of the sequential algorithms is largely based on the notion of rule dependencies. Some (non-exhaustive) examples of rule dependencies are here:

    The rule x modifies attribute a of the object A – the rule y has a condition that depends on the value of the attribute a of the object A

    The rule x asserts the object A – the rule y either has an attribute condition based on Object A or checks existence (non existence) of the object A

    The rule x retracts the object A – the rule y either has an attribute condition based on Object A or checks existence (non existence) of the object A

    Both the rule x and rule y assert an object A – the order of the execution may be important

    The situation with automatic determination of rule dependencies gets complicated if there is a possibility of the working memory manipulations from the Java code.

    Once the dependencies between rules have been determined (automatically or manually) and there are no loops, it is possible to split the rules into ordered subsets of mutually independent rules. Then each subset can be executed sequentially with no regard to the order of the rules within each subset. It is worth mentioning that each subset execution can be split into parallel threads without affecting the correctness of the process.

    Each case of cyclic dependencies should be considered separately, as it is often either an error or an attempt to introduce programming control structures into the production system. The former needs to be fixed and the latter is better handled by meta-rule constructs such as workflow.

    One can argue that most business process rules can easily be designed to be sequential; if this is the case, the performance advantage of the sequential algorithm can be significant, because it does not bear the weight of agenda handling and offers the possibility of parallel execution. The same goes for the clarity and transparency of the rule system, even though James Taylor thinks otherwise. The argument here is that explicit dependencies are better than implicit ones, and that explicit ruleset decomposition into sequential subsets provides better structure, because different subsets are usually applied at different stages of the process.
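    (A minimal sketch of the stratification idea described above, assuming the rule-to-rule dependencies are already known; class and method names are illustrative only.)

    import java.util.*;

    class RuleStratifier {
        // dependencies.get(r) = the rules that r depends on; the graph must be acyclic.
        static List<Set<String>> stratify(Set<String> rules,
                                          Map<String, Set<String>> dependencies) {
            List<Set<String>> layers = new ArrayList<Set<String>>();
            Set<String> placed = new HashSet<String>();
            Set<String> remaining = new HashSet<String>(rules);
            while (!remaining.isEmpty()) {
                Set<String> layer = new HashSet<String>();
                for (String rule : remaining) {
                    // A rule joins the current layer once everything it depends on
                    // has already been placed in an earlier layer.
                    Set<String> deps = dependencies.get(rule);
                    if (deps == null || placed.containsAll(deps)) {
                        layer.add(rule);
                    }
                }
                if (layer.isEmpty()) {
                    throw new IllegalStateException("cyclic dependency among rules");
                }
                remaining.removeAll(layer);
                placed.addAll(layer);
                // Rules within a layer are mutually independent and can be run
                // in any order, or in parallel threads.
                layers.add(layer);
            }
            return layers;
        }
    }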

    ReplyDelete
  11. The real catch I see with rule dependency analysis is that the user has to write simple rules. If the rules reason over metadata, use complex calculations, or involve Java method calls, it's difficult to calculate the dependencies. For example, say the LHS of the rule uses 2 functions, which calculate a numeric value from the objects in the LHS for a (test () ) pattern.

    For an analysis tool to accurately calculate the rule dependencies, the functions have to tell the system what object attributes they use. In the case of rules that call a Java method, the engine would need to know what that method does. If the analysis tool ignored the functions, it "could" produce a bad report.
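    One hypothetical way to make such helper functions analyzable is to have them declare the attributes they read; the annotation below is purely illustrative and is not part of any engine discussed here.

    import java.lang.annotation.*;

    // Hypothetical marker so a dependency analyzer does not have to guess
    // what a LHS helper function touches.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Reads {
        String[] value(); // attribute paths the helper reads, e.g. "Trade.quantity"
    }

    class ComplianceHelpers {
        // The analyzer can now record that any rule calling exposure() depends on
        // Trade.quantity and Trade.price instead of ignoring the call entirely.
        @Reads({"Trade.quantity", "Trade.price"})
        static double exposure(double quantity, double price) {
            return quantity * price;
        }
    }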

    In the past, I've applied similar analysis techniques to partition rules into stages for real-time pre-trade compliance. I think there's still a lot more the industry can get from rule analysis, but to state that 80-90% of the cases are more suitable for sequential needs to be backed by real data. Even better, that data should be made public for everyone to scrutinize.

    Even if a large ruleset fits into the sequential case, if each subset ranges in the thousands, using RETE will be much faster than a "simple" sequential engine.

    ReplyDelete
  12. Peter,

    there is no such data and it's hard to imagine that there ever will be. There is some empirical evidence, such as the long-term existence in the marketplace of rule companies implementing different kinds of non-Rete algorithms, the introduction of a sequential mode by the major players, etc. Also, the argument goes, business processes such as insurance application or mortgage application processing are streamlined to be easily understood and implemented by an average (actually, worse than average) mortgage broker, not an AI specialist.
    I would also add that unfortunately your statements about Rete being much faster for large rulesets are not backed by any statistical data either, and you should know that even two Rete engines can demonstrate completely different performance for different kinds of problems. I would also add that sequential algorithms have much better performance predictability than Rete-based ones.
    For example, I don't see how Rete can perform better than an optimized decision or lookup table. Also, in an SOA environment, the need for a Rete engine to "clean up" after each request in order to be ready for the next request introduces unnecessary overhead, etc. So, my conclusion is, Rete can be good for some kinds of problems and bad for others; it is just a tool, an algorithm, a method, one of many, though I admit it is very well thought out and sophisticated. Unfortunately, being a "general-purpose" algorithm has disadvantages too ...

    ReplyDelete
  13. Over the last 5 years, I've run real-world tests with real compliance rules that show beyond a shadow of a doubt that RETE blows sequential out of the water. I also have a set of benchmarks with some very simple rules demonstrating how poorly sequential engines perform as rulesets double in size. I'm more than happy to provide a sample ruleset which represents real compliance rules.

    Having built a real-time pre-trade compliance application using JESS and done extensive profiling and analysis using OptimizeIt and Rational Quantify, I can say with confidence that for financial compliance applications sequential rule engines are completely inappropriate.

    On the other hand, if we take a typical JMS message filtering scenario with 500 rules, RETE will be faster than sequential. In fact, there's an old entry on jbossrules blog about content based filtering from last year.

    I've built sequential engines in the past and I agree they can be faster than RETE, if the conditions I stated are met. Mark was kind enough to paste that criteria in one of the comments. I've written a couple of different rule compilers and worked on a few rule engines.

    From my experience at the lowest levels working on rule engines and doing rule consulting, the biggest challenge isn't the rule engine. The biggest challenge is teaching users to write good rules. By good I don't mean simple. I've come across many simple cases where a rule only uses 1 object and has less than 5 conditions. I've also come across complex business processes which have to get data from some external source and perform inferencing.

    Saying sequential is better for 80-90% of the cases doesn't match my experience. In my case though, I've worked on pre-trade compliance, mobile platforms (ie for smart phones and mobile devices), implicit personalization, messaging routing and some health care.

    I think the industry as a whole needs to work together to create a set of realistic use cases and have each company provide a reference implementation. This way, customers can compare and contrast each solution.

    ReplyDelete
  14. Here is an impartial (because they compare the different modes of their own product) report from Blaze.

    google rules :)

    Or maybe they were just trying to sell the new feature?
    As always, it does not prove anything but helps to illustrate the possibility of different points of view.

    And if I remember correctly the filtering example Peter refers to in his post, the performance numbers struck me as completely bogus. Hopefully some experts in this forum can explain it differently; I personally believe the reason for the numbers was a flawed (how?) methodology.

    ReplyDelete
  15. Does anyone have any good sequential benchmarks they could make public?

    ReplyDelete
  16. Interesting discussion... if I may add a few comments...

    1. Note the Blaze report referenced refers to the earlier Blaze Rete implementation, not their more recent Rete variants created after Charles Forgy joined their team. It would be interesting to see an update to this benchmark (over to you, James). Note the use of the boxcar concept for maximizing throughput to a rule engine.

    2. Comments like "RETE blows sequential out of the water" clearly only apply in certain cases. Be sure: if the mode was not useful (ie higher performance) to customers for certain rulesets, the commercial rule vendors would not have implemented it!

    3. Also check out IBM "RuleBeans" as a sequential rule implementation example ...

    4. My personal opinion is that the terms "rule engine" and "sequential mode" are not really compatible. If the rule ordering is not being managed by the engine then it's not really an engine. I guess the "engine" could be doing data orchestration / manipulation or script processing, but in my mind the term "rule engine" means "declarative rules".

    5. The original paper that James blogged about can be googled at http://www.eurobizrules.org/Uploads/Files/Tortolero_2c_20A_2.pdf

    Endnote: I'm surprised that the JBoss rules team have not considered this before. Commercial vendors like ILOG and Fair Isaac probably use private sequential-optimized benchmarks that spank *any* Rete-only rule engine performance, which they could trot out at will to their prospects. So perhaps this is a good example of the differences re commercial vs open source models: vendor sales process (a "push" model) includes customer education on matters like rule modes, that simply does not apply in the open source model (a "pull" model). Or is this too simplistic?

    ReplyDelete
  17. That filtering example was from a jbossrules user, so I don't have the code. The results reported by the user are valid for that kind of use case (ie comparing an XSLT vs Jbossrules implementation). Using XPath statements as a rule engine can be considered a type of sequential rule engine. I fail to see how the results are "bogus". It would be "bogus" to say all message filtering use cases are like that paper or that all content based routing fits that model. It would also be incorrect to equate a simple XPath approach to other sequential engines.

    I can easily write a custom rule compiler to generate an optimal execution plan for a narrow use case, but the catch is this. If the scope expands and the types of rules change, that custom code will likely run slow. I said this in the past, but I'll say it again.

    For anyone with a solid understanding of pattern matching, generating an optimized discrimination network for a narrow use case is straightforward. The trade-off is that the custom network won't perform well in the general case. RETE is the best general-case algorithm from real-world testing over the last 20 years. There are literally dozens of papers on ACM Queue that cover the performance in great detail. If you count all the papers on RETE, it's probably over 100. LEAPS is the best lazy evaluation algorithm, but the trade-off is it's not appropriate for situations where the application has to calculate the full cross product.

    Even if a use case fits nicely into sequential mode, one can still apply RETE compilation techniques to improve performance. I've done that in the past. As far as I'm concerned, all modern rule engines can benefit from the RETE algorithm. The lessons hidden in the algorithm show how to build efficient pattern matching networks. I'm clearly biased, but RETE performance has nothing to do with my bias. The algorithm stands on its own and has proven over 20 years how well it works.

    ReplyDelete
  18. Peter,
    So if the numbers are not bogus, should I assume that switching from Drools 2.5 to JBoss Rules 3 will improve performance 100-150 times, and that increasing the number of rules from 10 to 1000 will decrease performance only 4 times? Wow

    Paul,
    regarding #4, the sequential mode operates on the same ruleset as Rete does, therefore the rules remain as "declarative" as they were. The rule ordering is not managed by the engine; it is always managed by developers (like in "guns don't kill people..."). In the case of Rete (or an inference engine in general) it is done by salience, modify(), or asserting objects that (unfortunately implicitly) are required for some other rules to fire. The difference is that sequential ordering is explicit and inference ordering is implicit, and therefore the errors are harder to catch. Another issue with an inference engine is that you want business users to create/modify rules; where do you put salience, assert or modify in a DSL?

    I don't want to sound like a Rete (inference) opponent; it has its place even in the business apps world. Its significance will increase with the adoption of SemWeb technologies. But to say that this is the best tool (metaphor) for all kinds of BR problems is IMO an overstatement. Even "simple" problems like sorting or string matching have a good dozen algorithms, none pretending to be "the best" for all use cases.

    ReplyDelete
  19. snshor,

    if you were to spend 6 months reading the Drools 2.x code and compare it to the JBRules 4 code, you'd understand why. Here is a summary for you. Drools 2.5 was a forward chaining implementation. It wasn't RETE, which meant that in many situations it scaled poorly as the dataset or the rule count increased. Drools 3.0 was much faster than 2.5 because it was RETE. JBRules 4.0 should be 2-4x faster than 3.0.x, so in most cases users will see a 2-4x improvement in performance. Obviously, mileage will vary depending on the type of rules the user has.

    The reason performance improved is because Mark, Edson, Michael and Bob worked their asses off. It's not magic or mystery. All you have to do is take the time to read the code and understand exactly how it works to realize it really does work. The world of rule engines is vast and understanding each specialty takes years.

    What bothers me is that people are making false and inaccurate statements about RETE without having spent years studying and analyzing the algorithm. I've dedicated the last 6 years to the study and analysis of pattern matching algorithms, primarily RETE and LEAPS, since those two are the dominant algorithms. By study, I mean reading everything about RETE I can get my hands on and implementing my own RETE rule engine to see first hand how it works in practice.

    ReplyDelete
  20. It's more than 2 to 4 times faster, maybe not for the simplest case. But for large networks that exploit indexing with large cross products it's orders of magnitude faster.

    ReplyDelete
  21. snshor,

    since there is already data for XPath versus Drools 2.5 and JBRules 3, my challenge to you is this.

    Take that same example and implement it however you want using the same rules. Run the benchmark and make the implementation and results available for the world to see. Wouldn't that be the best way to show the power of the type of sequential engine you're proposing? I'm sure Michael Frandsen would be interested in seeing the results.

    ReplyDelete
  22. AFAIK, the latest performance results for OPSJ, which is in the newest Blaze, show performance comparable to Blaze sequential mode in many cases. There are definitely cases where Blaze sequential mode will be faster than RETE mode.

    Having said that, it's straightforward to generate an optimized execution plan for a static ruleset. One can use the simplex algorithm or other static analysis techniques to figure out the optimal execution plan. If the business doesn't need to add rules dynamically at runtime, then it makes perfect sense to use sequential mode. If performance between running in RETE or sequential mode is equal, choosing one over the other is really a matter of preference.

    Depending on how the rules are written and the data, calculating the optimal execution plan may not be practical. For example, if someone uses data-driven rule techniques, it's not practical to calculate the optimal execution plan. In contrast, if business users write the rules with hard-coded constants, it's much easier to calculate the optimal execution plan.

    ReplyDelete
  23. Peter,

    How about, instead of spending 6 years reading code, I just take a wild guess and speculate that all of the 1000 rules in question were copies of the same rule (maybe 2 different rules)? In this case condition sharing will definitely do "the magic", but I would not use the results to argue about RETE superiority.

    OK, this explains the "scalability" of the Drools and JBRules engines, but the question still remains: why is JBoss Rules 130 times faster than Drools 2.5 with 10 rules? After another careful reading of Michael's report I found the reason:
    ==================
    • Drools 2.5
    • Hybrid solution
    • Building custom semantic module (SMF)
    • Rules in XML
    • Defined XML-Schema for rule language format
    • Implementing SMF interfaces
    • All constraints are translated to XPath Expressions
    • Main input for the rule engine: org.w3c.dom.Document
    ==================

    So, the Drools 2.5 variant generates XPath, and that's why it's slower than XPath itself (with a small number of rules; with larger numbers the "magic" condition sharing kicks in).
    And the JBRules variant works with the Java model directly. Kinda like comparing apples and clockwork oranges.

    All that's left now is to contact Michael and ask him to confirm or reject my statements ... Would you do it?

    ReplyDelete
  24. I haven't seen the rules, but my assumption is that it's not 1000 rules with the exact same conditions. I've done tests with 1000, 2000, 5000, 8000, 10000 and 12000 simple rules with 2 conditions using 1 object type. Each rule has 2 constants and no constants are repeated. Basically, I generated the rules by incrementing the condition from 1 to x, where x is the number of rules. That's the most basic test, which provides a reference point for brute-force sequential. I would not expect a commercial rule engine to use brute force.

    I don't have Michael's email, but Mark should be able to get in touch with him. In case there's any confusion, I'm not a member of jbossrules, so I don't speak for them.

    If you want papers on RETE performance, I have a collection of them. I will try to come up with a sample ruleset for a simple message routing scenario and post it to Jamocha's SVN. I already have some benchmarks up on Jamocha's SVN for large rulesets, but they are micro-benchmarks.

    ReplyDelete
  25. snshor,

    Here is a link to my micro-benchmarks. They are very simple. A real rule would not be this simple, so they are not representative of a real rule use case.

    http://svn.sourceforge.net/viewvc/jamocha/samples/ruleset_size/

    As a micro-benchmark, it is useful for measuring pattern matching performance regardless of the algorithm a rule engine implements.

    peter

    ReplyDelete
  26. Hi snshor,
    from the point of view of Drools 2.5 it was not really fair, I must admit.

    JBoss Rules 3.x has a better Rete implementation, and the approach of using the SOAP elements (Java representation) directly was smarter because there is less transformation and much faster node evaluation inside the Rete network.

    The rules were all different, generated into a SQL database and transformed into the different formats (Drools 2.5 SMF, a custom format for generating XPath, and direct transformation into JBoss Rules DRL/DSL).

    I have an actual task on my agenda to compare the hybrid solution, as it is done in the JBoss ESB CBR, with my approach working on XML elements instead of SOAP elements. It is almost done; I only need to create a little execution scenario.

    So, give me some more time and I will ask Mark to publish it at the wiki or blog.

    ReplyDelete
  27. This comment has been removed by the author.

    ReplyDelete
  28. This comment has been removed by the author.

    ReplyDelete
  29. pajac,

    welcome to the discussion, thanks for not making us wait long. Unfortunately the topic has shifted somewhat from the initial topic of the sequential mode of execution, but I believe it is still important to give users realistic expectations of engine performance.
    I have just one question left regarding how the rules were different. Before I jump to any hasty conclusions again, could you please post a few rule samples and briefly describe what the rule variation was? How many different templates were used for the generation and how did they look?

    ReplyDelete
  30. snshor commented "rule ordering is not managed by engine, it is always managed by developers". This is a good point:
    - a BRE is only a component of a BRMS (and as we have seen some "BRMS" vendors do without it entirely!)
    - rule-based programming is a skill, but most BRMS solutions are sold for the model-driven / CASE capabilities where the rule management for the business user is key

    Again - thanks for the interesting discussion.

    ReplyDelete
  31. A short description how the rules were done:

    All rules consist of 3 conditions. Every rule has the same order of conditions. The first condition had 20 possible values, the second 40 and the last one 100. So the most selective condition was in the first column of a rule, and so on.

    To create them, I created a database table with 5 columns (ID, conditions 1, 2, 3 and a routing goal). A program created thousands of records. A SQL SELECT DISTINCT statement resulted in more than 1000 different rows (which became the rules). The rows were loaded into memory and transformed into Drools 2.5 SMF, the JBoss Rules 3 rule language and a custom file format for XPath.

    Drools 2.5 example rule:

    <rule name="RoutingRule2" salience="1">
    <fin:condition><fin:Condition1>A_5</fin:Condition1></fin:condition>
    <fin:condition><fin:Condition2>B_40</fin:Condition2></fin:condition>
    <fin:condition><fin:Condition3>C_1</fin:Condition3></fin:condition>
    <kik:consequence><kik:addRecipient>goal_2</kik:addRecipient></kik:consequence>
    </rule>

    JBoss Rules:

    rule "Routing rule 2"
    when
    SOAPElement ( localName == "Condition1", value == "A_5")
    SOAPElement ( localName == "Condition2", value == "B_40")
    SOAPElement ( localName == "Condition3", value == "C_1")
    then
    receiverCollector.addServiceName("goal_2");
    end

    XPath:

    XPATH_RULE_CONDITION_MAX_2=3
    XPATH_RULE_CONDITION_2_1=/descendant-or-self::node()[Condition1="A_5"]
    XPATH_RULE_CONDITION_2_2=/descendant-or-self::node()[Condition2="B_40"]
    XPATH_RULE_CONDITION_2_3=/descendant-or-self::node()[Condition3="C_1"]
    XPATH_RULE_CONSEQUENCE_2=goal_2

    I replaced some project-specific terms with "Condition1,2,3" and "A,B,C". I hope that helps understanding. More executable code soon.
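    A rough sketch of what that transformation step could look like for the JBoss Rules format shown above (this is not the original generator; the table and column names are assumed):

    import java.sql.*;

    class RuleGenerator {
        // Reads the distinct rows and emits one rule per row, in the shape of
        // the "Routing rule" example above.
        static String generateDrl(Connection con) throws SQLException {
            StringBuilder drl = new StringBuilder("package routing;\n\n");
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery(
                "SELECT DISTINCT condition1, condition2, condition3, goal FROM routing_rules");
            int i = 0;
            while (rs.next()) {
                drl.append("rule \"Routing rule ").append(++i).append("\"\n")
                   .append("when\n")
                   .append("    SOAPElement ( localName == \"Condition1\", value == \"")
                   .append(rs.getString(1)).append("\")\n")
                   .append("    SOAPElement ( localName == \"Condition2\", value == \"")
                   .append(rs.getString(2)).append("\")\n")
                   .append("    SOAPElement ( localName == \"Condition3\", value == \"")
                   .append(rs.getString(3)).append("\")\n")
                   .append("then\n")
                   .append("    receiverCollector.addServiceName(\"")
                   .append(rs.getString(4)).append("\");\n")
                   .append("end\n\n");
            }
            rs.close();
            st.close();
            return drl.toString();
        }
    }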

    ReplyDelete
  32. pajac(Michael),

    please don't take my analysis personally.

    Your test checks just 1 object type with 2 attributes, with only 1 (equality) operator applied to the String attribute.

    It does not test

    1) various complex conditions and joins that can be the case in the real world, including collections, OR operators, arithmetic comparisons etc;

    2) how the agenda performs if you run 1 test record (a few SOAPElements) at a time
    3) some other stuff that I just cannot tell off the top of my head


    What does it really test? I'd say, how well the method String.hashCode() performs - guys, feel free to correct me if I am wrong.

    With a little effort, I believe, you can fix your testing methodology by employing more sophisticated rule generation techniques to address the issues mentioned above. The JBoss Rules team will surely be able to add to the list and provide hints on how to better test different aspects of the engine, to bring the test closer to real-life conditions. Eventually, if done right, this project could become a foundation for creating a family of benchmarks to test the performance of different rule engines for different real-world scenarios - something this industry has been missing for a long time. Until then we will continue to hear, here and there, claims of 100x performance advantages over the competition that have nothing to do with reality ...

    ReplyDelete
  33. I don't see the need to show the most complex conditions in content-based routing. The rules need to be easily understandable. Fast execution is necessary to get a certain scalability.

    All "competitors" had the same rule base. You could say a fair chance. The background was to prove that a CBR can be built using a rule engine. It can.

    If you work with an Enterprise Service Bus (routing web service messages), your input will be an XML or SOAP document. Take a look at the SOAP protocol. There are many more attributes which you can use directly in your rules. You can hide complexity in a domain-specific language (DSL). You don't need to learn the XPath language...

    Of course it is part of the approach to use fast String comparisons instead of heavier evaluations.

    ReplyDelete
  34. snshor,

    your statement confuses me a bit. I agree the example rules in Michael's test don't represent the typical message routing scenario. I think everyone can agree there are more sophisticated and complex rules used in the message routing world.

    To me, the performance results for such a simple test establish a baseline. In other words, with more complex rules the performance will only decrease compared to the baseline. If I remember the scientific method correctly, it is usually good to have a baseline. Once you've established a solid baseline, you can move on to more complex cases.

    I think Michael has provided sufficient detail for you to replicate the benchmark and provide baseline results. It doesn't really matter what kind of rule engine you use, since it's still going to be useful data. I've done similar tests using brute-force sequential evaluation. My own experience with micro-benchmarks and real rules shows RETE is faster for most cases.

    For the record, using rules like the ones in Michael's example or my micro-benchmark, the performance with respect to rule count should be close to constant. I've done extensive testing with Jamocha to verify the results. Can a sequential engine get "near constant" performance as rule count increases?

    ReplyDelete
  35. Michael,

    Unfortunately your current approach (not intentionally) exercises just one particular performance optimization feature of the JBoss Rules engine. Repeated 1000 times, it creates a skewed performance picture. Therefore the test results cannot be used as a reliable benchmark, either for rule engine performance in general or even for the narrower CBR scenario.

    The goal of my suggestions was to change the testing methodology you were using so that it could be applied with a higher level of reliability to different use case scenarios, including but not limited to CBR.

    The larger point I was trying to make in my previous post was that your work could become a perfect foundation for a good benchmarking platform. It already has mechanisms for calling different engines, test rule generation, etc. All you need is to add some "intelligence" to the rule generation (btw, a rule engine can be used for that) and parameterize some procedures ... Sure, it all makes sense only if you are interested in continuing in this direction.

    ReplyDelete
  36. snshor,

    I've been slowly working on writing a set of benchmarks. Unfortunately, it's not done, and lately I haven't had time to work on it. Here is a link to it.

    http://woolfel.blogspot.com/2006/08/benchmark-suite-take-4.html

    I disagree with you. A baseline is important to get a well-rounded picture of how different rule engines perform. For me, getting a clear picture of how a rule engine performs under a variety of use cases is what's needed to help developers make decisions.

    if you want an invite to my blog, email me at woolfel AT gmail DOT com.

    ReplyDelete
  37. snshor,

    Are you the author of OpenL Tablets ? (http://openl-tablets.sourceforge.net/apologia.shtml)

    I took a quick look at the source code and had some questions. In OpenL Tablets, does the engine stop evaluating rules once 1 rule has executed? If the engine doesn't stop once a rule is executed (aka fired in RETE engines), how do you make sure the engine doesn't do more work than necessary?

    Looking at the decision table class http://openl-tablets.cvs.sourceforge.net/openl-tablets/DEV/org.openl.rules/src/org/openl/rules/dt/DecisionTable.java?view=markup
    I see it uses 1 row per rule. From the query class (http://openl-tablets.cvs.sourceforge.net/openl-tablets/DEV/org.openl.rules/src/org/openl/rules/dt/DTRuleQuery.java?view=markup),
    it looks like you filter the rules first. Is that a correct interpretation?

    ReplyDelete
  38. Peter,

    Answering your previous question,
    the rules, structured the way they are in this test, perfectly fit a simple decision table. An optimized decision table will provide constant speed and much better readability for this type of problem.
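    (To illustrate why the lookup is constant time: a decision table for routing rules like the ones above can boil down to a single hash lookup keyed on the three condition values. The sketch below is illustrative only and is not OpenL Tablets code.)

    import java.util.*;

    class RoutingDecisionTable {
        private final Map<List<String>, String> table = new HashMap<List<String>, String>();

        void addRow(String c1, String c2, String c3, String goal) {
            table.put(Arrays.asList(c1, c2, c3), goal);
        }

        // One hash lookup whether the table holds 10 rows or 10,000 rows.
        String route(String c1, String c2, String c3) {
            return table.get(Arrays.asList(c1, c2, c3));
        }
    }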

    And yes, I am the author, and if you have OpenL Tablets-related questions, let's discuss them at the project forums where they belong.
    ++++++++++++++++++++++++++++++++++++++++++++++

    Benchmarks:

    Our main disagreement is regarding the methodology. IMO, the tests that you refer to as a baseline belong in the suite of your own test cases, to make sure that your engine performs as expected under "laboratory" conditions. When published as "benchmarks" they do more harm than good, because most users do not have time to read through the fine print.
    It can be especially harmful when such "benchmarks" become a "manager's guide" for selecting a vendor.


    The field is so confusing; consider carefully the statement that you cited that Rete III "provides comparable performance to blaze sequential mode". Can I use this as an argument that the sequential mode is many times faster than any regular Rete? I guess not, even though I could ;)

    I largely attribute the lack of real-world performance benchmarks that could more or less realistically guide the user base to the incompatibility of the rule formats used by different vendors. Standardization efforts may improve the state of affairs but will always lag new development. In this case a testing/benchmarking framework based on targeted rule generation from templates, like the one showcased by Michael, could prove useful.

    In any case, I would never recommend that anybody select a BRMS based on performance numbers alone.

    What are the good criteria? To find an answer to this we should look at areas where benchmarks are mature and subject to public scrutiny, for example PCs or autos. In both cases we can talk about performance, but it is not the only factor, even if you buy a "performance" car. The products are usually split into usage categories like Family Sedan or Gaming PC, and different criteria are applied in different categories. To extend the analogy to the BR area we need to define the categories first, then agree on what features are important for each category. For example, AI Research may require fast inference, back-chaining and efficient constraint satisfaction engines with little Web UI administration support, while a High Transaction Volume Financial Web Service may require a super-efficient sequential mode.

    It is hard to imagine that the same BRMS that wins in the category of Very Expensive Enterprise Business Intelligence Bus will even compete in the Really Cheap and Simple For My Junior Year CS Class category. Software is less subject to the laws of gravity, so let's hope that at some point we will have a product with Ferrari acceleration and chick-magnet power, the hauling capacity of a RAM 350, and the comfort of a Bentley at the price of a Hyundai.

    ReplyDelete
  39. snshor,

    I'm not talking about using a baseline as marketing material. I don't care about marketing. I care about having a wide variety of benchmarks, including micro-benchmarks, to measure the efficiency of the implementation. All of it should come together to provide a detailed picture of the performance characteristics under different scenarios. Since you believe OpenL Tablets will perform well for the simple case, why not run the test?

    I always warn people that the biggest factor in choosing a technical solution is their own needs. I've recommended that people bypass a rule engine completely, so I don't believe any single thing fits all situations.

    I'm biased toward complex cases, like pre-trade compliance and complex business processes. From my experience, where inferencing is needed a RETE engine is a good fit. Your experience may be different.

    ReplyDelete
  40. Peter,

    you know, in this day and age everything that's posted online is some sort of marketing. Therefore all your statements about Rete performance and usability should be considered as such, because you develop and market an engine implementing the Rete algorithm. On a lighter note, I consider your marketing a bit unorthodox; for example, try a Google search for "rule engine blow up". Guess which link comes first? :)

    I never said that the current implementation of OpenL Tablets provides the fastest performance for the simple case you mentioned. The optimizations I mentioned are in my plans, but until today they were of low priority. Do you want to know why? Because for all the real-life projects that I have had to deal with, the performance happened to be more than adequate.

    The optimization effects on small or medium decision tables have no practical significance, and large decision tables I consider a designer's mistake. They tend to appear only in artificially created tests like the one we are discussing here. They are data, and should be treated and processed as such.

    Anyway, I accept your challenge. Bookmark this post; we will discuss this in a few months when, time permitting, I'll have the optimizations implemented. It may even be a good thing from a marketing perspective ...

    ReplyDelete
  41. Guys, this is great stuff. Thank you for taking the time to have such heated debates here - it's been a great read for everyone. When you do revisit this topic please feel free to come back here; I'll write a new blog post on the subject with info from the two of you, and we can have another open debate.

    Btw, I'm more than happy for people to discuss other products in the comments area; this site is for general AI information, not just JBoss Rules.

    ReplyDelete
  42. snshor,

    for the record, I do not work for any rule vendor. No rule vendor pays me. I do have my own RETE engine that is open source and hosted on SourceForge, but I do not actively market it. I also do not tell people to use it. For me, it's a research project which allows me to get deep into pattern matching algorithms and optimization techniques. I have suggested other rule engine developers use it for ideas and suggestions.

    Having said that, I've dedicated the last 6 years to studying pattern matching. I think many people assume RETE equates to BRE, but to me it doesn't. RETE is a pattern matching algorithm. It can be applied to a number of tools that need to compile statements into optimized query plans. The second part of the algorithm explains how to handle incremental changes to data.

    I do have an agenda. I want to advance the state of art in pattern matching and push technology forward.

    ReplyDelete
  43. Out of interest, for use cases that are best suited to sequential, what sort of performance gains are we seeing over Rete? 30%, 50%, 200%, 500%?

    ReplyDelete
  44. The Blaze paper mentioned earlier says it is a 2.75x improvement for interpreted mode and a 41.6x improvement for compiled mode. The numbers were measured in 2004, when Blaze did not have Rete III. My personal experiments with ILOG and Jess back in 2002 showed more than 10x improvements for the interpreted mode in some cases.

    This ILOG paper shows a 7x improvement for 10,000 rules.

    All this evidence is anecdotal, but I believe that for cases that fit the sequential mode well, the improvement should be in the hundreds of percent. The actual numbers may vary widely depending on the specifics and size of the problem and the details of the Rete and sequential mode implementations.

    ReplyDelete
  45. I tried to find out what kind of rules were used for those tests. I don't have JRules 5, but in the JRules 6 docs I see some simple rules with 2 conditions. If the examples in the JRules docs are representative of what was used in iLog's PDF paper on sequential mode, they are similar to my micro-benchmarks. Does anyone know if iLog has run that same benchmark on JRules 6.x?

    A good RETE implementation that has AlphaNode hashing in the ObjectTypeNode should see constant performance as the ruleset increases. My old results (http://people.apache.org/~woolfel/webpages/ruleset_size_series.htm) with a dataset of 50K objects average out to 0.08ms per evaluation per object. According to iLog's PDF, they had 10K rules and 10K messages for 100K evaluations and an average of 8-9ms. If anyone has the ruleset iLog used for that test, I'll be happy to run the same test in Jamocha to get a true apples-to-apples comparison. It would be interesting and should be enlightening.
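    The AlphaNode hashing idea can be sketched roughly like this (simplified and illustrative; these are not Jamocha's or any other engine's real classes):

    import java.util.*;

    class HashedObjectTypeNode {
        interface Sink { void propagate(Object fact); }

        // field name -> (literal value -> downstream alpha networks)
        private final Map<String, Map<Object, List<Sink>>> index =
                new HashMap<String, Map<Object, List<Sink>>>();

        void addLiteralConstraint(String field, Object value, Sink alphaNode) {
            Map<Object, List<Sink>> byValue = index.get(field);
            if (byValue == null) {
                byValue = new HashMap<Object, List<Sink>>();
                index.put(field, byValue);
            }
            List<Sink> sinks = byValue.get(value);
            if (sinks == null) {
                sinks = new ArrayList<Sink>();
                byValue.put(value, sinks);
            }
            sinks.add(alphaNode);
        }

        // The cost per fact depends on the number of indexed fields, not on the
        // number of rules, which is why performance stays near constant as the
        // ruleset grows.
        void assertFact(Object fact, Map<String, Object> fieldValues) {
            for (Map.Entry<String, Map<Object, List<Sink>>> e : index.entrySet()) {
                List<Sink> sinks = e.getValue().get(fieldValues.get(e.getKey()));
                if (sinks != null) {
                    for (Sink s : sinks) {
                        s.propagate(fact);
                    }
                }
            }
        }
    }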

    peter

    ReplyDelete
  46. Hi Mark,

    Can you please suggest how to overcome the issue http://jira.jboss.com/jira/browse/JBRULES-1009

    Is there a way to clear the cache?

    We also tried creating a method inside the ClassFieldExtractorCache class which clears the cache, but it failed. Surprisingly, we got a compilation error in the DRL file. The error was "duplicate class definition: org/drools/base/com/test/QuoteType$getQuoteNumber teststgd/RuleSet asdfsdfvda.drl Unknown 1186639995468 1646"


    Can you please give us some pointers on this?

    Regards,
    Gaurav Malhotra

    ReplyDelete
  47. Please use the mailing lists for technical help. I don't know a workaround for that one at this time.

    ReplyDelete
  48. There is one important fundamental point that is being overlooked here. Business rules are by definition declarative. Sequential algorithms, which require the rule author to sequence the rules, are by definition imperative. Therefore, in my opinion, sequential engines aren't true business rules engines.

    Sequential algorithms are the Rete vendors' answer to the well-documented data scalability problem of the original Rete algorithm. In fact, that is why Charles Forgy invented Rete II and Rete III. These algorithms significantly lessen the Rete algorithm's data scalability problem, but I do not believe they fundamentally remove it.

    Vendors who don't have access to these proprietary algorithms had to address the data scalability problem by introducing sequential modes which basically skip Rete. So sequential algorithms are nothing but a band-aid and don't deserve to be treated as true business rules engines, IMHO. I'd feel differently if you could take an arbitrary rule set and simply switch from Rete mode to sequential mode and magically end up with better performance. But when the choice of algorithm limits the rule language's expressive power, that's not a real choice.

    I agree with many others in that ideally the focus should be on modeling rules declaratively without having to worry about the engine algorithm. In fact, the business rule engines should ideally be swappable similarly to how RDBMSs are swappable. The same rule model should be deployable to potentially multiple different rule engines/execution environments. The engine algorithm should ideally have no impact on the rule author or limit them in any way.

    All this focus on engine algorithms is missing the forest for the trees. Unless business rule modeling tools become truly business-friendly, instead of just claiming to be, the BRMS space is not going to reach its full potential. The BRMS market should be as big as the RDBMS market. Business rules should be a tier of the enterprise architecture just like data and process currently are. But with all this internal squabbling among the rule vendors and the mixed messages that confuse the market even more, we are all doing the market a disservice.

    ReplyDelete
  49. Readers of this article should find the following interesting:

    An XPath Enabled Rule Engine
    https://www.p6r.com/articles/2008/05/22/an-xpath-enabled-rule-engine/

    https://www.p6r.com/software/xjr.html

    ReplyDelete