Tuesday, November 06, 2007

More great articles

John Dunning has produced a fantastic blog entry summarising his perforamance characteristics research at EBRD, the European Bank of Reconstruction and Development - "Benchmarking Drools" which was a follow up on his "Banking on Drools" blog. In this he takes a simple problem and solves it a number of different ways, benchmarking each, he details all the variations with several perf charts comparing the approaches. Someone else has recently taking a variation (making it harder) of his benchmarks and run it under JRules, with very surprisingly results, I'll provide more details later on when the results have been better verified :) I'll also try and get a zip where you can run this benchmark for Drools and JRules yourself, as luckily ILog have now made their software easily obtainable for trial purposes. Lets's put it this way, which system do you think scales easily to over 500K, even 1mill, objects, and which one didn't :)

Steve Shabino has started a blog on his experience with Drools, as they have an extreme problem to solve involving the need to reason over 2mill objects in a stateful session. I'm really looking forward to seeing his findings.

The mysterious Yaakov Kohen has started blogging again with an insightful blog on the problems with todays acadmic benchmarks.


  1. Let me know if you need help with the "verification" of the benchmark running on JRules! ;-)

    Daniel Selman

  2. yes, that'll be happening soon I believe. I don't want to publish anything without being sure it's true.

  3. As usual I disgree that there's any inherent problem with academic benchmark. They're only bad when sales guys abuse them. All benchmarks be it academic or not are subject to abuse, so I think it's more accurate to say "the problem with all benchmarks."

    If the user stupidly assumes benchmark X provides a complete picture, they're foolish. A benchmark is just a point of reference, nothing more, nothing less. To say academic benchmarks have problems makes as much sense as "the problem with blue sky."

    Even if someone runs a suite of 20 benchmarks, that is no gaurantee the engine will work for project X. All things considered, the speed of the engine is just one small piece of a larger puzzle.

  4. The problem with "classic" benchmarks as I see it, is that BRE is used to solve the problems, that in real world nobody would use it to solve. For example, the manners is a simple constraint satisfaction problem, that much more efficiently can (and should) be solved by a constraint engine. And the problem definition would be much more declarative - just define the constraints - and (optionally) solution strategy.

  5. I think manners isn't a simple constraint problem. In fact, i would argue there's a lot of subtlety in manners that many people do not understand. In fact, the version of manners that ships with JRules is significantly different than the original, but how many JRules users failed to notice it?

    A constraint engine is just a rule engine. There's no difference. One can use LEAPS, RETE or any other algorithm to build a constraint engine. Constraint engines are useful, but just like production systems, they don't solve every single problem. Some things are better expressed as constraints, while others are better expressed as declarative rules. Take CEP for example. Expressing temporal logic succinctly with a rule language like CLIPS is awkward. Sometimes you need a domain specific language to express the ideas clearly. How that gets executed shouldn't affect how the logic is expressed by the user.

  6. I like manners and waltz for benchmarking drools, we use it internally and its a good way to track regressions. But we are implementing and executing them as expected, so it has meaning for us. As a benchmark to compare across the current main industry engines in such a way to provide useful information to the end user, it has less value.

  7. Peter, Manners has just 2 constraints(ok, may be 3), it is really simple. The complexity of the "classic" solution arises from the fact that a production engine is not a right tool for solving this kind of problems. You refer to the complexity of the solution expressed in rules, not to the complexity of the problem.

  8. Stanislov, I would suggest you study the problem. The constraint isn't just 2 or 3. The problem tries to seat the guests based on gender, seat and hobby. The problem is combinatorial, and depending on the data, there may not be a solution, or there may be dozens of solutions. It depends on how many guests there are and how many hobbies each person has.

    In the case of the original manners benchmark, miranker used path and chosen. Whether one uses a production system or a constraint engine, you still need to track which guests have been assigned and which paths are still available. therefore the number of constraints isn't 2. It's atleast 5. If we include a fact to indicate the total number of guests, it would be 6.

    I agree with you that using a production engine probably isn't the best tool to solve the problem. My challenge to you is this. Provide a concrete implementation to solve the same problem in a more elegant way. I've thought about how the best way to explain the benchmark over the last 7 years. My experience is that expressing the problem succinctly is quite hard from all perspectives. Explaining it to someone else so they can implement it correctly is quite hard.

  9. Peter, if my arithmetic does not fail me, seat, gender and hobby amounts to three. Other "constraints" that you mentioned are implementation-specific and are not part of the problem definition. Constraint engines are specifically designed to handle efficiently this type of combinatorial problems, so that a user only provides a problem definition and (optionally) a search strategy - but for simple problems like Manners usually a default strategy will work sufficiently fast.

    By now, you should realize that I prefer not to speak about problems that I have not researched in some way. So, I'll try to post a solution on OpenL Tablets site time permitting. BTW, returning to our previous discussion - I've posted the new version with constant time performance for large decision tables - you may want to check it out

  10. The manners benchmark is a very good one to test the agenda management of rule engines. Built for this purpose, I would say it is a good benchmark.

    But I agree that the problem of guest placements itself could be better solved using constraint programming. Constraint programming is indeed designed for solving this kind of problems.

    But the manners benchmark does not have the intent to compete with the elegance of constraint programming. It is built in some way, and if every engine works on the same rules, it provides a comparative value between the engines (on the feature solicited). It remains a big question: how important is the agenda management in real applications, especially the very specific one needed for manners (the depth-first approach on the seating objects - the seating objects the most far should be fired first).

    ILOG JRules implements a little differently the agenda management (recency equals to the last fact modified, I will explain this in more details on ILOG’s forum as a reply to Peter Lin’s posts), and the original version of manners exhibits the issue with the agenda management. As such, the original version of manners cannot be run with JRules. From here, I could say that the manners benchmark is too complex, and this is a drawback to its popularity.

    As a benchmark, I would also prefer it easier to understand. The solicited functionality could also be made less subtle, or if it relies on a specific subtle feature, it should be well explained and accepted. Manners puts the technical barrier definitely too high. So isn’t better that we design simpler benchmarks (simpler, but not micro benchmarks) that can be explained in a less than 3 minutes? This way we could spend more time to analyze the results rather than to understand the problem itself.


    Changhai Ke

  11. explaining manners definitely takes more than 3 minutes, so by that metric, it definitely isn't a great benchmark for general users or consumers of rule engines.

    this is just my experience and probably doesn't represent a common case. In past projects, I've needed depth strategy because the ruleset modifies many facts to evaluate pret-trade compliance for a Order management system. In my case, we were recalculating aggregates, exposures and risks in real-time on a set of transactions, so using depth strategy is a valid business requirement. Having said that, not every project needs depth strategy. Sometimes breadth or recency is needed. A rule engine should give the developer the ability to change the conflict resolution to meet their needs. Most engines give the user the option of choosing from several different conflict resolution strategies.

  12. Stanislov, what do you consider constraint programming. According to wikipedia (http://en.wikipedia.org/wiki/Constraint_programming) prolog was influenced by constraint logic programming. Since ART, CLIPS and most modern rule languages are heavily influenced by prolog and logic programming, I'm curious to see what you define as constraint programming.

  13. Peter, I believe that I can not produce better definition of CP than the Wiki article you mentioned, just read it carefully, it also contains the list of products, some commercial and some open-source. There is no rule engines in the list and for the good reason: they are not constraint engines(aka solvers).

    While it is hard to explain the difference in few words, I'll try: rule engines (mostly) produce new facts - that's why they are sometimes called production engines, constraint solvers find a solution by reducing search space - great for combinatorial problems.

    Constraint solvers(CS) achieve their performance due to the sophisticated mechanisms of constraint propagation(domain reduction), especially for finite discrete domains. That's why for simple problems it is often enough just define underlying constraints - making CS very declarative.

    In real life you will probably need both and more - but the point of my original post was that it is kinda strange, that production engines use for benchmarks problems from the domain that for the long time have been successfully dominated by constraint engines. It is like testing SUVs on the racetrack, or race cars off-road.

    Should not BRE folks come up with something more realistic, instructive and useful, for example, producing closures and finding contradictions for semantic web axioms - the area that is much more suitable for production engines?

  14. I must be stupid, because I don't get the point you're trying to make. The way I think about is this.

    The first part is defining the problem using a constraint language. That can be in prolog or some other language built for constraint programming.

    The second is compiling the constraint problem into executable form. The first is independent of the seond. The second is independent of the first.

    I can design a simple procedural language to define rules and compile them to run in a production system.

    I can also take a problem defined with a constraint language and execute it using a variety of algorithms.

    To classify production systems as data driven systems is out-dated and flat out wrong. OPSJ, Haley, JESS, JRules and Drools all provide query capabilities. Modern production systems often support forward and backward chaining. Also known as data driven and query driven. Dr. Charles forgy had an old article which stated backward chaining (aka query) engines must forward chain at some point.

    A query driven system is more restricted and limited than a forward chaining approach. Obviously, there's trade-offs with each approach. For example, LEAPS is data driven, but it uses lazy matching to avoid calculating cross product.

    For example JCHR http://sourceforge.net/projects/jchr a java constraint solver uses LEAPS algorithm.

    In terms of things like closure, there's no need to use a BRE. Just use prolog, or smalltalk. In terms of solving semantic web, that problem has nothing to do with BRE. Plus, the domain still needs a lot more basic research to reach a point where a practical implementation is feasible. Most of the W3C RDF semantic web projects out there have failed commercially and have only succeeded in the research environment. Those who know me, know I am very bias against the mess that is RDF.

    I think the definition of production systems needs to be updated to reflect current state of art. Modern BRE have made great advancements since the early 90's, so those dated definitions aren't accurate. As much as BRE have improved, there's still a ton of work needed to make things simpler for the end user. The learning curve is still extremely steep.