Monday, May 16, 2011

Guarding Drools's performance

Any new version of Drools Expert should be at least as fast as the previous version and preferably faster.

Now, that's an easy statement to make, but a bit harder to formalize. Any one use case uses many features of the rule engine, such as not, exists, accumulate, ... Different use cases use different feature sets. As powerful new features and improvements are added, it becomes harder to get an overview of the rule engine's performance. And all these features interact heavily and together affect the performance and scalability of the rule engine.

Therefore micro benchmarks don't always suffice. So we've also been using the real-world examples of Drools Planner as macro benchmarks on Drools Expert. The results are interesting.
  • On average, these benchmarks have 1000+ facts, fire all rules 5000+ times per second and run for up to 10 minutes each.
  • They all use the exact same Planner version, but a different Drools Expert version.
  • They are exactly reproducible, because they use a fixed Random seed: only the speed of the rule engine differs.
  • They use a stateful session and benefit (or suffer) greatly from incremental changes: you might or might not experience a similar performance effect in your use cases.
The graphs show how long it takes to reach a certain score. Lower is better.
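
To make the measurement concrete, here is a minimal sketch of how such "time to reach a score" points could be collected. It is not the actual Planner benchmark code: the `ScoreCalculator` interface, the `toyScore` function and the simple search loop are all hypothetical stand-ins (in the real benchmarks the score comes from a stateful Drools Expert session). What it does illustrate is the fixed Random seed: the sequence of moves is identical on every run, so only the speed of the score calculation changes the timings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class TimeToScoreBenchmark {

    /** Hypothetical stand-in for the rule engine: turns a solution into a score (higher is better). */
    public interface ScoreCalculator {
        int calculateScore(int[] solution);
    }

    /** One point on the graph: a score and the elapsed time at which it was first reached. */
    public static final class Point {
        public final int score;
        public final long elapsedNanos;

        Point(int score, long elapsedNanos) {
            this.score = score;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Toy score (an assumption for this sketch): minus the number of equal neighbouring values. */
    public static int toyScore(int[] solution) {
        int conflicts = 0;
        for (int i = 1; i < solution.length; i++) {
            if (solution[i] == solution[i - 1]) {
                conflicts++;
            }
        }
        return -conflicts;
    }

    /** Runs a seeded random search and records when each new best score is first reached. */
    public static List<Point> run(long seed, int steps, ScoreCalculator calculator) {
        Random random = new Random(seed); // fixed seed: the same moves on every run
        int[] solution = new int[20];
        int bestScore = calculator.calculateScore(solution);
        List<Point> points = new ArrayList<Point>();
        long start = System.nanoTime();
        for (int step = 0; step < steps; step++) {
            int index = random.nextInt(solution.length);
            int oldValue = solution[index];
            solution[index] = random.nextInt(10);
            int score = calculator.calculateScore(solution);
            if (score > bestScore) {
                bestScore = score;
                points.add(new Point(score, System.nanoTime() - start));
            } else if (score < bestScore) {
                solution[index] = oldValue; // undo a move that makes the score worse
            }
        }
        return points;
    }

    public static void main(String[] args) {
        for (Point point : run(0L, 10000, TimeToScoreBenchmark::toyScore)) {
            System.out.println("score " + point.score + " reached after " + point.elapsedNanos + " ns");
        }
    }
}
```

Running this twice with the same seed but a slower `ScoreCalculator` yields the same scores in the same order, only with later timestamps, which is the property the comparisons below rely on.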

Examination (exam scheduling)


From 5.0.1 (blue) to 5.1.1 (red) we see a huge performance gain (over twice as fast), as a result of the true modify improvements in 5.1, which were a big jump forward for stateful sessions.

From 5.1.1 (red) to 5.2.0.M2 (yellow) we see a huge performance regression (reported by Roman Novak), due to a small mistake in a fix for the accumulate implementation. Luckily, Edson located and fixed that mistake, so for 5.2.0.CR1 (green), the rule engine matches the 5.1.1 performance and even slightly improves on it (hardly noticeable on the graph).

Traveling tournament problem (sport scheduling)

From 5.0.1 (blue) to 5.1.1 (red) we see a good performance gain. And from 5.1.1 to 5.2.0.CR1 (green) we see another good gain, thanks to Mark's Exists/Not improvements.

But notice the 5.2.0.M2 (yellow) point. The effect of the small accumulate mistake is quite drastic here. It's so slow that it doesn't even reach the other scores.

Nurse rostering

There are no 5.0.1 (blue) points on this graph, because 5.0.1 was too slow. So the true modify improvement of 5.1 really changed the world for this example.

The effect of the accumulate mistake is noticeable in 5.2.0.M2 (yellow). And the Exists/Not improvements result in another good performance gain from 5.1.1 (red) to 5.2.0.CR1 (green).

Summary

Different use cases have different performance stories, but a general trend is noticeable:
Drools Expert 5.2.0 will be at least as fast as 5.1.1 (examination) or faster (TTP, nurse rostering). And 5.1.1 was already faster than 5.0.1 (examination, TTP, nurse rostering).