Tuesday, June 30, 2009

Drools Video Series Released

My name is Ray Ploski. I've been a JBoss Solutions Architect for some time. It's a fantastic job - I get to play with all the technologies produced by JBoss, learn how our customers are using them to solve real-world problems, and then share these ideas with you. When I first started to learn more about how people were using Drools in combination with traditional Java development, I was blown away by the power and simplicity the technology brought to even the most complex (or even simple) business use cases.

If you wonder "What is a rule engine?", "Why is this different/better to an 'if' statement?", "Where do I get started?" or "How do I learn more?", I've released a series of videos to introduce you to the project and its technologies:

Introduction to Drools Expert:
Expert is the core rule engine central to the Drools project. In this video we'll walk through the concepts relating to Facts, Rules, Agendas and Working Memory as well as how one sets up and debugs a Drools project within Eclipse. Learn how to use audit trails. (Video in Hi-Res or Low-Res)

Introduction to Drools Guvnor:
Guvnor is a centralized repository with rich web-based graphical user interfaces to create and manage your logic. In this video we walk through the features for searching, creating, testing and deploying knowledge-based assets. (Video in Hi-Res or Low-Res)

Introduction to Drools Fusion:
Fusion is an extension to the Drools engine that enables Complex Event Processing and Temporal Reasoning. In this video we walk through the concepts introduced with Fusion as well as an example. (Video in Hi-Res or Low-Res)

Introduction to Drools Flow:
Flow provides workflow and process capabilities to Drools. The project tightly integrates rules with business process. In this video we walk through the tooling and concepts involved with flow including step-wise debugging. (Video in Hi-Res or Low-Res)

Hopefully you will share my enthusiasm and excitement about what you can accomplish with Drools. More short clips will be added soon on topics such as: Decision Tables, creating your own Work Items, What's new in Drools 5, Commands within Expert, Guvnor/Eclipse Integration. What else would you like to see?

Chatting on IRC - a reminder of web tools available

As people should know, the "virtual office" or "virtual bar" (depending on the time of day) where devs and users hang out is the IRC chat room generously provided by codehaus.org.

Server: irc.codehaus.org
Room: #drools

For those who don't want to or can't run IRC clients, don't despair, there are web interfaces. Codehaus has http://irc.codehaus.org, but I thought I would show a new ajaxy web one that I think works quite well:

Go to here and click on the "Server" link. Enter irc.codehaus.org as the server address, and #drools as a channel, and a short nick name (< 8 chars please !).

Sunday, June 28, 2009

Drools a reflection on 5 years.

5 years ago, when I first started to promote rule engines to the mainstream Java developer market, the questions I most often received were "What is a rule engine?" and "Why is this different/better to an 'if' statement?". In a room of 25 developers maybe only 3 or 4 would have heard of Jess, JRules or Prolog and only 1 or 2 would have any actual experience.

It was a long hard slog of repeating the same information over and over and over and over again to get the message out.

5 years later and the picture is very different. I'm no longer asked what a rule engine is, nor do I have to explain the benefits, and everyone has heard of Drools. My personal feeling is that 2009 has become the tipping point for Drools, our "coming of age" year.

Could we have had the October Rules Festival 5 years ago and filled the room with over 100 people, the bulk of whom were Java developers? Could I have had a Boot Camp backed by A-list names such as Wells Fargo, Boeing, Fedex, Lockheed Martin, Sony, HP and Sun?

Reflecting on this made me feel immensely proud of what the Drools team (Michael Neale, Edson Tirelli, Kris Verlaenen, Toni Rikkola and in the early years Bob McWhirter) and I had achieved. We were actually responsible for making a whole mainstream market for Rule Engine technology. No other OSS engine has had any real market penetration and the commercial engines still do not target the mainstream Java developers - i.e. you don't see JRules or Blaze Advisor at JavaOne or Devoxx (JavaPolis) or other similar events.

Drools was first established in 2001 by Bob McWhirter; there was no Drools 1.0 release. For those who remember, the very early versions of Drools used Jelly (that xml scripting framework) and didn't even compile on Windows without cygwin, as they required bash shell scripts - Bob's handiwork ;) A little later I got involved in Drools, and together Bob, myself and others from the community finally managed to push out Drools 2.0, the first release, in June 2005; you can see the TSS announcement here. Drools 2.0 was a simple xml scripting language with a partial Rete-"like" implementation.

It was at this point that I became project lead, replacing Bob McWhirter, who by then had become interested in other things, although he still remained involved in Drools, just to a lesser extent. When I first became involved in Drools I had zero background in rule engine technology, although I had an AI background in search space technology, specifically genetic algorithms.

Exactly 1 year later Drools 3.0 was released in June 2006, TSS announcement here. Drools 3.0 was a full Rete implementation aimed at the Jess market.

Just over 1 year from that in July 2007 Drools 4.0 was released, TSS announcement here. Drools 4.0 moved up the food chain and was aimed at the JRules BRE market.

It took two full years to finally release Drools 5.0, TSS announcement here. Drools 5.0 has no target market and innovates beyond what traditional rule engines do to become what we refer to as a Business Logic integration Platform (BLiP). Drools 5.0 integrates and unifies rules, workflow and event processing. Drools 5.0 also includes Drools Solver, which is led by community member Geoffrey De Smet. Probably the only comparable system now is Tibco Business Events, which is going in a similar direction.

For a bit of fun I thought I'd paste an old IRC entry from my early days with Drools (my handle is conan), when Bob McWhirter was my mentor - unfortunately my entries prior to 2004 are lost :( This paste provides a comical reference to the time I'd told my employers that an unreleased piece of software was stable "as granite" and production ready in my efforts to sell Drools, only to find out otherwise. It should hopefully be encouraging for people to see me in my more "clue free" days, showing that if I can do it, anyone can :)

[2004-02-09 19:23:38] <conan> supposed to be getting the rules running at cisco this week, if drools is broken - I'm going to have a serious confidence problem with management.
[2004-02-09 19:24:16] <topping> yes, caveats are good when pushing unreleased software to managment :-)
[2004-02-09 19:25:03] <conan> topping: yeah I've been telling them its stable as granite!!!
[2004-02-09 19:52:12] <conan> I'm thinking it might just be easier to stick in as an example for now in drools-examples
[2004-02-09 19:52:21] <topping> i dunno, what problem are you having?
[2004-02-09 19:53:29] <conan> I add two "request" objects which have states. on reset even which is fired when one request state = "Q" and any other request state != "N" can end up with request1 and requet2 being the same.
[2004-02-09 19:53:35] <conan> which is fine
[2004-02-09 19:53:53] <conan> I then retract the object, but its seems to recurse around still, even though there should be no data.
[2004-02-09 19:56:05] <bob> howdy
[2004-02-09 19:56:16] <bob> if you've got a rule firing against a previously retracted object, then definitely a drools bug
[2004-02-09 19:56:20] <bob> probably in the Agenda management
[2004-02-09 19:56:39] <bob> Agenda isn't dropping rule activations that involve retracted objects
[2004-02-09 19:56:42] <bob> (just guessing)
[2004-02-09 19:57:10] <conan> could this be beause the object is referenced by two parameters?
[2004-02-09 19:57:13] <bob> nope
[2004-02-09 19:57:22] <bob> an object either is or is-not in the working-memory
[2004-02-09 19:57:31] <bob> if you take it out of the memory and it's still in a rule activation, then bug
[2004-02-09 19:57:44] <conan> bob: I'm going to knock up an example then and probe this.
[2004-02-09 19:57:51] <bob> entirely possible I broke this in beta-12
[2004-02-09 19:58:04] <bob> bad idea saying drools is "stable as granite" :)
[2004-02-09 19:58:13] <conan> bob: yeah I know :)

Things would be remiss if I didn't take this opportunity to thank many of the wonderful community contributors (in no particular order beyond old and new school) that helped make Drools what it is. Please if I missed off your name, then do let me know and I'll add it.

Old School:

Alexander Saint Croix (nalex), Thomas Deisler, Doug Bryant (doug), Brian Topping (topping), Peter Royal (proyal), Simon Harris (sharris), Peter Lin (woolfel), David Cramer, Roger F. Gay, Barry Kaplan (memelet), Andy Barnett (dbarnett), Matt Ho (savaki), Martin Hold (mhald), Pete Kazmier (kaz), Alexander Bagerman (bagerman), Michael Frandsen.

New School:
David Sinclair (stampy88), Ming Jin (ming), Ellen Zhao (ellen), Ben Truit, Wolfgang Laun (Laune) , Matthias Groch, Matt Geis, Joe White (joe), Michael Rhoden (mrhoden), Geoffrey De Smet (Ge0ffrey), Alexandre Porcelli (porcelli), Ahti Kitsik (Ahti), Tihomir Surdilovic, Salatino Mauricio (salaboy), Davide Sottara (sotty).

Thursday, June 25, 2009

Drools Flow performance

People sometimes ask for tests, benchmarks or numbers that they can use for evaluating whether Drools actually is fast enough. Fast enough always depends on your specific case. We've had various blogs before on performance for the rules engine itself, but so far we have never published anything for Drools Flow.

However, not publishing can sometimes lead to confusion as well (as for example here, where Drools Flow was used as one candidate in a performance evaluation and we at first sight only seemed a fraction faster, but it's difficult to actually figure out what the exact results were). That's why I will post some figures here anyway, simply as some kind of reference, to determine the kind of overhead the engine creates during the execution of your processes.

The test we're using here is actually a very simple one: we start an empty process (a start and end node connected to each other), execute it 10,000 times in sequence and measure the average time it takes to execute that process. These results of course depend heavily on how you configure your engine, so we will show them in three different settings:

A. Simple POJO execution: The Drools engine is used as a simple local Java component (so without any persistence or transactions)

B. Persistence / transactions: The same process is executed but in a transactional context (a new transaction for each process instance), and the state of the engine is always persisted in the database.

C. Optimized Java mode: This is actually one of my pet side-projects, where we translate the Drools Flow process straight into Java code and execute that Java code for you (the client simply needs to change one simple configuration for the process). While this severely limits the types of nodes you're allowed to use in your process (no wait states for example), and reduces the flexibility of your process, it shows how we can make Drools Flow lightning fast (in specific circumstances) if necessary. And it is of course a good reference for showing what the limit is ;) This is again without persistence and transactions.

Results [using IBM ThinkPad T61 laptop running RHEL, Java 1.6]

A. Simple: 388ms -> 0.04ms / process instance
B. Persistence / transactions: 21.9s -> 2ms / process instance
C. Optimized Java: 126ms -> 0.01ms / process instance
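
The per-instance figures are simply the total elapsed time divided by the number of sequential runs. As a rough sketch of that kind of measurement in Java (this is not the actual test code; `startEmptyProcess` is a hypothetical placeholder for whatever starts the process in your own setup):

```java
final class AverageTiming {

    /** Runs the task `runs` times in sequence and returns the average time per run in milliseconds. */
    static double averageMillis(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / runs;
    }

    /** Converts a total duration in milliseconds into an average per process instance. */
    static double perInstanceMillis(double totalMillis, int runs) {
        return totalMillis / runs;
    }

    public static void main(String[] args) {
        int runs = 10_000;

        // Hypothetical placeholder: a real test would start a Drools Flow process here.
        Runnable startEmptyProcess = () -> { };
        System.out.printf("measured avg: %.5f ms%n", averageMillis(startEmptyProcess, runs));

        // Totals reported above, converted to per-instance averages.
        System.out.printf("A: %.4f ms%n", perInstanceMillis(388, runs));
        System.out.printf("B: %.4f ms%n", perInstanceMillis(21_900, runs));
        System.out.printf("C: %.4f ms%n", perInstanceMillis(126, runs));
    }
}
```

With the unrounded totals, mode C comes out about 3.1 times faster than mode A (388 / 126).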

If you're using the engine itself without any persistence or transactions (those are added as orthogonal layers, not part of the core itself), we think it's pretty fast :)

As you can see, there's a certain price you have to pay for adding persistence and transactions. But since simply opening a JPA session and persisting one object in a transaction takes about 1.5ms here as well (75% of the total time), we believe we probably do limit the additional overhead.

The optimized Java mode shows that, if you really need to, you can still get roughly a 3x performance increase (388ms down to 126ms) by generating Java code from the process description. We hope to get this included into the code base at some time, and maybe even provide this functionality for certain parts of your process.

If these numbers are insufficient, you'll still be able to start looking at executing commands in parallel (they were all executed in sequence here), using multiple sessions to split up the work, etc.

For those who want to verify themselves, the actual test code can be found here.

Complex Logic Formulas #3

In my previous two posts I introduced configurable operators, but maybe some of you noticed that I (willingly) left one out: the negation NOT.

In Drools, negation appears essentially in three places, with slightly different semantics:

  • relational evaluators have their negated counterpart:
    Person( age == 18 , age != 18 )
  • custom evaluators support the "not" prefix keyword:
    Person( age not ~old )
  • the (non) existential quantifier is allowed:
    not Person( age < 18)

In particular, the second and the third case have different meanings.

The negation in "age not old" is a logical negation: it takes the result of the evaluator (be it a boolean or a generalized degree) and inverts it, mapping true to false and vice versa. The quantifier "not", instead, models the condition "when there is NO object matching the pattern...".

In fact, logicians use the term logical negation for the former, and negation as failure for the latter. Due to this ambiguity, in Drools Chance "not" is still supported with the usual, context-dependent semantics, but is deprecated. Instead, the two operators neg and naf are proposed.

neg can be used both before evaluators and between-pattern evaluators, and can be nested:

rule "NEG"
neg neg neg ( // equivalent to a single neg
    $p : Person( age neg ~young )
    Car( owner == $p, price neg ~low )
)
// this rule activates for each pair Person/Car in which
// either the owner is young or their car is cheap

naf, instead, must be used before a sub-formula:

rule "NAF"
naf Person( age < 18 )
// this rule will activate if there are no people who are young
// (i.e. all "Person"s are old)

Notice that in Drools the two are connected by the relation naf <-> neg exists.
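
To make the relation concrete, here is a small plain-Java sketch (not Drools code) using boolean results only. The names `neg`, `exists` and `naf` simply mirror the operators discussed above, and the list of ages is made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class Negations {

    /** Logical negation: inverts the result of a single evaluation. */
    static boolean neg(boolean result) {
        return !result;
    }

    /** Existential quantifier: true when at least one fact matches the pattern. */
    static <T> boolean exists(List<T> facts, Predicate<? super T> pattern) {
        return facts.stream().anyMatch(pattern);
    }

    /** Negation as failure: true when no fact matches the pattern, i.e. neg(exists). */
    static <T> boolean naf(List<T> facts, Predicate<? super T> pattern) {
        return neg(exists(facts, pattern));
    }

    public static void main(String[] args) {
        Predicate<Integer> under18 = age -> age < 18;

        // One person is under 18, so "naf Person( age < 18 )" does not hold.
        System.out.println(naf(Arrays.asList(25, 40, 17), under18)); // prints false

        // Nobody is under 18, so the naf condition holds.
        System.out.println(naf(Arrays.asList(25, 40), under18));     // prints true
    }
}
```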

Complex Logic Formulas #2

In the last post, I introduced logic operators in boolean rules. Drools, in its standard form, supports AND and, in a limited way, OR. In fact, these operators are sufficient to write a number of rules. The addition of the other common logic operators (XOR, EQUIV, IMPLY) is more syntactic sugar than a really valuable feature - in fact, no rule engine supports them openly.

In the presence of imperfection, instead, much changes. A degree is more than a simple boolean and thus carries additional information that can be combined in complex ways. Let's take the conjunction AND as an example. The common, general idea is that the result should tend to true (whatever true means) the more all the operands tend to true individually.
In practice this is a vague constraint that leaves many degrees of freedom.

Consider the basic case: imperfection is used to model fuzziness, and real numbers are used as degrees. This is perhaps the simplest case, since operators are truth-functional (i.e. they just require the degrees of their operands to be evaluated) and degrees themselves are extremely simple.
The logic conjunction of two degrees can be obtained by taking their minimum:

1) d(A && B) = min( d(A) , d(B) )

but also their product:

2) d(A && B) = d(A) * d(B)

or again:

3) d(A && B) = max( 0 , d(A) + d(B) -1 )

These operations (technically called t-norms) are but some - fundamental - examples of a whole family of operations, all of which are candidate implementations of the AND operator.
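
As a quick illustration, the three formulas above can be written as plain Java functions over degrees in [0, 1]. This is only a sketch: the class and method names are my own and are not part of Drools Chance.

```java
final class TNorms {

    /** Formula 1: minimum t-norm. */
    static double min(double a, double b) {
        return Math.min(a, b);
    }

    /** Formula 2: product t-norm. */
    static double product(double a, double b) {
        return a * b;
    }

    /** Formula 3: Lukasiewicz t-norm. */
    static double lukasiewicz(double a, double b) {
        return Math.max(0.0, a + b - 1.0);
    }

    public static void main(String[] args) {
        double a = 0.5;
        double b = 0.75;
        System.out.println(min(a, b));         // prints 0.5
        System.out.println(product(a, b));     // prints 0.375
        System.out.println(lukasiewicz(a, b)); // prints 0.25
    }
}
```

All three agree with the boolean AND when the degrees are exactly 0 or 1; they only differ in between, which is exactly the freedom mentioned above.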

Things do not improve much if one chooses different types of imperfection: take, for example, probability. Supposing, again, that real values model the probability of truth (thus putting ourselves in the simplest probabilistic case), AND can be implemented by taking the product of the operand probabilities - BUT only if the operands are conditionally independent. If that is not the case, the operator will not be truth-functional and thus will have to perform more complicated calculations, possibly argument-dependent:

4) p(A && B) = p(A) * p(B|A)
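
A tiny Java sketch of formula 4, with made-up probabilities, shows why the operator stops being truth-functional: the same p(A) and p(B) give different results depending on how B relates to A. The names are illustrative only.

```java
final class ProbabilisticAnd {

    /** Formula 4, general case: p(A && B) = p(A) * p(B|A). */
    static double and(double pA, double pBGivenA) {
        return pA * pBGivenA;
    }

    /** Independent case: p(B|A) equals p(B), so the product of the two probabilities suffices. */
    static double andIndependent(double pA, double pB) {
        return and(pA, pB);
    }

    public static void main(String[] args) {
        double pA = 0.5;
        double pB = 0.5;

        // A and B independent: p(B|A) = p(B) = 0.5
        System.out.println(andIndependent(pA, pB)); // prints 0.25

        // B always holds whenever A holds: p(B|A) = 1.0
        System.out.println(and(pA, 1.0));           // prints 0.5
    }
}
```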

Similar concepts apply to the other operators. An operator, then, is actually an abstract construct that can be customized and configured. To do so, attributes can be attached to each individual operator, choosing one or more among the following:

  • id : an identifier which can be used to reference the operator
  • kind : a string selecting a specific implementation of the operator
  • args : a string containing additional information required to configure the operator

Perhaps the best way to understand them is to imagine the following call:

ID = new OperatorKind(Args)

In fact, a centralized factory is used to instantiate the operators during the construction of the RETE network: it uses the value of kind to choose the concrete classes and args to provide arguments to the constructors. Obviously, the factory can be configured with a default type to return if no kind is specified explicitly.
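
To give an idea of how such a factory could look, here is a minimal Java sketch. It is not the actual Chance implementation: the class name, the kind strings "min" and "product", and the use of `DoubleBinaryOperator` as the operator type are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;
import java.util.function.Function;

final class OperatorFactory {

    // Maps a "kind" string to a constructor that receives the "args" string.
    private final Map<String, Function<String, DoubleBinaryOperator>> kinds = new HashMap<>();
    // Operators that were given an id, so they can be referenced later.
    private final Map<String, DoubleBinaryOperator> byId = new HashMap<>();
    private final String defaultKind;

    OperatorFactory(String defaultKind) {
        this.defaultKind = defaultKind;
        // Hypothetical kinds; neither implementation needs its args in this sketch.
        kinds.put("min", args -> Math::min);
        kinds.put("product", args -> (a, b) -> a * b);
    }

    /** Roughly "ID = new OperatorKind(Args)": falls back to the default kind when none is given. */
    DoubleBinaryOperator create(String id, String kind, String args) {
        String key = (kind != null) ? kind : defaultKind;
        Function<String, DoubleBinaryOperator> constructor = kinds.get(key);
        if (constructor == null) {
            throw new IllegalArgumentException("Unknown operator kind: " + key);
        }
        DoubleBinaryOperator operator = constructor.apply(args);
        if (id != null) {
            byId.put(id, operator);
        }
        return operator;
    }

    /** Returns the operator registered under the given id, or null if there is none. */
    DoubleBinaryOperator lookup(String id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        OperatorFactory factory = new OperatorFactory("min");
        DoubleBinaryOperator defaultAnd = factory.create(null, null, null);
        DoubleBinaryOperator productAnd = factory.create("and1", "product", null);
        System.out.println(defaultAnd.applyAsDouble(0.5, 0.75)); // prints 0.5
        System.out.println(productAnd.applyAsDouble(0.5, 0.75)); // prints 0.375
    }
}
```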

These attributes can be attached to operators, both within and between Patterns, and also to patterns themselves. The reason is simple: a pattern
Type ( constraints )
is transformed into the conjunction
object.class == Type && constraints
so the attributes are attached to the hidden conjunction.

The exact syntax is shown in the following example (attributes are optional) :

rule "Annotated_Ops_Example"
field1 == "a"
op_within @( id="..." kind="..." args="..." )
field2 == "b"

op_between @( id="..." )

field3 == "c"
) @( kind="..." args="..." )


The symbol "@" is used to introduce the metadata between the brackets, which, in the specific case, are given by the pairs attribute/value.

Wednesday, June 24, 2009

One model to rule them all... and in the darkness bind them

It seems to be a bit of a holy grail of Service Oriented Architecture, or in fact for any large organisation, to have a single canonical model/form of all important entities to their business.

After watching a colleague struggle with a 900K WSDL that defines something like that (900K of XML !), I happened to stumble across this interesting and amusing blog post: http://service-architecture.blogspot.com/2009/06/single-canonical-form-only-suicidal.html

If SaaS is anywhere in your future, and it will be unless you are a military secure establishment and even then it might be, then GIVE UP NOW on the idea that you can mandate data standards in applications and create a single great big view that represents everything.

I have watched organisations spend millions even trying to define what the most "basic" entity looks like: A Customer ! It's hard to even agree on the basics !

So what ends up happening is that some sort of a standard is reached, and projects have to pay an expensive "architecture tax" to use these huge models, fail, and then feel guilty for creating their own little models that at least allow them to build their app.

With modern mapping tech, such as this or this, the cost of mapping between models is much, much less; perhaps it's less than the tax of using huge complex models? (This is directly relevant to the models that rules use: rules *can* use the canonical models, depending on how complex you want them to be, but sometimes it is clearer to use a model tailored for where it is used, and map to it from the external model.)
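
As a hand-written illustration of that last point (no mapping library involved), a small fact type tailored for the rules can be derived from a slice of a big canonical model in a few lines of Java. Both classes and all their fields are hypothetical:

```java
import java.time.LocalDate;
import java.time.Period;

final class ModelMapping {

    /** Hypothetical slice of a large canonical model. */
    static final class CanonicalCustomer {
        final String givenName;
        final String familyName;
        final LocalDate dateOfBirth;

        CanonicalCustomer(String givenName, String familyName, LocalDate dateOfBirth) {
            this.givenName = givenName;
            this.familyName = familyName;
            this.dateOfBirth = dateOfBirth;
        }
    }

    /** Hypothetical small fact type holding only what the rules need. */
    static final class RuleCustomer {
        final String name;
        final int age;

        RuleCustomer(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Maps the external model to the rule model; the age is computed relative to the given date. */
    static RuleCustomer toRuleModel(CanonicalCustomer customer, LocalDate today) {
        int age = Period.between(customer.dateOfBirth, today).getYears();
        return new RuleCustomer(customer.givenName + " " + customer.familyName, age);
    }

    public static void main(String[] args) {
        CanonicalCustomer canonical =
                new CanonicalCustomer("Jane", "Doe", LocalDate.of(1980, 6, 24));
        RuleCustomer fact = toRuleModel(canonical, LocalDate.of(2009, 6, 24));
        System.out.println(fact.name + " is " + fact.age); // prints Jane Doe is 29
    }
}
```

The rules then only ever see `RuleCustomer`, so they stay readable even if the canonical model keeps growing.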

Tuesday, June 23, 2009

Benchmarking drools-solver configurations

One of the little known gems in Drools Solver is the Benchmarker utility. Until now, it wasn't documented in the manual and few people knew about it.

The Benchmarker allows you to play out different solver configurations against each other, so you can determine the best one for your problem domain. It's pretty easy to use:

XmlSolverBenchmarker benchmarker = new XmlSolverBenchmarker();

This benchmark configuration will run 3 different solvers on 2 datasets, so it will do 6 solver runs:

<?xml version="1.0" encoding="UTF-8"?>



Afterwards, it will sort the solvers and write their results to the resultFile.

Monday, June 22, 2009

Refactoring: Google Summer of Code progress update

Thanks to the Google Summer of Code programme, we have Lucas Amador working on refactoring support for rules.

Initially the focus is on the Eclipse plug-in (of course) - as Eclipse has excellent support for refactoring tools that are not Java source specific - but the intent is to have the feature available to other tools (such as Guvnor, or perhaps other IDEs in future) for bulk refactoring.

More info is on the wiki page here.

Lucas also made a video demonstrating some of the enhancements here:

GSoC 2009 - Drools Refactoring from Lucas A. on Vimeo.

For those really really interested, and crazy, the code as a work in progress is available from here.

Thursday, June 18, 2009

Drools Blog gains over 1000 subscribers

Google Feedburner, which filters out bots etc., reports that we have just gained over 1000 subscribers :) Small numbers in Hibernate terms, but great when you consider the niche market of rules; hopefully Fusion and Flow are helping us reach new readers. Additionally the website itself gets an average of 500 unique viewers per day and has peaked at over 1000 unique viewers per day - this doesn't include viewings from the various syndicated locations, such as jboss.org.

Viva Le Drools!!!

Pre-installed Drools development environment for VirtualBox

For the Drools Boot Camp SF09 I made up pre-installed Drools development environments for Sun's VirtualBox. This environment has everything you need to start development of drools.

What's in the box?

  • Fedora 10
  • Java 1.6 JDK
  • Maven 2.0.9
  • Ant 1.7.1
  • gwt-linux-1.5.2
  • JProfiler 5.2 (With Drools community license, only to be used with Drools)
  • Eclipse 3.4 (with GEF, Subclipse, Drools and JProfiler plugins pre-installed, and Drools runtime configured)
  • All environment variables correctly set
  • Full Drools SVN checkout
  • Maven repository populated with all Drools dependencies
  • Initial Eclipse workspace created, with basic modules already checked in
  • Documentation already built and set as Firefox home page
  • Full build already done, including the eclipse plugin which downloads eclipse to build itself, and JBoss AS for the guvnor-standalone.zip which is in drools-guvnor.

The VirtualBox 2.2 image is exported and zipped and available here:
https://docs.jboss.org/drools/virtualbox/dvbox-20090615.zip (2.6GB)

username : repoman
password : password

If anyone wants to improve on this image - preinstalled Netbeans, IntelliJ (with community license) or any other ideas, please feel free and let me know where I can download it from so I can make it available to others.

Tuesday, June 16, 2009

Business Knowledge Life Cycle

When talking about integration and unification of business processes, rules and event processing, people kinda understand quite rapidly what we are talking about at the runtime level (just imagine making business decisions inside your process as the most simple case). However, when we start talking about unifying or harmonizing the entire life cycle, it sometimes becomes more difficult to explain.

Over the last few days, I've been trying to find a simple figure that shows the most important phases in the life cycle of your business knowledge. You know, starting from design, followed by deployment, execution and monitoring, after which the results are analyzed, leading to a continuous optimization loop. Various research papers and industry white papers in the BPM area have their own definition (and I'm not allowed to just steal those ;)). So I decided to combine the most important steps, not to create the most complete life cycle that everyone should use as a reference (I'll leave that to the core researchers), but just to show some of the most important phases.

Now, if you take a look at the figure below, you'll notice that these phases are not only applicable to processes, but to rules as well.

Business Knowledge Life Cycle
(Click on the image for a larger version)

The point that I'm trying to make is that, from the perspective of the end user, all these stages are pretty similar, whether you are using processes, rules or event rules to describe your business knowledge. Although the underlying technologies to achieve those steps might be completely different (for example rule analysis and process analysis are based on completely different theories), from the perspective of the end user they usually behave pretty similarly (in this case, warnings and errors about possible problems in your knowledge).

That is why we believe that you shouldn't be integrating a process-oriented, a rules-oriented and an event-oriented product and burden the end user with having to manage a completely different solution for each stage in the life cycle. We believe in having APIs and tooling that already unify or harmonize all these stages for you (and at the same time provide advanced integration), so the end user can simply select the most appropriate paradigm to specify their business knowledge (and combine if necessary).

Monday, June 15, 2009

The Benefits of (Automated) Testing

A quick post to point to this article I was reading today. Nothing ground-breaking in there, but a fair statement of testing goals at the several levels of a project.

My personal opinion is that it does not matter if you use TDD principles or not: software needs testing! And the lower levels of testing must be automated. It is as simple as that.

This is a common mantra in Open Source projects like Drools, especially because you have many, many people coming and going, contributing a feature here or a patch there. That is not always the case in closed source projects, though, especially inside companies (and I worked on several of these projects in the past).

IMO, the most overlooked benefit of testing is the long term stability of the project. Just to mention one example, it is not necessary to achieve 90% code coverage in unit tests right from the start. As a developer, write tests that cover the boundaries and common use cases of the code you are creating and let them rest in your test suite. Whenever a bug is found, make sure to add more unit tests that cover exactly the bug and the fix you implemented. This way, you will not overburden the project development time line writing "too many tests", but will have your project mature and stabilize nicely and incrementally.

Drools uses unit and integration testing extensively. One of the latest builds of version 5 on our CI server shows that 2814 unit and integration tests were executed. Code coverage tools usually don't deal well with code and bytecode generation, so it is hard to state exactly how much code we cover, but the important thing is that we are confident enough in the tests to make changes to the code, knowing that tests will fail if we do something wrong. That is extremely important when you have software that evolves constantly and heads in directions that result from all the research we do, directions we could not have imagined if someone had asked us 2 or 3 years ago.

Thursday, June 11, 2009

Seam-Drools integration

Our friend Tihomir Surdilovic just published a few new features in the Drools and Seam integration. I will reproduce his post for those who missed it.

Good job Tihomir!

We added a couple of new features to the seam-drools integration:

1. Support for Decision Tables (JBSEAM-4188):

<drools:rule-base name="ruleBase" rule-files="myDecisionTable.xls"/>

You can now compile rules from your Decision Table (xls) when using the drools:rule-base component in components.xml

2. Support for Drools Custom Consequence Exception Handlers (JBSEAM-4049):

<drools:rule-base name="ruleBase" rule-files="myDecisionTable.xls" consequence-exception-handler="#{myConsequenceExceptionHandler}"/>

You can use the new consequence-exception-handler attribute to add a new exception handler. This exception handler can be a Seam component and can look like:

public class MyConsequenceExceptionHandler implements ConsequenceExceptionHandler, Externalizable {

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    }

    public void writeExternal(ObjectOutput out) throws IOException {
    }

    public void handleException(Activation activation,
                                WorkingMemory workingMemory,
                                Exception exception) {
        throw new ConsequenceException( exception,
                                        activation.getRule() );
    }
}


3. Support for Drools Event Listeners (JBSEAM-4225):

<drools:managed-working-memory name="workingMemory" rule-base="#{ruleBase}">

This allows you to be notified of rule engine events, including rules firing, objects being asserted, etc., by simply adding the drools:event-listeners element to your managed-working-memory component in components.xml, as shown in the example.

These updates will be available in Seam 2.2.0.CR1. We are also working on adding RuleFlow support in the near future.

Complex logic formulas #1

Dear all, after some days spent between papers, code and travels, I am back with a new post! Today, the topic is operators.

Look at a simple rule:

$p : Person ( name == "sotty" , age >= 27 )
$m : Message ( sender == $p , length < 160 , $b : body)
System.out.println( $b );

This is a "trivial" logical expression: all the constraints, including the Type checks, are AND-ed together: a tuple [person, message] must satisfy them all to generate an activation. Even in boolean logic, however, it is possible to define more complex formulas, using logical connectives.
These operators are AND (all constraints must be true), OR (at least one constraint must be true), XOR (only one constraint must be true) and IMPLY (the conclusion must be true when the premise is true). Moreover, there is the negation NEG, which reverses the result of its operand.

Those of you familiar with logic could object that it is not strictly necessary to have all operators as primitives, since they can be mutually defined. For example NEG and IMPLY (or NEG and AND) are sufficient to define all others: (A OR B) is equivalent to ((A IMPLY B) IMPLY B), or to NEG((NEG A) AND (NEG B)) and so on...
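
If you want to convince yourself, the two rewritings of OR can be checked against the full truth table with a few lines of plain Java (a sketch with names of my own choosing, nothing Drools-specific):

```java
final class Connectives {

    /** A IMPLY B: false only when the premise is true and the conclusion is false. */
    static boolean imply(boolean a, boolean b) {
        return !a || b;
    }

    /** A OR B written as ((A IMPLY B) IMPLY B). */
    static boolean orViaImply(boolean a, boolean b) {
        return imply(imply(a, b), b);
    }

    /** A OR B written as NEG((NEG A) AND (NEG B)). */
    static boolean orViaNegAnd(boolean a, boolean b) {
        return !(!a && !b);
    }

    /** Compares both rewritings with the built-in OR for every combination of inputs. */
    static boolean equivalencesHold() {
        boolean[] values = { false, true };
        for (boolean a : values) {
            for (boolean b : values) {
                boolean expected = a || b;
                if (orViaImply(a, b) != expected || orViaNegAnd(a, b) != expected) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalencesHold()); // prints true
    }
}
```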

However, the conversion is verbose. The resulting formula is much less clear and understandable than the original one (and tends to be more expensive to evaluate) so operators should be available as primitive, at least to write rules.

Operators can be used both within patterns, where the usual "&&" and "||" have been extended with "^^" (XOR), "->" (IMPLY) and "<->" (EQUIV):

rule "In-Pattern"
$t : Tea ( lemon == true ^^ milk == true )

but also between patterns, where operators are denoted using a textual form (or, and, implies, equiv and xor) instead of a symbolic one. An example:

rule "Between-Pattern"
$c : Car( $o : owner, miles ~few || price ~low )
$b : Bike( owner == $o )

Normally, this rule would be split in two separate rules, but this is not the case in Chance.
The patterns are matched and evaluated, as usual, but then the operator, OR in the example, will combine the resulting degrees.
This actually opens several questions, to which detailed answers will be given in future posts; but for now, just to give a hint:

what is the exact semantics of an operator such as OR ?
according to the spirit of Chance, it can be customized...

what does an operator such as OR actually operate on ?
operators may be truth-functional, i.e. just operate on the degrees resulting from evaluating the operands, or not, requiring additional information...

but what is the point of using an operator such as OR if the operands are automatically true?
the operands are not automatically true. Nor false. There's imperfection...

While a tuple is being built, the various degrees resulting from the various evaluations
are collected and organized in a tree structure, called "Evaluation", mirroring the structure of the formula. Something like:

class == Car
miles ~few
price ~low
class == Bike
owner == $o

So, regarding question #2: a truth-functional operator requires only the degrees of its operands to be evaluated; a semi-truth functional operator may require a deeper visit of the tree structure; a general, non-truth functional operator requires even more additional information, such as data from the tuple (the arguments) being evaluated.
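To make the truth-functional case concrete, here is a rough plain-Java sketch of such a tree. Everything in it is an assumption made for illustration: the class names, the use of doubles in [0, 1] as degrees, the sample degrees, and the choice of min for AND and max for OR. It is not Chance's API, and Chance lets the operator semantics be customized.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

public class EvaluationSketch {

    // A node of the evaluation tree: it can always report a degree.
    interface Evaluation {
        double degree();
    }

    // A leaf holds the degree produced by evaluating a single constraint.
    static final class Leaf implements Evaluation {
        final String label;
        final double degree;

        Leaf(String label, double degree) {
            this.label = label;
            this.degree = degree;
        }

        public double degree() {
            return degree;
        }
    }

    // A truth-functional operator: it looks only at the degrees of its operands.
    static final class Operator implements Evaluation {
        final DoubleBinaryOperator combiner;
        final List<Evaluation> operands;

        Operator(DoubleBinaryOperator combiner, Evaluation... operands) {
            this.combiner = combiner;
            this.operands = Arrays.asList(operands);
        }

        public double degree() {
            double result = operands.get(0).degree();
            for (int i = 1; i < operands.size(); i++) {
                result = combiner.applyAsDouble(result, operands.get(i).degree());
            }
            return result;
        }
    }

    // Assumed semantics, chosen only for this sketch.
    static final DoubleBinaryOperator AND = Math::min;
    static final DoubleBinaryOperator OR = Math::max;

    // Mirrors the "Between-Pattern" rule, with made-up degrees for the fuzzy constraints.
    static double betweenPatternDegree() {
        Evaluation car = new Operator(AND,
                new Leaf("class == Car", 1.0),
                new Operator(OR,
                        new Leaf("miles ~few", 0.3),
                        new Leaf("price ~low", 0.7)));
        Evaluation bike = new Operator(AND,
                new Leaf("class == Bike", 1.0),
                new Leaf("owner == $o", 1.0));
        return new Operator(AND, car, bike).degree();
    }
}
```

With these sample values the OR node yields 0.7 and the AND nodes pass it through, so the whole tuple evaluates to 0.7. A non-truth-functional operator would need more than the `degree()` of its children, for example access to the leaves' labels or to the facts themselves.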

To conclude this post, consider the final example:

rule "Not yet Socrates"
$p : Person( $n : name )
Person( this == $p , this ~mortal )

This rule somewhat recalls a famous example from the history of logic, Socrates' famous syllogism. In time I'll show how to turn it fully into the rule "All men are mortal", but for now some building blocks are still missing: quantifiers, induction and Modus Ponens among them.
Upcoming posts, instead, will deal with negation and operator customization. In which order, I'm not certain.

Wednesday, June 10, 2009

How to implement Accumulate Functions

Developing solutions for problems is not an easy task, especially when the tools we have to solve a particular problem are good enough for 80% of the task, but fail to help us solve the remaining 20%.

Drools is built from scratch with extensibility in mind, and this is one of the characteristics that distinguish it from other products in the market. From support for higher-level abstractions, like Domain Specific Languages and Decision Tables, to engine extensions like pluggable evaluators and functions, Drools enables technical people to make business people feel more comfortable while writing rules, using a known vocabulary, constraints and abstractions.

In my talk during the October Rules Fest I will dive into all the ways in which Drools can be extended to improve the development of domain specific solutions. For now, I just want to throw some bones while saving the meat for the conference.

In this spirit I would like to show you one of the easiest ways to extend the engine: Accumulate Functions.

Rules quite commonly need to execute operations on sets of data. The operations range from actual set operations, to calculation/scoring, to whatever else you need to execute on a set of facts. The Drools accumulate CE supports inline custom code in its init/action/reverse/result blocks, but that is not declarative, is not reusable among multiple rules, and is good only for a one-time need.

Accumulate Functions to the rescue: implementing an accumulate function is a 20-minute task. It makes all your rules easier to write, read and maintain. It is unit test friendly, and the Drools Eclipse plugin understands and validates your rules with accumulate functions.

Let's look at an example scenario so that everyone understands what accumulate functions are. Imagine that you have a rule that needs to calculate the sum of the price of all products. Without accumulate functions, the rule would look like:

rule "Sum all products"
when
    $total : Number() from accumulate(
        Product( $p : price ),
        init( double total = 0; ),
        action( total += $p; ),
        reverse( total -= $p; ),
        result( new Double( total ) ) )
then
    // do something
end

As you can see, even for a very simple case it is quite verbose. More than that, if another rule needs to calculate the sum of something else, you need to rewrite all the code, which makes maintenance very difficult.
With Accumulate Functions, things get much nicer:

rule "Sum all items"
when
    $total : Number() from accumulate(
        Product( $p : price ),
        sum( $p ) )
then
    // do something
end

Now the intent of the rule is explicit. It is much shorter and less error prone. Drools ships with several accumulate functions that are available out of the box, like sum, average, min, max, count, collectSet and collectList.

Now imagine that your application needs a set operation. How hard is it to implement it as an accumulate function? As I mentioned before, so hard that you can have it done in 20 minutes and then re-use it everywhere. Imagine complex financial interest calculations, or stream processing functions, or monitoring correlations... all these can be implemented as accumulate functions and re-used by every rule author in your company.

For this example here, I will implement something simple, but very unusual with the goal of, hopefully, opening the minds of the readers. Imagine there is a store business that has a marketing promotion that says: "if the customer order is above $100, the customer is entitled to a gift that is randomly chosen among a list of available gifts". How would you implement that? Exactly:

The randomSelect Accumulate Function

A rule that uses our randomSelect accumulate function looks like this:

rule "Give a gift to the customer if order total is more than $100"
when
    $order : Order( total > 100 )
    $gift : Gift( ) from accumulate(
        $i : Gift( available == true ),
        randomSelect( $i ) )
then
    $order.add( $gift );
end

To implement an Accumulate Function, all that is necessary is to implement the org.drools.runtime.rule.AccumulateFunction interface.

/**
 * An accumulate function that randomly selects one object from a list of them
 * @author etirelli
 */
public class RandomSelectAccumulateFunction implements AccumulateFunction {

    // the methods and the static inner class shown below will be inserted here

}


Drools is designed to enable sharing of the KnowledgeBase among multiple sessions. For this reason, an accumulate function cannot contain any attribute/data that is specific to a single session or rule. Any data specific to a rule is stored in a "context" object. The context object can be an instance of any class. It is instantiated by the createContext() method. So, let's say we have a RandomSelectData class that will store all the context data for us. The method will look like:

public Serializable createContext() {
    return new RandomSelectData();
}

As we can see from the method signature, our data class needs to be Serializable. So, let's create a private static inner class to use as the data store:

/**
 * A private static class to hold all the rule specific data for the random select function
 */
private static class RandomSelectData implements Serializable {
    // the list of objects to choose from
    public List<Object> list = new ArrayList<Object>();
    // a random number generator
    public transient Random random = new Random(System.currentTimeMillis());
}

Since the class is private we will just keep the attributes public for ease of use.
Now we need to implement all the other methods from the AccumulateFunction interface. The first method is the init() method, which is called every time a new calculation is started. In this case, we will just clear the list of available objects:

public void init(Serializable context) throws Exception {
    RandomSelectData data = (RandomSelectData) context;
    data.list.clear();
}

The second method is the accumulate() method, which is called every time a new object is added to the calculation process. In this case, all we want to do is add the object to the list of available objects:

public void accumulate(Serializable context,
                       Object value) {
    RandomSelectData data = (RandomSelectData) context;
    data.list.add( value );
}

The third method is the reverse() method, which is called every time an object is removed from the calculation, i.e., should no longer be used to achieve the results. This method is optional, but implementing it improves the performance of the function, as not only additions but also removals are calculated incrementally.

public void reverse(Serializable context,
                    Object value) throws Exception {
    RandomSelectData data = (RandomSelectData) context;
    data.list.remove( value );
}

The fourth method tells the engine whether your function supports (implements) the reverse method above. Since we did implement it, we will just return true.

public boolean supportsReverse() {
    return true;
}

And finally, the fifth method is getResult(), which must return the result of the calculation for the current set of data. In our case, we will just randomly pick one element from the list of available elements:

public Object getResult(Serializable context) throws Exception {
    RandomSelectData data = (RandomSelectData) context;
    return data.list.get( data.random.nextInt( data.list.size() ) );
}

An AccumulateFunction is Externalizable, so we must also implement the read/writeExternal() methods. In most cases, these methods will be empty, but if the function contains any attributes that are shared among sessions, they should be serialized here.

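public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    // nothing to read: this function has no attributes shared among sessions
}

public void writeExternal(ObjectOutput out) throws IOException {
    // nothing to write: this function has no attributes shared among sessions
}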

And that is it! Our function is implemented. I will not show the unit test here, as this post is already huge, but you can see that since the class is completely self-contained, implementing the test for the methods is a piece of cake.
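For readers who want a starting point, the sketch below walks through the lifecycle described above in plain Java, without any Drools dependency. It is an assumption-laden stand-in, not the real class: the methods are static copies of the ones in this post, the class name RandomSelectLifecycle and the sample gift names are made up, and the random generator uses a fixed seed only to keep the run repeatable.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomSelectLifecycle {

    // Stand-in for the per-rule context object described in the post.
    static class RandomSelectData implements Serializable {
        final List<Object> list = new ArrayList<Object>();
        // Fixed seed (an assumption of this sketch) so that runs are repeatable.
        final Random random = new Random(42L);
    }

    static Serializable createContext() {
        return new RandomSelectData();
    }

    // Starts a new calculation by discarding any previous candidates.
    static void init(Serializable context) {
        ((RandomSelectData) context).list.clear();
    }

    // Adds one more candidate to the calculation.
    static void accumulate(Serializable context, Object value) {
        ((RandomSelectData) context).list.add(value);
    }

    // Removes a candidate that should no longer take part in the calculation.
    static void reverse(Serializable context, Object value) {
        ((RandomSelectData) context).list.remove(value);
    }

    // Picks one of the remaining candidates at random.
    static Object getResult(Serializable context) {
        RandomSelectData data = (RandomSelectData) context;
        return data.list.get(data.random.nextInt(data.list.size()));
    }

    // Calls the methods in the order the post describes them.
    static Object runScenario() {
        Serializable context = createContext();
        init(context);
        accumulate(context, "mug");
        accumulate(context, "t-shirt");
        accumulate(context, "keyring");
        reverse(context, "t-shirt"); // e.g. this gift is no longer available
        return getResult(context);
    }

    public static void main(String[] args) {
        System.out.println("selected gift: " + runScenario());
    }
}
```

A test along these lines only needs to assert that the result is one of the remaining candidates and never the reversed one. It also highlights an edge case: with an empty list, `Random.nextInt(0)` throws an IllegalArgumentException, so a production version may want a guard in getResult().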

The last step is to make the function available to the rules engine. Again, there are several ways of doing that. My preferred way is to create a configuration file in the classpath with the following path and name:


Using the configuration file allows the Eclipse plugin to discover and support the function in the rules. The file is a regular properties file, and to configure the function you need to use the following format:

drools.accumulate.function.<identifier> = <fully-qualified class name>

Drools will link the function implementation to the <identifier> above and allow its use in the rules. In our case the configuration would be:

drools.accumulate.function.randomSelect = org.drools.examples.lotrc.functions.RandomSelectAccumulateFunction
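To illustrate the format only, the following plain-Java sketch reads such a property line with java.util.Properties and splits the key into prefix and identifier. This is not how Drools itself loads the file; the class name AccumulateConfigSketch and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class AccumulateConfigSketch {

    static final String PREFIX = "drools.accumulate.function.";

    // Parses properties text and returns identifier -> class name for every matching key.
    static Map<String, String> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        Map<String, String> functions = new TreeMap<String, String>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                String identifier = key.substring(PREFIX.length());
                functions.put(identifier, props.getProperty(key).trim());
            }
        }
        return functions;
    }

    // Parses the configuration line used in this post.
    static Map<String, String> example() {
        return parse("drools.accumulate.function.randomSelect = "
                + "org.drools.examples.lotrc.functions.RandomSelectAccumulateFunction\n");
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The part of the key after the prefix is the name used inside the rule, here `randomSelect`, and the value is the class that implements it.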

Other options to configure accumulate functions are through the API, using the KnowledgeBuilderConfiguration class or setting a system property, but in these cases, the Eclipse plugin will not automatically understand your accumulate function.

Happy Drooling,

Saturday, June 06, 2009

October Rules Fest 2009

Just a reminder for the readers, it is time to register for the best technical conference on Business Rules related technologies out there:

The October Rules Fest 2009

For those who missed the 2008 conference, this is *the* technical conference on the subject. It gathers in one place, to present their work, the minds that have been defining this field over the last 30 years. Yes, we are talking about Dr Charles Forgy himself, Gary Riley and a whole lot of researchers and professionals from both industry and academia for, as James likes to say, a conference with "no fluff, just stuff".

The focus of this conference is strictly technical, so expect to see a lot of applied research and industry solutions, and don't be afraid of code. It is like a week-long, broad tutorial on the subject.

Drools will be there too, with four presentations:

  • Production Rule Systems - Where do we go from here? (Mark Proctor)
  • Distributed Programming with Rule Engines (Mark Proctor)
  • Extending General Purpose Engines with Domain Specific Resources (Edson Tirelli)
  • Temporal Reasoning: a requirement for CEP (Edson Tirelli)

Hope to see you all there!


Wednesday, June 03, 2009

Drools roadmap meeting (for those in SF)....

For those that are in SF, we are having a roadmap meeting tonight at 6.45 at the Hilton (financial district)....