Wednesday, June 10, 2009

How to implement Accumulate Functions

Developing solutions for problems is not an easy task, especially when the tools we have are good enough for 80% of the task but do not let us solve the remaining 20%.

Drools was built from scratch with extensibility in mind, and this is one of the characteristics that distinguish it from other products on the market. From support for higher-level abstractions, like Domain Specific Languages and Decision Tables, to engine extensions like pluggable evaluators and functions, Drools enables technical people to make business people feel more comfortable while writing rules, using a known vocabulary, constraints and abstractions.

In my talk at the October Rules Fest I will dive into all the ways in which Drools can be extended to improve the development of domain-specific solutions. For now, I just want to throw some bones while saving the meat for the conference.

In this spirit I would like to show you one of the easiest ways to extend the engine: Accumulate Functions.

Rules quite commonly need to execute operations on sets of data. The operations range from actual set operations, to calculation/scoring, to whatever else you need to execute on a set of facts. The Drools accumulate CE (conditional element) supports inline custom code in its init/action/reverse/result blocks, but that code is not declarative, is not reusable across rules, and is good only for a one-time need.

Accumulate Functions to the rescue: implementing an accumulate function is a 20-minute task. It makes all your rules easier to write, read and maintain. It is unit-test friendly, and the Drools Eclipse plugin understands and validates your rules with accumulate functions.

Let's look at an example scenario so that everyone understands what accumulate functions are. Imagine that you have a rule that needs to calculate the sum of the prices of all products. Without accumulate functions, the rule would look like:

rule "Sum all products"
when
    $total : Number() from accumulate(
        Product( $p : price ),
        init( double total = 0; ),
        action( total += $p; ),
        reverse( total -= $p; ),
        result( new Double( total ) ) )
then
    // do something
end

As you can see, even for a very simple case it is quite verbose. More than that, if another rule needs to calculate the sum of something else, you need to rewrite all the code, which makes maintenance very difficult.
With Accumulate Functions, things get much nicer:

rule "Sum all items"
when
    $total : Number() from accumulate(
        Product( $p : price ),
        sum( $p ) )
then
    // do something
end

Now the intent of the rule is explicit. It is much shorter and less error prone. Drools ships with several accumulate functions that are available out of the box, like sum, average, min, max, count, collectSet and collectList.
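To make the roles of the four blocks concrete, here is a minimal plain-Java sketch of what the verbose rule above computes. IncrementalSum is a hypothetical name used only for illustration; it is not a Drools class and says nothing about how the built-in sum function is actually implemented.

```java
// Plain-Java sketch of the four accumulate blocks for a running sum.
// IncrementalSum is a hypothetical name, not part of Drools.
final class IncrementalSum {
    private double total;

    // init( double total = 0; ): start a fresh calculation
    void init() {
        total = 0;
    }

    // action( total += $p; ): a matching fact was added
    void action(double price) {
        total += price;
    }

    // reverse( total -= $p; ): a fact was retracted or no longer matches
    void reverse(double price) {
        total -= price;
    }

    // result( new Double( total ) ): the value handed back to the rule
    Double result() {
        return Double.valueOf(total);
    }

    public static void main(String[] args) {
        IncrementalSum sum = new IncrementalSum();
        sum.init();
        sum.action(10.0);
        sum.action(20.0);
        sum.action(5.0);
        sum.reverse(20.0); // undo one fact without re-adding the others
        System.out.println(sum.result()); // prints 15.0
    }
}
```

The point of reverse is visible in the last step: removing one price adjusts the total directly instead of restarting from init and replaying every remaining fact.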

Now imagine that your application needs a set operation. How hard is it to implement it as an accumulate function? As I mentioned before, so hard that you can have it done in 20 minutes and then re-use it everywhere. Imagine complex financial interest calculations, or stream processing functions, or monitoring correlations... all of these can be implemented as accumulate functions and re-used by every rule author in your company.

For this example here, I will implement something simple, but very unusual with the goal of, hopefully, opening the minds of the readers. Imagine there is a store business that has a marketing promotion that says: "if the customer order is above $100, the customer is entitled to a gift that is randomly chosen among a list of available gifts". How would you implement that? Exactly:

The randomSelect Accumulate Function


A rule that uses our randomSelect accumulate function looks like this:

rule "Give a gift to the customer if order total is more than $100"
when
    $order : Order( total > 100 )
    $gift : Gift( ) from accumulate(
        $i : Gift( available == true ),
        randomSelect( $i ) )
then
    $order.add( $gift );
end

To implement an Accumulate Function, all that is necessary is to implement the org.drools.runtime.rule.AccumulateFunction interface.

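import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import org.drools.runtime.rule.AccumulateFunction;

/**
 * An accumulate function that randomly selects one object from a list of them
 *
 * @author etirelli
 */
public class RandomSelectAccumulateFunction
    implements
    AccumulateFunction {

    // the methods below and the static inner class will be inserted here

}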

Drools is designed to enable sharing of the KnowledgeBase among multiple sessions. For that reason, an accumulate function cannot contain any attribute/data that is specific to a single session or rule. Any data specific to a rule is stored in a "context" object. The context object can be an instance of any class. It is instantiated by the createContext() method. So, let's say we have a RandomSelectData class that will store all the context data for us. The method will look like:

public Serializable createContext() {
    return new RandomSelectData();
}

As we can see from the method signature, our data class needs to be Serializable. So, let's create a private static inner class to use as the data store:

/**
 * A private static class to hold all the rule specific data for the random select function
 */
private static class RandomSelectData
    implements
    Serializable {
    // the list of objects to choose from
    public List<Object> list = new ArrayList<Object>();
    // a random number generator
    public transient Random random = new Random(System.currentTimeMillis());
}

Since the class is private we will just keep the attributes public for ease of use.
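One detail worth keeping in mind: a transient field is skipped by Java serialization and field initializers are not re-run on deserialization, so random would come back as null if a context instance were ever written out and read back. Whether and when Drools actually serializes the context is an assumption here and not something this post verifies. A minimal sketch of one way to guard against it, using the standard readObject hook (RandomSelectDataSketch and roundTrip are hypothetical names for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Stand-alone variant of the context class. The readObject hook is an addition
// for illustration and is not part of the original function.
final class RandomSelectDataSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    public List<Object> list = new ArrayList<Object>();
    public transient Random random = new Random(System.currentTimeMillis());

    // Invoked by Java serialization when an instance is read back.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                          // restores 'list'
        random = new Random(System.currentTimeMillis()); // transient, so rebuild it
    }

    // Hypothetical helper: serialize to a byte array and read the copy back.
    static RandomSelectDataSketch roundTrip(RandomSelectDataSketch data)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (RandomSelectDataSketch) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        RandomSelectDataSketch data = new RandomSelectDataSketch();
        data.list.add("teddy bear");
        RandomSelectDataSketch copy = roundTrip(data);
        // The list survives the round trip and the generator has been rebuilt.
        System.out.println(copy.list + " " + (copy.random != null));
    }
}
```

Since java.util.Random is itself Serializable, simply dropping the transient modifier would be another option; the hook is shown only because it keeps the field declaration as written above.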
Now we need to implement all the other methods from the AccumulateFunction interface. The first method is init(), which is called every time a new calculation is started. In this case, we will just clear the list of available objects:

public void init(Serializable context) throws Exception {
    RandomSelectData data = (RandomSelectData) context;
    data.list.clear();
}

The second method is accumulate(), which is called every time a new object is added to the calculation process. In this case, all we want to do is add the object to the list of available objects:

public void accumulate(Serializable context,
                       Object value) {
    RandomSelectData data = (RandomSelectData) context;
    data.list.add( value );
}

The third method is reverse(), which is called every time an object is removed from the calculation, i.e., should no longer be used to achieve the results. This method is optional, but implementing it improves the performance of the function, because removals, like additions, are then calculated incrementally.

public void reverse(Serializable context,
                    Object value) throws Exception {
    RandomSelectData data = (RandomSelectData) context;
    data.list.remove( value );
}

The fourth method tells the engine whether your function supports (implements) the reverse method above. Since we did implement it, we will just return true.

public boolean supportsReverse() {
    return true;
}

And finally, the fifth method is getResult(), which must return the result of the calculation for the current set of data. In our case, we will just randomly pick one element from the list of available elements:

public Object getResult(Serializable context) throws Exception {
    RandomSelectData data = (RandomSelectData) context;
    return data.list.get( data.random.nextInt( data.list.size() ) );
}
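Note that this version assumes the list is never empty: Random.nextInt(0) throws an IllegalArgumentException, so a call made before any object has been accumulated would fail. A minimal sketch of a defensive variant of the selection step follows. SafeRandomPick and pick are hypothetical names, and returning null for "nothing to choose from" is an assumption; how the engine treats a null result should be checked against your Drools version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not part of the Drools API.
final class SafeRandomPick {

    // Returns a random element, or null when there is nothing to choose from,
    // instead of letting Random.nextInt(0) throw IllegalArgumentException.
    static Object pick(List<Object> list, Random random) {
        if (list.isEmpty()) {
            return null;
        }
        return list.get(random.nextInt(list.size()));
    }

    public static void main(String[] args) {
        Random random = new Random();
        List<Object> gifts = new ArrayList<Object>();
        System.out.println(pick(gifts, random)); // prints null
        gifts.add("mug");
        System.out.println(pick(gifts, random)); // prints mug
    }
}
```

Inside getResult() the same guard would amount to a single isEmpty() check before the existing return statement.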

An AccumulateFunction is Externalizable, so we must also implement the read/writeExternal() methods. In most cases these methods will be empty, but if the function contains any attributes that are shared among sessions, they should be serialized here.

public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
}

public void writeExternal(ObjectOutput out) throws IOException {
}

And that is it! Our function is implemented. I will not show the unit test here, as this post is already huge, but you can see that since the class is completely self-contained, implementing the test for the methods is a piece of cake.
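For readers who want a starting point, here is a minimal sketch of what such a test could look like. To keep it runnable without Drools or JUnit on the classpath, it uses a local stand-in class that mirrors the method bodies from this post and a tiny check() helper; RandomSelectLifecycleSketch, Data and check are hypothetical names, and a real test would exercise RandomSelectAccumulateFunction itself.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Stand-in that mirrors the method bodies from this post so the checks compile
// without Drools on the classpath. The class name is hypothetical.
final class RandomSelectLifecycleSketch {

    static final class Data implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<Object> list = new ArrayList<Object>();
        final Random random = new Random();
    }

    Serializable createContext() {
        return new Data();
    }

    void init(Serializable context) {
        ((Data) context).list.clear();
    }

    void accumulate(Serializable context, Object value) {
        ((Data) context).list.add(value);
    }

    void reverse(Serializable context, Object value) {
        ((Data) context).list.remove(value);
    }

    Object getResult(Serializable context) {
        Data data = (Data) context;
        return data.list.get(data.random.nextInt(data.list.size()));
    }

    // Tiny assertion helper so the sketch depends neither on JUnit nor on -ea.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        RandomSelectLifecycleSketch function = new RandomSelectLifecycleSketch();
        Serializable context = function.createContext();
        List<String> gifts = Arrays.asList("book", "mug", "pen");

        function.init(context);
        for (String gift : gifts) {
            function.accumulate(context, gift);
        }
        // Whatever is picked must be one of the accumulated values.
        check(gifts.contains(function.getResult(context)), "result must be an accumulated value");

        function.reverse(context, "book");
        function.reverse(context, "pen");
        // Only one candidate is left, so the pick is deterministic.
        check("mug".equals(function.getResult(context)), "only the remaining value can be picked");

        // A second context must not share state with the first one.
        Serializable other = function.createContext();
        function.init(other);
        function.accumulate(other, "hat");
        check("hat".equals(function.getResult(other)), "contexts are independent");
        check("mug".equals(function.getResult(context)), "first context is unchanged");

        System.out.println("all checks passed");
    }
}
```

Because the pick is random, the checks assert membership rather than a specific value, and only become exact once reverse() has narrowed the list to a single element.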

The last step is to make the function available to the rules engine. Again, there are several ways of doing that. My preferred way is to create a configuration file in the classpath with the following path and name:

META-INF/drools.packagebuilder.conf

Using the configuration file allows the Eclipse plugin to discover and support the function in the rules. The file is a regular properties file, and to configure the function you need to use the following format:

drools.accumulate.function.<identifier> = <fully-qualified class name>

Drools will link the function implementation to the <identifier> above and allow its use in the rules. In our case the configuration would be:

drools.accumulate.function.randomSelect = org.drools.examples.lotrc.functions.RandomSelectAccumulateFunction
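To illustrate how the key encodes the identifier, the following sketch reads a line in this format with java.util.Properties and splits off the prefix. It only demonstrates the naming convention; it is not how Drools itself loads the file, and AccumulateFunctionConfigSketch and functions are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Illustration of the property naming convention only; Drools has its own loader.
final class AccumulateFunctionConfigSketch {

    static final String PREFIX = "drools.accumulate.function.";

    // Maps each function identifier to its fully-qualified class name.
    static Map<String, String> functions(Properties props) {
        Map<String, String> result = new TreeMap<String, String>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                result.put(key.substring(PREFIX.length()), props.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
            "drools.accumulate.function.randomSelect = "
            + "org.drools.examples.lotrc.functions.RandomSelectAccumulateFunction\n"));
        // Prints the identifier-to-class mapping found in the properties.
        System.out.println(functions(props));
    }
}
```

The identifier that comes out of the key, randomSelect here, is the name the rules use when they call the function.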

Other options for configuring accumulate functions are the API, using the KnowledgeBuilderConfiguration class, or a system property; in these cases, however, the Eclipse plugin will not automatically understand your accumulate function.

Happy Drooling,
Edson

4 comments:

  1. Very interesting and useful, thanks for the post!

  2. Thanks a lot for the code.

    How should the accumulate function react if there is no match?

    The getResult() function is called even if accumulate() was never called. But then the list is empty and a NullPointerException is thrown.

  3. createContext() is executed once for each matching tuple to create the data structures and context information. Then, init() is always called and is supposed to initialize any values requiring initialization.

    So, in case of no matches, getResult() is supposed to return values based on the initial values set by init().

    The example, unfortunately, does not take into account the case where the list is empty. Requires fixing.

  4. I'm new to Drools and have just coded my first accumulate function. I notice that getResult() is called multiple times. The pattern is basically:

    On ACTIVATION CREATED
    init()
    getResults()

    On *each* OBJECT ASSERTED (where the fact object matches the accumulate condition)
    accumulate()
    getResults()

    So if I insert three facts, getResults() is called four times.

    My function can cope with that, and I think I can imagine why it happens (because the LHS is evaluated multiple times), but are there any techniques for avoiding or minimising these repeated calls in cases where the accumulate function is expensive?

    P.S. I've managed to achieve this by adding a 'flag' fact to the LHS of the rule that uses the accumulate function. Are there any other techniques?
