Friday, June 29, 2007

JBoss Rules/Drools jobs on Jobserve

Jobserve is a website in the UK that has 80% of the IT job advertising market; as such it's a great bellwether for market rates and technology popularity. For the first time JBoss Rules/Drools jobs have started to appear in numbers, with good rates too. Simply type "jboss rules" or "drools" into the search box to see the listings. This is a great sign that Drools is starting to get mainstream adoption.

Here is one such advert (hmm, hope I'm not in copyright violation by printing this); note the rates are in GBP:

My major blue chip client is currently looking for a Java Architect/Senior Designer with JBoss Rules/Drools experience for a short term piece of work in the Thames Valley. You will have an in-depth understanding of Java/J2EE and have used JBoss Rules or Drools to create business rules. This role will require you to produce high level designs, low level designs for a solution, aiding in the implementation of the solution and helping to train my client's users and development teams in the use of JBoss Rules/Drools to create business rules. Due to the nature of the position and site only individuals who are security cleared to SC level or are willing to undergo clearance can be considered for the position. If you think that you have the skills required, please send your CV to (see below) ASAP.
Rate: 450-700 / day
Employment business: Computer People

Golfing Example Walkthrough

Dr. Gernot Starke has created a walkthrough of the Golfing example, which can be found here:
http://rbs.gernotstarke.de/samples/samples/golfer-riddle-jbr.html.

It's great that someone has taken up the role of documenting our examples; this is something we've purposely held back on doing, hoping the community would step up - thank you, Dr Starke.

Chained 'from', 'accumulate' and 'collect'

In Drools 4.0 we introduced the 'from' keyword, which allows you to declare a source for a pattern to reason over. This lets the engine reason over data that is not in the Working Memory. The source could be a sub-field on a bound variable or the results of a method call; the latter gives us Hibernate integration, where named queries can be called and Drools will unify the pattern against the returned results. Please excuse the lack of indentation; I can't figure out how to get Blogspot to preserve the white space and didn't have time to put it all in by hand :(

Here is a simple example of reasoning and binding on sub-field:

Person( personAddress : address )
address : Address( zc : zipcode == "23920W") from personAddress

With all the flexibility from the new expressiveness in the Drools engine you can slice and dice this problem many ways. This is the same example, but shows how you can use graph notation with 'from':

p : Person( )
address : Address( zc : zipcode == "23920W") from p.address

Of course we can also do this with our new flexible, expressive language extensions:

Person( zc : address.zipCode == "23920W")

The next example shows how we can reason over the results of a Hibernate query; the Restaurant pattern will match against and bind to each returned result in turn.

p : Person( )
Restaurant( food == p.favouriteFood )
    from hs.getNamedQuery( "list restaurants by postcode" )
          .setProperties( [ "postcode" : p.address.zipcode ] )
          .list()

'collect' and 'accumulate' both return an object, so a pattern can specify either of them as its 'from' source. 'collect' allows cardinality reasoning (e.g. when there are more than 7 red buses) and returns a List object. 'accumulate' lets you execute an action for each item of data in a set matching a given pattern, and then execute a result action to return an object of the user's choice - typically used to sum or total data, but of course it can do much more complex work.

This example chains two 'from's together. It finds customers who have bought items, all of which are priced over 10, where the items are a field on the customer and not asserted into the working memory:

c : Customer()
items : List( size == c.items.size )
    from collect( Item( price > 10 ) from c.items )

If the items were not a field, but instead asserted into the working memory, we could use a correlated 'collect' pattern:

p : Person()
list : List()
    from collect( Item( owner == p ) )
items : List( size == list.size )
    from collect( Item( price > 10 ) from list )

Here is how to achieve the same using 'accumulate' and its built-in function 'count'; although this doesn't illustrate a chained 'from', I've added it for completeness:

p : Person()
count : Number()
    from accumulate( i : Item( owner == p ), count( i ) )
list : List( size == count )
    from collect( Item( owner == p, price > 10 ) )

For a more complex, and thus contrived but illustrative, example of chained 'from' we can look at the following. For each store where the Person has an account we retrieve all the Items for that store and check whether those items have an average price over 50.

p : Person()
s : Store( accountOwner == p )
a : Number( intValue > 50 )
    from accumulate( item : Item( )
                         from collect( Item( store == s )
                                           from hs.getNamedQuery( "get user items in store" )
                                                 .setProperties( [ "store" : s.name, "owner" : p.name ] )
                                                 .list() ),
                     average( item.price ) )

So for all those that say Rete can't handle collections and nested or out-of-working-memory data very well, I hope this shuts you up :)

NB. We would like to thank Jess for pushing the 'accumulate' CE, on which ours is based, with the addition of accumulate functions. Likewise ILog JRules for the 'collect' CE, which is a specialised version of 'accumulate'.

-- update 29/06/2007 17.44 --
Dave Reynolds pointed me to a paper written in 1991 that described how to do collect/accumulate-like extensions to Rete:
http://citeseer.ist.psu.edu/477215.html

Francois Bois pointed me to Xcerpt, which describes the problems with existing languages and web-based reasoning; from the site it seems our work with chained and nested 'from', along with 'collect' and 'accumulate', helps address much of this problem:
http://www.mulberrytech.com/Extreme/Proceedings/html/2004/Schaffert01/EML2004Schaffert01-toc.html

Wednesday, June 27, 2007

Free Session: In-the-Brain of Mark Proctor on JBoss Rules - July 11

I'm going to be at SkillsMatter on July the 11th doing a seminar on JBoss Rules 4.0. The location is near Farringdon tube station and the time is from 18.30 to 20.30, and afterwards I'll be around for drinks at the pub. So please do try and come along.

More details can be found here.

Session Details are as follows:
Talk: JBoss Rules
Speaker: Mark Proctor, JBoss Rules Lead
Date: Wednesday, 11th July 2007
Times: 18:30 - 20:30
Venue:
Skills Matter
1 Sekforde Street
London EC1R 0BE

Google Map

Registration: Register for this Free In-The-Brain Session on JBoss Rules here

Tuesday, June 26, 2007

4.0.0 MR3 Released

Bar some fixes and integration work in the Eclipse plugin and the BRMS, we are near enough complete now. Work on the documentation will now start.

Release Notes
summary - http://wiki.jboss.org/wiki/Wiki.jsp?page=4.0.0.MR3ReleaseNotes
detailed - http://jira.jboss.org/jira/secure/ReleaseNote.jspa?version=12310871&styleName=Text&projectId=12310210

Downloads
http://labs.jboss.com/jbossrules/downloads

Enjoy

The Drools Team

Sunday, June 24, 2007

What is a "sequential" Rule Engine

It's very hard to find a public definition of what a "sequential" rule engine is; someone needs to update Wikipedia. My knowledge in this area is also limited, so I've decided to write up what I know in the hope that others can correct and add to it, so we can get a full public definition. Maybe the results can go up on Wikipedia :)

Sequential engines are used in stateless execution environments; i.e. you cannot assert more data and call fireAllRules a second time. Modifying asserted data does not result in rule re-evaluation.

A method is generated per rule in Java code. The engine takes the asserted data and creates a list of the possible cross-product combinations for the provided data and rules.

It then iterates over the rules, calling each in turn with the possible cross-product matches from the list. The order is determined by salience and/or the order of the rules in the file. When all the 'if' statements are true the rule fires straight away; there is no agenda.
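
To make that execution model a little more concrete, here is a minimal Java sketch of the strategy as described above; the Rule interface and method names are invented for illustration and are not any engine's actual API:

import java.util.*;

// A minimal sketch of the sequential strategy described above - not any
// vendor's actual API. Rules are ordered once, each is tried against the
// pre-computed cross product of the asserted facts, and fires immediately;
// there is no agenda and no re-evaluation of already-visited rules.
class SequentialSketch {

    interface Rule {
        int getSalience();
        boolean matches(List<Object> tuple);   // all the 'if' tests
        void fire(List<Object> tuple);         // the 'then' part
    }

    static void fireAllRules(List<Rule> rules, List<List<Object>> crossProduct) {
        // Fixed order: salience first, then (in practice) order in the file.
        rules.sort(Comparator.comparingInt(Rule::getSalience).reversed());

        for (Rule rule : rules) {                    // single pass over the rules
            for (List<Object> tuple : crossProduct) {
                if (rule.matches(tuple)) {
                    rule.fire(tuple);                // fires straight away
                }
            }
        }
        // Facts modified inside fire() are never reconsidered by rules
        // that have already been iterated over.
    }
}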

If the asserted data is not modified, the engine can cache tests, which improves performance. If data is changed, it cannot cache, and further rule evaluations may not fire for the modified values.

It is the conflict set combined with data modifications that seems unclear to me: you can modify data, but the already checked/fired rules will never be re-evaluated against this changed data. It seems that, ideally, sequential modes should not modify asserted data and should benefit from tooling to aid in the authoring of rules, so that they do not result in conflicts for given sets of data, i.e. only one rule will fire.

I also don't see how sequential will be faster if caching has to be turned off due to data modifications and you have a large number of rules.

Friday, June 22, 2007

Accumulate Functions (Edson Tirelli)

As we get closer to the release, everything is taking its final shape. This week it was time to get the Accumulate CE ready for the release. For those who missed it, Accumulate is a very powerful new Conditional Element in Drools 4.0, which will be released next month. It allows you to operate on sets of data.

The general syntax for it is:

ResultPattern( fieldconstraint* )
    from accumulate( SourcePattern( fieldconstraint* ),
                     init( code ),
                     action( code ),
                     reverse( code ),
                     result( code ) )

Basically, what accumulate does is execute the init code block, then "iterate" over the facts that match the SourcePattern, executing the action code block for each of the facts, and finally execute the result code block. The given result is matched against the ResultPattern and, in case it evaluates to true, the condition matches. The reverse code block is optional; its function is to improve performance when retracting or modifying facts that had previously matched the SourcePattern.

Well, nothing better than a sample to make things clearer.

Rule: apply a 10% discount to orders that include at least US$ 100,00 of toys.


rule "Discount for orders that include US$100,00 of toys"
when
$o : Order()
$toysTotal : Number( doubleValue > 100 )
from accumulate( OrderItem( order == $o, type == "toy", $value : value ),
init( double total = 0; ),
action( total += $value; ),
reverse( total -= $value; ),
result( new Double( total ) ) )
then
$o.setDiscountPercentage( 10 );
end

As you can see in the example above, accumulate is really flexible and powerful. As each of the code blocks is either Java or MVEL code, one can really do any kind of operation in there.

But then, someone might say: "Well, accumulate is pretty flexible and powerful, but I don't want to keep writing code for common operations like the above." So that is something we were playing with during this week. We really want to provide you with all the flexibility you need, but we also want to provide simplicity for the common cases. So, Accumulate Functions come to the rescue.

You now have the possibility of using predefined functions to simplify accumulate usage for common cases. For instance, the rule above is using accumulate to perform a sum of values. The same rule can be written like this:

rule "Discount for orders that include US$100,00 of toys"
when
$o : Order()
$toysTotal : Number( doubleValue > 100 )
from accumulate( OrderItem( order == $o, type == "toy", $value : value ),
sum( $value ) )
then
$o.setDiscountPercentage( 10 );
end

Much simpler now. What if you want a rule that tells you how much a raise of X% would cost for each department?

rule "Total raise"
when
$dpt : Department( $raise : raise )
$total : Number()
from accumulate( Employee( dept == $dpt, $salary : salary ),
sum( $salary * $raise ) )
then
$dpt.setSalaryIncreaseValue( $total );
end

So, you can use any expression as a parameter to accumulate functions. We added the most commonly used functions out of the box, like sum, average, count, min, max, etc.

But you say: "I liked the functions, but I have a custom function that I would like to provide to my users write rules with. Can I implement that function and provide to them, so that they don't need to keep writing it themselves?"

Our answer is: "Of course!" ;)

We made Accumulate Functions pluggable and as simple as we could, so you can easily provide new functions for your users to use. For instance, pretend you have a very complex calculation that you need to do to get the cost of a given stock trade operation. Your users are writing rules for your specialist system to enable it to advise on which operations are more profitable, and so they have several rules that need to calculate such stock trade operation costs.
To develop a new accumulate function, the only thing you need to do is create a Java class that implements the AccumulateFunction interface. The interface basically has a method that corresponds to each of the accumulate operations: init, action, reverse and result. Here you can see how easy it is to implement things like the average function.
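
As a rough illustration, a custom sum function might look something like the sketch below. The interface shown here is invented purely to mirror the four operations described above; check the actual AccumulateFunction interface shipped with Drools (e.g. the existing average implementation) for the real method names and signatures.

// Illustrative only: an invented interface mirroring the init/action/reverse/result
// operations described above, plus a simple sum implementation. The real
// AccumulateFunction interface in Drools may differ in method names and signatures.
interface SimpleAccumulateFunction {
    Object init();                              // create the working data
    void action(Object context, Object value);  // called for each matching fact
    void reverse(Object context, Object value); // undo on retract/modify
    Object result(Object context);              // value matched by the result pattern
}

class SumFunction implements SimpleAccumulateFunction {
    public Object init() {
        return new double[] { 0.0 };
    }
    public void action(Object context, Object value) {
        ((double[]) context)[0] += ((Number) value).doubleValue();
    }
    public void reverse(Object context, Object value) {
        ((double[]) context)[0] -= ((Number) value).doubleValue();
    }
    public Object result(Object context) {
        return new Double(((double[]) context)[0]);
    }
}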

Finally, to wire your function into the system you can either call an API (addAccumulateFunction()) or define a property. The property can be defined either in a configuration file or as a system property:

Example:

drools.accumulate.function.average = org.drools.base.accumulators.AverageAccumulateFunction

As simple as that.

Hope you all enjoy.

Happy Drooling!

RuleFlow Constraint Editor and Code Completion (includes screenshots)

Code completion has just been added to the constraint editor for ruleflow 'splits' and 'joins'. 'and', 'or' and 'xor' type logic can be applied, with a constraint used for each branch. The constraint language is the same as that of the left hand side (LHS) 'when' part of a rule, and the constraint is evaluated against the Working Memory. When the ruleflow enters the 'split' or 'join', a branch is only taken if its LHS is true for the Working Memory. This provides extremely powerful ruleflow modelling, bringing the true power of rules to workflow in a fully integrated fashion.

Simple Ruleflow showing the Code Completion for a constraint

Code Completion showing the available fields

Code Completion showing the valid operators for the "message" field.

The end result; of course, more complex constraints can be built.

Wednesday, June 20, 2007

Production Rules implementation for the Rule Interchange Format and The Mortgage Industry Standards Maintenance Organization

(Guest Writer : Tracy Bost from valocity.com)

The Mortgage Industry Standards Maintenance Organization (MISMO) is a subsidiary of the Mortgage Bankers Association and acts as the mortgage industry’s standards body for electronic standards. Its stated mission is “to develop, promote, and maintain voluntary electronic commerce standards for the mortgage industry”. In June of 2006, MISMO created the Business Rules Exchange Workgroup (BREW) to help organize an industry standard for exchanging “business rules” from one trading partner to another. The participants are volunteers from various organizations within the industry. Over the course of a few months, several participants presented use cases that were felt to be relevant to the workgroup and its mission.

To demonstrate to the industry that exchanging business rules is within the realm of today’s technical capabilities, and that it could help meet real business needs, the group decided to perform a proof of concept (POC).

In order for the POC to be meaningful, the group decided it should be performed using an existing documented use case. Additionally, it was desired that the POC should involve at least two different vendors and platforms to add to the credibility (and reality) of the task.

Of the several use cases discussed, the “pricing rules” use case seemed to be the one most often mentioned since the inception of the work group. For that reason, the group agreed to use pricing rules as its first proof of concept.

The group wanted to focus on a simple subset of production rules that contained easy vocabulary references to the MISMO AUS version 2.4. With this in mind, a subset of 17 pricing rules using a simple lookup table was chosen as the actual rules to be exchanged and executed against.

Because some of the participants in the MISMO BREW workgroup were also active in the W3C Rule Interchange Format (RIF) working group, and RIF had recently released its first core draft, RIF was chosen as the standard language to use for the POC.

Fortunately, after being approached about this POC, both JBoss Rules and ILOG agreed to the challenge. Mark covers the actual implementation in a previous blog entry, “W3C Rule Interchange Format for Production Rule Systems”.

Background on Pricing Rules

Simply put, pricing is the interest rate a borrower will pay on the mortgage loan. Due to the complexity of a mortgage loan, several factors can affect “pricing”. Some of these are the borrower’s FICO score, the type of loan, the loan-to-value percentage (LTV), and whether the borrower has full stated documents (proof of income) or non-stated.

A lender or broker will typically obtain these pricing rules from a PDF downloaded from the various lenders’ web sites or received through email. Once the PDF file is retrieved, one can then enter the pricing rules into a business rules management system. It is easy to see this can be a time consuming and error prone process.

Having a standardized way to exchange these rules could potentially save the industry millions of dollars in efficiency and effectiveness. Just as important, the customer (borrower) will be able to get a more accurate picture of what the “pricing” for a loan will be in his/her unique circumstances, in closer to real time.

The Client Tool

My task in this POC was to build a client user interface tool that could act as a “controller”: query a rule producer (or publisher) for its rules, send a ruleset to a consumer, and send XML data to each rule engine’s web service to be executed. ILog would be the producer, or publisher, of the RIF ruleset, while JBoss Rules would act as the consumer. Using a MISMO compliant data payload of borrower information, the controller was to send the XML data to both the ILog and JBoss rule engines for execution and get back a modified “RequestedInterestRate” value, which would in essence be the borrower’s “pricing”.

ILog came up with “apocrif” for the project name, therefore I decided to name my tool “APOCRIF-UI”. Since the core (written by ILog) was in Java, and JBoss Rules is a Java implementation, it only made sense that the tool would be written in Java as well. Initially, I had planned on creating a web-based JSP application for this task, but it was quickly discovered that a lot of testing would need to be done using “localhost” or some other server within the tester’s private network. Asking anyone who wanted to use the tool to download and install a web application on their local machine to run it seemed to defeat the purpose of developing a web application.

Additionally, the tool needed to provide plenty of visual aid for the audience, since BREW was planning on demonstrating this to a large crowd geared more towards the business aspects of the mortgage industry than the technicalities of rules and rule languages. A canvas with drag-and-drop functionality seemed like a good choice for that task as well. It seemed a more traditional desktop or Rich Internet Client would fit the requirements better than a traditional web application.

After a little research, it seemed that using the Standard Widget Toolkit (SWT) and deploying the tool as a Java Web Start application would be a good fit for our scenario. I had used the Kettle Extract, Transform & Load (ETL) tool in the past and liked very much how the developers architected that program. With Kettle being licensed under the LGPL, I decided to use it as my model for this project.

To help save time for users of the tool, I created a pre-cooked “rules interchange job” (RIJ) and loaded it into the working canvas. The RIJ is simply an XML based file containing the vendor connections and components that are part of the project.

The RIJ had two vendor connections set up, one for the ILOG web service, and another for the JBoss Rules web service.

A component is the name I gave to the various actions or tasks that could be dragged and dropped onto the job canvas (RIJ Designer).

The components loaded onto RIJ Designer were:
  1. Three XML Payloads. These are MISMO compliant (AUS version 2.4 schema) data payloads with sample borrower information. Each file resided on my hard drive and each had different information for three fictitious borrowers.
  2. Four Agent Components. Two components for the ILOG connection (One enabled as a publisher of rules, and the other enabled as an “execution” of rules), and two for the JBoss Rules Connection. (One enabled as a consumer of rules and the other enabled as an execution engine for rules).
  3. The controller. This component (APOCRIF Controller) is what sent requests to the publisher asking for the RIF ruleset, sent the retrieved ruleset to the consumer, and sent various payloads to both rules engines for execution.
Here is a screenshot of the pre-cooked Rules Interchange Job loaded onto the RIJ Designer


To retrieve the ruleset from the ILOG web service, I right-clicked on the controller component and selected Get->Ruleset. A dialog box appeared. I then provided my own name for the ruleset, “Pricing Rules”. Next I selected an enabled “publisher” within the job, which was the ILog Publisher component. Once it was selected, the client tool knew the vendor ILog‘s web service name, “DecisionService”, and the method to retrieve a ruleset, “exportAsRIF”.

This is a screenshot of retrieving a RIF ruleset from the ILog Publisher


Once the ruleset was successfully retrieved, a RIF component named “Pricing Rules” was created on the canvas as part of the job. By clicking on the Transaction History tab, I was able to get a browser view of the returned ruleset and information about the transaction.

This is a screenshot of the transaction history tab view after the RIF ruleset was retrieved from ILog


Next, I right-clicked the controller icon and selected Send->Ruleset. A dialog box appeared, and I selected the newly retrieved ruleset on the RIJ Designer. Then, from the drop down list of consumers, I selected the JBoss Consumer component. As with the ILog connection, the tool understood what it needed to do in order to interface with the appropriate web service for the JBoss Rules vendor connection; hence, it automatically provided “PricingRuleService” as the web service and “updatePricingRules” as the method.

This is a screenshot of the Pricing Rules ruleset that was sent to the JBoss Consumer


Now that both rules engines had the same set of RIF rules, the fun began. I selected each payload on the job and sent it first to ILOG Execution, then to JBoss Execution, and compared the results.

This is a screenshot of the controller preparing to send a XML payload to the JBoss Execution component

Both engines returned the same value for the modified element “RequestedInterestRate”. For example, the AUSMXARM.xml payload caused each engine to return a value of 15.6%.

This is a screenshot after that particular payload was sent to JBoss Rules.


Future demonstrations could include JBoss Rules modifying the ruleset and returning it back to ILog for consumption, i.e. round tripping. It is also my understanding that JBoss Rules & ILog are in discussions about using a more up-to-date extension of the RIF core for the next demonstration. Additionally, the upcoming MISMO version 3.0 should allow for a more consistent organization of the vocabulary. This should allow more dynamic references to the data elements within the ruleset.

Nonetheless, I believe this is just the tip of the iceberg of the possibilities that could exist once business rules can be successfully communicated and shared among distributed systems.

Again, thanks to JBoss Rules & ILog for taking an innovative role in this exciting new area of sharing and exchanging business rules.

Tuesday, June 19, 2007

Declarative Relational Programming

This blog post is from a comment I made over at the InfoQ article Debate: ODBMS sometimes a better alternative to O/R Mapping?.

Object Oriented deep graph models, POJOs specifically, cannot be easily reasoned over declaratively - which can create a reliance on imperative programming. With the work the W3C has done to standardise Description Logic with OWL-DL, combined with declarative reasoning systems like Drools (we'll be adding Description Logic based modelling after 4.0), you have a much more powerful metaphor for application development (although probably not framework/subsystem development) - also forget OWL Full, it's an academic exercise, and RDF triples are unfortunate, but luckily they can just be considered a transport mechanism. Declarative relational programming obviously has a much closer one-to-one mapping with the database itself.

For a simple example look at the Conway's Game of Life example we provide (it will be updated to ruleflow soon, instead of agenda groups, which will make it more declarative). In this example we have a large NxN grid of Cell objects; the previous approach was for each Cell to have a HashSet of its surrounding Cells. The only way to calculate the number of surrounding Dead/Live cells was to imperatively iterate that HashSet for each Cell. This creates repeated redundant work, as we don't know what has and what hasn't changed; we could track that, but again it's more imperative code and tracking we have to do. The updated Conway's example uses a relational approach with no nested objects (although no DL yet, that is post 4.0); instead we use a Neighbour class to bi-directionally relate each pair of adjacent cells. This means we simply declare what we want the engine to do to track Dead/Live cells and the system, with its understanding of the relations and of what has and hasn't changed, does the rest for us.
http://anonsvn.labs.jboss.com/labs/jbossrules/trunk/drools-examples/src/main/rules/org/drools/examples/conway/conway.drl
rule "Calculate Live"
    agenda-group "calculate"
    lock-on-active
when
    theCell: Cell(cellState == CellState.LIVE)
    Neighbor(cell == theCell, $neighbor : neighbor)
then
    $neighbor.setLiveNeighbors( $neighbor.getLiveNeighbors() + 1 );
    $neighbor.setPhase( Phase.EVALUATE );
    modify( $neighbor );
end

rule "Calculate Dead"
    agenda-group "calculate"
    lock-on-active
when
    theCell: Cell(cellState == CellState.DEAD)
    Neighbor(cell == theCell, $neighbor : neighbor )
then
    $neighbor.setLiveNeighbors( $neighbor.getLiveNeighbors() - 1 );
    $neighbor.setPhase( Phase.EVALUATE );
    modify( $neighbor );
end

I also invite you to look at the "register neighbor" set of rules, so you can see how these Neighbour relations are set up declaratively, exploiting the cross products of the column and row fields in the Cell.
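
For readers without the source handy, here is a rough sketch of what such a relation fact might look like as a Java class; the field names are inferred from the rules above, but this is not the exact class shipped in drools-examples:

// A minimal sketch of the relation fact, inferred from the rules above;
// not the exact class from the drools-examples module.
public class Neighbor {

    private final Cell cell;      // the cell whose state change is being propagated
    private final Cell neighbor;  // one of its adjacent cells

    public Neighbor(Cell cell, Cell neighbor) {
        this.cell = cell;
        this.neighbor = neighbor;
    }

    public Cell getCell()     { return cell; }
    public Cell getNeighbor() { return neighbor; }
}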

While this is just a simple example using propositional logic, you can exploit these relations much further, especially when working with sets of data and first order logic using things like 'collect', 'accumulate' and 'forall'. For more info see What's New in JBoss Rules 4.0, which is released mid next month.

Mark
http://markproctor.com
http://markproctor.blogspot.com

Friday, June 15, 2007

Rule Analytics - Looking at the AST (Toni Rikkola)

I spent last weekend testing JRuby. I found out that some things would be easier using JRuby and some things using Java; this was of course to be expected.

I spent the first days of this week designing a model that would help me to test rules.

JBoss Rules forms an abstract syntax tree (AST) from the rules it gets from the rule base. This AST is built with Java objects, and the problem is that rule engines do not support an object-oriented model that well; as I am mainly using JBoss Rules to find conflicts, I needed to find another way. Michael Neale told me to think about relations like the ones used in SQL databases; using this advice I added an identifier to all my objects plus information about their parent objects, so that the relations could be resolved. After this I could test small cases that were under And or Or descriptions, but this is not enough, because I would need to form loops to check for conflicts in the entire rule. Loops would be too messy to use, so I needed another solution.

After some brain work, I realized that if I can get a list of all of the simpler clauses that can be formed from one rule, I can use those clauses to test for conflicts inside that rule. Let's look at how this looks in a DRL file:

rule "Rule that causes warning"
when
    Foo( bar == "baz" && ( xyz == "123" || bar != "baz" ) )
then
    # Do something
end


This rule looks for Foo objects in the working memory; if a Foo object has parameters that match the definitions set inside the brackets, it does something.
So all the simpler clauses for the definitions inside object Foo would be:
bar == "baz" && xyz == "123"
bar == "baz" && bar != "baz"

This rule has an error, because obviously the parameter bar can not be equal to "baz" and at the same time be unequal to "baz". In these kinds of situations the RAM could check the rules and inform the user that he or she has a rule that can be true, but contains a condition that can never be true. Other warnings are possible too, for example: Foo( x > 42 || x < 42 ) would give a warning that the possibility x == 42 is not taken care of. The only problem now is how to form all the possible clauses from the AST.
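
As a rough illustration of the kind of pairwise check this implies (a hypothetical helper, not part of the actual analytics code), two literal restrictions on the same field using == and != with the same value can never both hold:

// Hypothetical helper for illustration only - not part of the analytics module.
// Detects the bar == "baz" vs bar != "baz" style of contradiction between two
// literal restrictions on the same field.
static boolean contradicts(String fieldA, String operatorA, String valueA,
                           String fieldB, String operatorB, String valueB) {
    if (!fieldA.equals(fieldB) || !valueA.equals(valueB)) {
        return false; // different fields or different literal values: no clash detected here
    }
    return ("==".equals(operatorA) && "!=".equals(operatorB))
        || ("!=".equals(operatorA) && "==".equals(operatorB));
}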

Yesterday and today I'll be looking at how the feedback from rule checks should work. Michael Neale said that the feedback would be in XML, so it could then be transformed to, for example, HTML. Proctor and Neale also suggested some books that could help, so last Tuesday I got Expert Systems: Principles and Programming by Joseph Giarratano and Gary D. Riley; I'll be reading that next weekend.

Knowledge Asset Management System (KAMS)

I just read this article over at InfoQ titled "CentraSite: Registry/Repository and Free Community Edition". I'm glad that someone is getting this message out; it's something I've been pushing internally for a while now. Enterprise governance is serious and important stuff and deserves more attention.

I want people to realise that rules, processes, policies, XSLT transformations, routing configurations, Hibernate mapping files etc are all important pieces of enterprise knowledge. For this reason I also prefer the term "asset" to "artefact", to signify that these artefacts are assets of value. I also believe the term "Registry/Repository" undersells the idea. You aren't just creating a JNDI-lookup type system for files; all stored assets are versioned along with Dublin Core metadata and user defined classifications. These assets can be further combined to create configurations of assets, which are also versioned. If we talk in terms of "rules", you create a package which is a configuration of one or more rules; a specific package configuration will use a specific version of a rule, and just because a rule gets updated doesn't mean I want that version of the package configuration to use the newer version of the rule; if I want that, I have to create a new version of the package configuration. This provides an important level of auditing: it means applications using this system can be audited to let us know at any point in time what policies were being applied.

Our BRMS, which we are thinking of renaming to KAMS, is a step in this direction. At the core it's just versionable, categorisable and configurable assets, of any type, built on top of the Jackrabbit JCR implementation - a much stronger spec than the JAXR that CentraSite is built on. This is in fact a separate and self contained project, separate from any rules or GUI stuff. On top of this sits the BRMS, which is built with GWT; currently much of this is hard coded for specific asset types - like rules, processes etc. The plan is to eventually refactor this into a base system with extension points for different asset types.

Thursday, June 14, 2007

Drools v3to4 Update Tool (Edson Tirelli)

Those following our progress on building Drools version 4.0 know that, together with all the features we are adding to the language, we also had to make some API and syntax changes that break backward compatibility.

To minimize the impact of the upgrade from version 3.0.x to version 4.0, we would like to provide a tool to help upgrade the rule files (DRLs). It will only do basic automated changes, but we expect it to minimize the pain users would otherwise face doing everything by hand.

I just committed a few lines of code of an application to do it and you can check it out from the link:

http://anonsvn.labs.jboss.com/labs/jbossrules/trunk/experimental/drools-update/

Right now, the only action implemented is to replace the calls to assert, assertLogical and modify with calls to insert, insertLogical and update, but we expect to grow it to cover other syntax incompatibilities. For example, resolving binding types to know when to remove old calls to primitive "unwrap" methods, like .intValue(), etc.
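
As a naive illustration of that kind of substitution (not how the tool is actually implemented - it works against the rule file, and the method name below is invented), the keyword changes alone could be approximated with whole-word replacements, taking care to handle assertLogical before assert:

// Naive illustration only - the real tool does more than raw text replacement.
// Order matters: assertLogical must be replaced before assert.
static String updateKeywords(String drl) {
    return drl.replaceAll("\\bassertLogical\\b", "insertLogical")
              .replaceAll("\\bassert\\b", "insert")
              .replaceAll("\\bmodify\\b", "update");
}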

Anyone willing to help with the task is welcome. Just join us on IRC or the mailing list.

Happy drooling.

Tuesday, June 12, 2007

API and Language changes

I've given notice several times on the mailing lists about these changes, so I thought I would put the info up on the blog too, for maximum exposure. These changes are now under way.

assert will change to insert
  • Avoids the constant keyword collision with "assert", which most languages now seem to support as a keyword
  • Will change in both the drl and working memory api

modify to become update
  • Instead of workingMemory.modify(FactHandle, Object) it will be workingMemory.update(FactHandle, Object); modify will also change to update in the drl.
  • This method is now only used for ShadowFact objects; it's a method to let the engine know that an external object has been updated so it can update its internal cache and reprocess.
  • Avoids the keyword collision in MVEL, which has nice "modify" sugar now
insertObject (assertObject), retractObject and updateObject to become insert, retract and update
  • The Object part seems superfluous, so we might as well remove it, especially as we start to support non-Object fact types
  • drl and working memory api will now use the same method names.

Added new WorkingMemory modifyRetract and modifyAssert methods
  • Allows for non-shadow fact objects.
  • When not using shadow facts (although this will of course work with shadow facts too) you cannot call 'update', or what used to be called 'modify', because we need to know the "old" values of the fields so we can retract them from the working memory. The only safe way is to first retract the object and then assert it again. However, with the existing API this adds extra work and results in a new fact handle. modifyRetract and modifyAssert can now be used together to "simulate" a modify on a non-shadow fact object in two parts: first call modifyRetract, then change your field values, then call modifyAssert (see the sketch after this list).
  • MVEL has sugar to do: modify ( person ) { age += 1, location = "london" }. What actually happens here is that it first calls modifyRetract, then applies the setters, and then calls modifyAssert.
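
A minimal sketch of the two-part modify on a non-shadow fact, using the method names described in this post (the 'person' POJO mirrors the MVEL sugar above; the exact signatures in the released API may differ):

// Sketch only - method names as described above; check the released
// WorkingMemory API for the exact signatures.
FactHandle handle = workingMemory.insert( person );
// ... later, when the person needs to change ...
workingMemory.modifyRetract( handle );        // engine forgets the old field values
person.setAge( person.getAge() + 1 );         // mutate the POJO directly
person.setLocation( "london" );
workingMemory.modifyAssert( handle, person ); // re-assert under the same fact handle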

Language Expressiveness: Another Take (Edson Tirelli)

As most of the users know, one of the major targets of the 4.0 release of Drools/JBoss Rules is to improve expressiveness and simplify the language.

Due to popular demand, we are glad to say that we have just committed to trunk (available in 4.0MR3) the changes to allow nested accessors, special syntax for map and array access, as well as complex expression evaluation. In other words, users can now write constraints like:


rule "Complex expressions"
when
$country : Contry( name == "Brazil" )
Person( address['business'].phone[0].contryCode == $country.phone.countryCode )
then
// do something
end

In the above rule, 'address' is a map of objects, where the key 'business' is associated with an Address POJO that contains an array of Phone POJOs, and we are accessing the countryCode attribute of the first object in the array.

Does that mean that users no longer need to worry about modeling flatter object models?
Definitely not. We continue to say that in order to use the rule engine's full power, users must use a relational object model. However, we provide the above described functionality to allow for good alternatives for cases where a relational model does not fit well with the whole application design.

It is important to understand that expressions like the above are translated into inline-evals**, and so they must be constant over time to avoid unpredictable behavior. Also, it is important to note that it is not viable to shadow entire object graphs, and again this requires the constrained attributes to remain constant while the fact is in the working memory.

** NOTE: in 4.0 we are making a nomenclature change. What we used to call a predicate in version 3.0.x will from now on be called an "inline-eval". The reason for the change is that we will soon support other constructions that are known as predicates.

Happy drooling!

Drools Dev Team

[EDIT: added the note about the difference between predicate and inline-eval nomenclature change]

Saturday, June 09, 2007

Local search 101 (Geoffrey De Smet)

I am working on a local search implementation on top of Drools.
My proof of concept is already able to tackle planning problems such as:
Local search and Drools turns out to be a really nice combination because:
  • A rule engine such as Drools is great for calculating the score of a solution to a planning problem. It makes it easy to add additional soft or hard constraints such as "a teacher shouldn't teach more than 7 hours a day". However, it tends to be too complex to use to actually find new solutions.
  • Local search is great at finding new solutions for a planning problem, without brute-forcing every possibility. However it needs to know the score of a solution and offers no support in calculating that score.
So, combining the two, I created Taseree, which will probably be accepted as Drools Solver (Local Search) once it's no longer just a proof of concept. There's a bunch of work to be done in the meantime though.
You can find it at SourceForge for now, under the same license as Drools. As far as I know I am the first one to think of making a hybrid of local search and a rule engine.

I'll attempt to open the black box a bit, to get some feedback on my implementation.

I'll start by explaining some terms used in the current implementation:
  • Solution: A state of the problem which can be evaluated into a score. The solver will try to find a solution with the highest score it can find. The initial solution's facts are asserted into the Drools WorkingMemory.
  • Score: Calculated on a solution. Higher is better. Many planning problems tend to use negative scores (the amount of constraints being broken) with an impossible perfect score of 0.
  • Move: A migration path from a solution A to a solution B. Some moves are small (move the French lesson an hour earlier), others are large (switch all lessons on Monday with those on Friday). Each move has an undo move: a move from solution B back to solution A. A move notifies the Drools WorkingMemory of its changes (a rough sketch of a move appears after the pseudo code below).
  • Local search: A solver that starts from an initial solution and takes steps to evolve it. It remembers the best solution it comes across.
  • Step: The decided move. Every step taken by a local search solver is picked by the decider from a number of possible moves.
  • Decider: Decides which move is the next step. A decider uses a selector to find moves, a score calculator to calculate the score of each selected move and an accepter to decide whether or not to accept the move as the next step.
  • Selector: Figures out which moves can be made on the current solution.
  • Accepter: Accepts or rejects a move. Implementations can turn the local search into tabu search, simulated annealing, great deluge, ...
  • Tabu search: A specific local search which does not accept moves which evolve the solution into an already encountered solution. Most of the examples use tabu search.
  • Score calculator: Calculates the score for a given solution. It uses the Drools WorkingMemory.
  • Finish: Decides when it's time to give up trying new steps to find a better solution. Implementations can check time, number of steps, feasibility of the best score, ... A local search never knows that it has found the perfect solution, so a finish needs to be set. However, even small planning problems tend to be so complicated that trying every combination (even with pruning) takes too long (TTP level 10 is a great example of that).
Now, let's take a look at some pseudo code for the local search algorithm combined with a rule engine:

workingMemory.assert(solution.getFacts());
// continue searching until our time is up
while (!finish.isFinished()) {
    // pseudo code: the block below shows what the decider does internally
    Move step = decider.decideNextStep() {
        for (Move move : selector) {
            doMove(workingMemory);
            scoreCalculator.calculateScore(workingMemory);
            if (accepter.calculateAcceptChance() is accepted) {
                return move;
            }
            undoMove(workingMemory);
        }
    }
    doStep(workingMemory);
    if (stepScore > bestScore) {
        // remember bestSolution and bestScore
    }
}
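
To make the Move concept a little more concrete, here is a rough Java sketch of one possible move implementation; the Move interface, the Lesson planning fact and all method names are invented for illustration and are not necessarily what Taseree uses:

import org.drools.FactHandle;
import org.drools.WorkingMemory;

// Hypothetical types for illustration only - not necessarily the Taseree API.
interface Move {
    Move createUndoMove();                     // the inverse move, back to solution A
    void doMove(WorkingMemory workingMemory);
}

class Lesson {                                 // an invented planning fact
    private int period;
    public int getPeriod() { return period; }
    public void setPeriod(int period) { this.period = period; }
}

// Moves a lesson to a different period and tells Drools the fact changed,
// so the score rules are re-evaluated for it.
class LessonPeriodMove implements Move {
    private final Lesson lesson;
    private final int newPeriod;

    LessonPeriodMove(Lesson lesson, int newPeriod) {
        this.lesson = lesson;
        this.newPeriod = newPeriod;
    }

    public Move createUndoMove() {
        return new LessonPeriodMove(lesson, lesson.getPeriod());
    }

    public void doMove(WorkingMemory workingMemory) {
        FactHandle handle = workingMemory.getFactHandle(lesson);
        lesson.setPeriod(newPeriod);
        workingMemory.modifyObject(handle, lesson); // 'update' in the 4.0 API naming
    }
}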

Wednesday, June 06, 2007

Interview with Antlr 3.0 author Terence Parr

How did you get started with antlr?

The critical moment was in 1988 while working in Paris with my friend Tom Burns, who later became my business partner at jGuru.com. At the time, Tom and I were working for a robotics company--I was building a compiler, interpreter, VM, and so on for a robot control language called Karel. I built the Karel parser with a recursive descent mechanism because I didn't grok yacc and I knew how to build recursive descent parsers really well. Tom showed me a section I had forgotten in Niklaus Wirth's book: "Algorithms + Data Structures = Programs" that discussed how to represent grammars as syntax diagrams and then how to translate them to Pascal programs that would recognize the syntax. I looked at it and said, "I could do that automatically".

Fast-forward six months to my second year of graduate school at Purdue University. I started taking a course from Hank Dietz on how to build parser generators and lexer generators. As part of the course, I built "yucc" a recursive descent parser generator that was barely LL(1). ANTLR grew out of the project in that class. I started doing all of the grammar analysis necessary and really got into that stuff. Too bad I hadn't paid attention in my automata theory class as an undergraduate.

I was supposed to be studying computer architecture for my masters, but that was not very interesting to me. Since I was already working on ANTLR for fun, my friend Tom suggested I turn that into my masters thesis and get the hell out of there. ;) Ultimately, as you can see, this is all Tom's fault. That explains the mysterious dedication in the recently released ANTLR book:
http://www.pragmaticprogrammer.com/titles/tpantlr



How does antlr3 compare to antlr2 and antlr1?

ANTLR v1 was written in C and could generate C and C++ output. It introduced a number of important features such as semantic predicates and syntactic predicates. It also had a really nice AST construction facility. v1 was also the first practical parser generator that used k>1 lookahead. v2 was written (in Java) in a terrible hurry during my startup days at jGuru as sort of a side project. As a result, I had to cut corners and make a number of quick decisions. v2 had semantic predicates, but lacked the sophisticated "hoisting" mechanism from v1. v2 was pretty quirky because I did not have enough time to think about the design. v1 required the use of a separate lexer generator (called DLG, written by Will Cohen). v2 used the same recursive descent mechanism for lexers as it did for parsers, which meant that you could have recursive lexer rules. v2 had C++, C#, Python, and Java targets.

v3 is a completely rewritten version (again in Java) that I have built very carefully and slowly over the past four years. The new code base is very clean and has many many unit tests. After about 20 years of doing research in parser generation, I think I finally understand the problem. v3 represents my thoughts on the ultimate parser generator design. There are a number of things that have changed for the better in v3 including: BSD license, a brand-new very powerful extension to LL(k) called LL(*), an auto backtracking mode, partial parsing result memoization to increase the speed of backtracking, a really nice AST rewrite rule mechanism, integration of the StringTemplate template engine for generating structured text, improved error reporting and recovery, and a truly re-targetable code generator. Each target is just a set of templates--you don't have to write code to make a new target for ANTLR. (Oh, and full semantic predicate hoisting is back in).

ANTLR v3 also has the awesome ANTLRWorks grammar development environment, written by Jean Bovet:
http://www.antlr.org/works

Why swing for antlrworks and how is that working out?

We were going for maximum portability, which I believe was the right decision. At least, our experience so far is that it works pretty well across platforms. Jean Bovet is an expert GUI builder and worked extremely hard to beat Swing into submission. It is because of his skill that we were able to get Swing to work so well and across platforms. People always express shock that ANTLRWorks is written in Swing.

Why LL(k), and how does that compare to other algorithms like SLR, LALR, GLR, CYK, Earley etc?


First, let's get some of the easy relationships out of the way. SLR and LALR are more efficient but weaker variants of LR, which is weaker than GLR (generalized LR). GLR, Earley and CYK can deal with the same class of grammars (all context-free grammars), but GLR is more efficient. GLR is just an LR parser that forks a new LR parser to pursue ambiguous paths. When static grammar analysis reveals a non-LR decision, a GLR parser generator simply generates a special state to try out the various alternatives at run time. In a crude sense, a GLR parser uses LR to backtrack across the paths in non-LR parsing decisions. GLR has a best case runtime of O(n), but a worst case of O(n^3), just like CYK and Earley. GLR should be nearly linear for most programming language grammars.

The most exciting parsing strategy to appear recently is called packrat parsing (by wizard Bryan Ford). A packrat parser is a top-down recursive descent backtracking parser that records partial parsing results to ensure linear time complexity, at the cost of some memory. A packrat parser chooses from among alternatives purely with backtracking. Bryan Ford also defined PEGs (parser expression grammars) that have no strict ordering with GLR, because there is at least one PEG that is not GLR. Bryan extended and formalized my original syntactic predicates into these really cool PEGs. I have added his PEG technology to ANTLR via an auto backtracking feature. So, ANTLR can deal with just about any grammar you give it now and parse the associated language with linear time complexity.

There is an interesting relationship to point out: GLR is to Earley as ANTLR is to packrat. GLR is more efficient than Earley because GLR relies on traditional LR parsing for all but the non-LR decisions. Similarly, ANTLR is more efficient than a packrat parser because it uses an LL-based parser for all but the non-LL decisions. Furthermore, ANTLR's LL(*) algorithm is much stronger than the traditional LL(k), thereby reducing even further the amount of backtracking it has to do (due to fewer non-LL(*) decisions).

ANTLR's new LL(*) algorithm allows parser lookahead to roam arbitrarily far ahead looking for a token or sequence that disambiguates a decision. LL(k) can only look a fixed k symbols ahead. It is the difference between using a cyclic DFA and an acyclic DFA to make lookahead decisions.

When you put it all together, ANTLR v3 is as powerful as any other current parsing technique. However, I think you will find that it is more efficient than the others. Perhaps more importantly, ANTLR is more natural to use because it builds human-readable/debuggable recursive descent parsers and allows arbitrary actions anywhere in a grammar.

Why use a custom DSL with antlr than xml?

XML is a data format, not a human interface (it's a parse tree rather than a sentence). I cannot stand typing/reading XML. Remember, being an expert in XML is like being an expert in comma separated values. It cracks me up that there are conferences on this data format. I carefully crafted my argument that "Humans should not have to grok XML" here:

http://www-128.ibm.com/developerworks/xml/library/x-sbxml.html

What style of "DSLs" or "little languages" would you recommend or NOT recomment antlr?

Well, ANTLR is great for "heavy lifting". If you know Ruby, you can probably implement a teeny little DSL very quickly because it is so flexible. I mean this not because it is a nice language to write in, but because it has optional parentheses on method calls. Consequently, you can pretty much make Ruby code look like anything you want. Cue ruby on rails, rake, etc...

As you might expect, I am pretty fast at building little languages with ANTLR, but I use sed and awk to do quick little translations. Heck, I can even do some pretty fancy footwork in Emacs for one-off translations.

What types of languages is antlr best at? Are there any that antlr is not suited for? (what are alternatives)?

I don't think there is a classification I can give you here. If you are building a language, ANTLR is a good approach unless it is so small you can do it faster in the tools I mention above. When your implementation starts to look like a handbuilt parser, though, you should step up to ANTLR for that task.

How does antlr make it easier to make "user friendly" parsers with good error reporting (versus what you would do yourself)?

ANTLR automatically generates parsers that provide excellent error messages. ANTLR gives you a "user friendly" parser for free. For example, you automatically get messages such as:

line 102:34 mismatched input ';' expecting ')'

The parser will suppress any further syntax errors until the parser properly resynchronizes.
By "resynchronize", I mean that ANTLR-generated parsers also automatically recover very well from errors. If you forget a ')', the parser will pretend it saw it anyway and keep going etc...

Good error messages can even help during development of a grammar because you will have a lot of bugs among the rules. Good error messages help you track down these problems. You can easily override a method in your parser to generate detailed errors such as:

line 802:71 [program, method, stat, expr, atom] no viable alternative,
token=[@2921,802:71=';',<7>,6092:6093] (decision=5 state 0)
decision=<<35:1: atom : ( INT | '(' expr ')' );>>

Some people claim that they can do very good error reporting and recovery in hand-built parsers. This is true, but the reality is that parser generators don't get tired whereas programmers do. Consequently, ANTLR will generate much better error messages than you could possibly do by hand for a large grammar. ANTLR gives you all the flexibility you have in hand-built recognizers. For example, you can trap recognition exceptions wherever you want, just like in regular code.

How does antlr compare to javacc and other well-known parsers?

Yacc is still used by a frightening number of people, probably just because that is what they grew up using. Well, a lot has changed in the 30 years since yacc came out. There are a number of problems with yacc. Two of the biggest are: lack of good error reporting and sensitivity to action placement. It is notoriously difficult to get good error messages and error recovery using yacc (much of the problem stems from the bottom-up approach of LR-based parsers).

When it comes to actions, yacc can be really frustrating. Just when you get your grammar to recognize all of your test cases, inserting a single action can cause a grammar non-determinism. This non-determinism can break your grammar and make your input tests fail. LR-based parsers can only execute actions on the right edge of an alternative, because that is the only place the parser knows precisely where it is. To allow actions at non-right edges, yacc splits the rule to get the action on the right edge of a new rule. Unfortunately, splitting the rule can introduce non-determinisms. The worst-case scenario is an action on the left edge of every alternative in an LR grammar. Due to rule splitting, the LR parser is reduced in strength to that of an LL parser.

The list goes on and on. For example, yacc has no support for building trees and has very primitive attribute handling. Yacc was great in its time, but there are many better parser generators available.

JavaCC is very similar to the previous incarnation of ANTLR, v2; choosing between them was really a matter of style as they were about as powerful and generated about the same thing with some minor differences (oh, I think JavaCC was a little faster than ANTLR v2).

With the introduction of ANTLR v3, I believe that the choice is now very clearly in ANTLR's favor as it represents a whole new generation of software (Sriram Sankar, supersmart author of JavaCC, has been too busy in industry to make a new version of JavaCC). To summarize the key advantages (which ANTLR v3 has over most of the parser generators as well):

  • LL(*) parsing technology that is much more forgiving in that it accepts many more grammars
  • ANTLRWorks grammar development environment
  • AST construction facilities via rule rewrites
  • Tree grammars that result in tree walkers so you don't have to build them by hand
  • StringTemplate integration for emitting structured text (i.e., code generation)
  • Sophisticated semantic and syntactic predicates mechanisms
  • Auto-backtracking so you can type in any old grammar you want and have ANTLR just figure it out at runtime (memoization keeps us down to linear parsing time)
  • v3 supports many different languages at this point with many others on their way; even Ada!
  • A carefully written book that makes it much easier to learn

Any chance of getting rid of antlr namespaced runtime dependency? (and inlining by default) - also, why is there a runtime?

Do you mean generating only the necessary runtime support into the output so you don't have a runtime library? If so, I think ANTLR will always have a runtime library dependency. There's just too much code to generate. It would also make it hard to fix bugs because people have to regenerate their parsers to get the fixed support code in their parsers.

How can "downstream" users like us contribute back to antlr? is everything controlled tightly through ter or will he open it up to contributors one day?

For now, I have been extremely strict about who can touch the code base. Unlike most projects in academia, none of my graduate students have touched it. ;) I do welcome bug fixes and reports, which I require people to submit through a click-wrap license. This allows me to guarantee an untainted code base, which makes big companies much more comfortable using my software. I have added a page to describe some of the projects I would like people to work on however:

http://www.antlr.org/summer-of-code/2007/index.html

What's next for antlr?

Incremental parsing is the first thing, which allows you to take any grammar and use it in an IDE. Then, AST construction from tree parsers (which is currently missing). I'm contemplating the idea of starting another book: "The ANTLR Cookbook: The taste of venison". 'course no publisher will let me get away with that subtitle, but hey it's funny. I am also going to work on a language textbook that focuses on language translation, as opposed to all of the other compiler textbooks.

This interview is available in pdf form here.

Friday, June 01, 2007

W3C Rule Interchange Format for Production Rule Systems

I've just had the pleasure of working with Ilog on a Production Rule (PR) dialect implementation for the W3C's Rule Interchange Format (RIF). The goal of RIF is to create a universal rule interchange format specification for all reasoning systems; they have focused on building a RIF Core based on Horn rules, with dialect extensions for the different reasoning systems.

Ilog undertook the hard work of devising a simple proof-of-concept XML language and the marshaller to a POJO AST. The scope was kept very simple, focusing on literal constraints and function calls.

A while back the MISMO group contacted us and said they wanted to attempt a proof of concept (POC) for rule standardisation within the MISMO group, and they provided some real world data for us to work against. We all agreed to develop a PR RIF dialect for JRules and JBoss Rules, with the idea of demoing these two systems working off the same rule documents and payloads in a web service environment.

It was decided that the implementation would not extend the available elements at this stage, staying with the 10 elements provided by RIF Core.

By staying with the RIF Core elements to represent a PR system you end up with a very generic and weak XML, with no ability for semantic validation via an XSD schema. Let's take a look at a very simple literal constraint; in DRL this would be:
CreditScore( division == "Wholesale" )

In PR RIF it's:
<Equal>
  <Uniterm>
    <Const type="xsd:QName">aprif:xmlGetter</Const>
    <Const type="xsd:QName">ns0:CreditScore</Const>
    <Const>division</Const>
    <Var type="ns0:CreditScore">cs</Var>
  </Uniterm>
  <Const type="xsd:string">Wholesale</Const>
</Equal>

The problem is that we have no idea what that Uniterm is; its role is contained within the element's text "aprif:xmlGetter". This means the entire XML block must be parsed and validated by a PR RIF validator, instead of being able to leverage standard validation languages like XSD schema. My preference is that we build a richer, higher level XML language, such as:
<literal-constraint slot="division" type="xs:string">Wholesale</literal-constraint>

This would facilitate as much XSD validation up front as possible; the mapping to RIF Core can be dictated in the spec and XSLT translators provided.

In this POC each constraint must be placed on a variable, which means all patterns must be bound to a variable; this is obviously not needed in PR systems - I hope that is something which is eventually resolved. The XML to show this concept is quite verbose, so I'll show what this limitation looks like in DRL:
CreditScore( division == "Wholesale" )
Instead it must be:
cs : CreditScore( cs.division == "Wholesale" )

Another area likely to be contentious is the ability to navigate nested structures, using a 'from'-like conditional element. Personally I hope this is accepted, but I know it is not supported by PR systems that only allow "value types" to be used as fields, i.e. strings, numbers, booleans etc. Disallowing nested structures would certainly be a step back for modern PR systems.

Another concern was the weakness of typing; for instance, we need to be able to differentiate between decimals and currency.

Those wanting to go deeper can check out the project from Subversion here:
http://anonsvn.labs.jboss.com/labs/jbossrules/contrib/apocrif/

This is an Eclipse project and consists of three projects:
  • apocrif
    • The core module; handles the RIF XML marshalling.
  • jbossrules
    • The JBoss Rules implementation; works out of the box with unit tests, all jars are provided in the lib directory - the wonders of working with an open source system :)
  • jrules
    • The source code is there but won't work, or even compile, as it is missing the necessary Ilog JRules dependencies.
In the jbossrules project there is a single test class, JBossRulesTest, that shows everything done to date, including a full integration test with the MISMO provided data:
http://anonsvn.labs.jboss.com/labs/jbossrules/contrib/apocrif/jbossrules/src/test/java/jbossrules/tests/JBossRulesTest.java

I previously blogged on the technical aspects of this implementation, including payload marshalling and nested object navigation:
Working with JBoss Rules and Web Services

For those without Subversion clients I have zipped up the checkout and placed it here:
http://wiki.jboss.org/wiki/attach?page=JBossRules%2Fapocrif1.1.zip