Friday, June 06, 2008

Allowing variable "masks" in DSL grammar - by Matt Geis

Revisiting some of the concepts Edson posted earlier, we decided to put ANTLR to use in how we handle DSLs.

At a high level, the logic used to parse a DSL entry was pretty straightforward. We were matching a pattern of "name=value", identifying a few groups inside the "name" block as variables (text surrounded by brackets, like "{name}"), and storing those variables for later use when the "value" block is substituted into the actual rule that needs to be expanded from a DSL into the DRL format that Drools knows how to work with.
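As a rough illustration of that logic, here is a minimal sketch in Python (not the Java code Drools used). The helper names `compile_entry` and `expand` are hypothetical, and the sketch ignores embedded regular expressions and any surrounding capture groups.

```python
import re

def compile_entry(entry):
    """Turn a 'name=value' DSL entry into (regex, variable names, value template)."""
    key, value = (part.strip() for part in entry.split("=", 1))
    names = re.findall(r"\{(\w+)\}", key)
    # Escape the literal text between variables and join it with lazy capture groups.
    literals = re.split(r"\{\w+\}", key)
    regex = re.compile("(.*?)".join(re.escape(text) for text in literals))
    return regex, names, value

def expand(line, entry):
    """Rewrite `line` using `entry` if the whole line matches, else return it unchanged."""
    regex, names, value = compile_entry(entry)
    match = regex.fullmatch(line.strip())
    if match is None:
        return line
    # Substitute each captured variable into the value block.
    for name, captured in zip(names, match.groups()):
        value = value.replace("{" + name + "}", captured)
    return value

entry = 'a user exists named "{name}" = User (name == "{name}")'
print(expand('a user exists named "Matt"', entry))  # User (name == "Matt")
```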

However, we found that in certain circumstances we had pushed the regular expression to its limits. It was built from a pattern string, with an optional capture group prepended to the DSL entry and another optional capture group appended to it. Additionally, because Drools DSLs can handle regular expressions in the body of a DSL entry, our regular expression (and the corresponding escape codes) also had to identify embedded regular expressions. These pattern strings were defined in Java files, so any spaces or backslashes had to be escaped as well. Add it all together, and you get a regular expression that was difficult to read, and even more difficult to modify without causing an unintended ripple effect through the rest of the expression.

Finally, we learned that the resulting expression was greedier than we wanted it to be. It was easy enough to write a line like

a user exists named "{name}" = User (name == "{name}")

But, if you have an object with a lot of attributes and want to support arbitrary creation of rules, you can't write a DSL expression for every possible permutation of attributes used in the LHS of your rule. Moreover, you may want to include constraints inside of a "from" or "exists" clause. To do so, you would want a DSL entry like

a user exists with {attribute} {value} = User ({attribute} == {value})

With such an entry, the pattern ended up matching not only the first constraint but also part of the subsequent constraints (there were separate DSL entries for the first constraint on an object and for subsequent constraints), because every variable was translated to the same capture group, "(.*?)".
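The over-matching is easy to reproduce. The sketch below is in Python and uses a hand-written regular expression that I am assuming is roughly equivalent to what the old engine generated for the entry above; the exact expression Drools built was more involved.

```python
import re

# Assumed translation of: a user exists with {attribute} {value}
# Every variable becomes the same unconstrained lazy group.
loose = re.compile(r"a user exists with (.*?) (.*?)$")

line = "a user exists with age 42 and name Bob"
attribute, value = loose.match(line).groups()

print(attribute)  # age
print(value)      # 42 and name Bob
```

Nothing tells the second group where a value ends, so it swallows the text that was meant for the next DSL entry.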

Edson had the idea of allowing users to define the exact nature of a variable, in other words...

a user exists with social security number "{ssn:\d{3}-\d{2}-\d{4}}" = User (ssn == "{ssn}")

Using ANTLR, we were able to parse the DSL entries and accurately isolate variable definitions, patterns within definitions, variable usage, and literal text. So now, it's straightforward to create a very strict matching such as...
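One reason this is awkward with a single regular expression is that a pattern such as \d{3} contains braces of its own, so finding the end of a variable definition means counting braces. The sketch below is a hand-rolled Python tokenizer, not the ANTLR grammar; the function `tokenize` and its token tuples are hypothetical, and it does not handle escaped braces inside a pattern.

```python
def tokenize(key):
    """Split a DSL key into ('text', s) and ('var', name, pattern_or_None) tokens."""
    tokens, literal, i = [], [], 0
    while i < len(key):
        if key[i] != "{":
            literal.append(key[i])
            i += 1
            continue
        # Find the matching close brace, counting nested braces such as \d{3}.
        depth, j = 1, i + 1
        while j < len(key) and depth > 0:
            if key[j] == "{":
                depth += 1
            elif key[j] == "}":
                depth -= 1
            j += 1
        if depth != 0:
            raise ValueError("unbalanced braces in: " + key)
        if literal:
            tokens.append(("text", "".join(literal)))
            literal = []
        # Everything before the first colon is the name, the rest is the pattern.
        name, _, pattern = key[i + 1 : j - 1].partition(":")
        tokens.append(("var", name, pattern or None))
        i = j
    if literal:
        tokens.append(("text", "".join(literal)))
    return tokens

key = r'a user exists with social security number "{ssn:\d{3}-\d{2}-\d{4}}"'
for token in tokenize(key):
    print(token)
```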

[condition][]user has contact where {constraints}= u : User and exists (f: Person(where {constraints}) from u.contacts)
[condition][]where {attr:[A-Za-z0-9]+} is "{value}"={attr} == "{value}"
[condition][]and {attr:[A-Za-z0-9]+} is "{value}"=, {attr} == "{value}"

which could support a rule like

user has contact where firstName is "Edson" and country is "Brazil"

just as easily as it could support

user has contact where lastName is "Tirelli" and company is "Red Hat"
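To see how the three entries above cooperate, here is a small Python sketch that applies them in order to the first rule. It is not the Drools engine: `compile_entry` and `apply_entry` are hypothetical helpers, variable patterns are assumed to contain no braces, and an unconstrained variable at the very end of a key is assumed to capture the rest of the line.

```python
import re

VAR = re.compile(r"\{(\w+)(?::([^}]*))?\}")  # assumes patterns contain no braces

def compile_entry(entry):
    """Return (regex, names, template) for a '[scope][]key=value' DSL entry."""
    body = entry.split("]", 2)[2]  # drop the [condition][] prefix
    key, template = (part.strip() for part in body.split("=", 1))
    regex, names, pos = "", [], 0
    for var in VAR.finditer(key):
        regex += re.escape(key[pos:var.start()])
        names.append(var.group(1))
        at_end = var.end() == len(key)
        # Assumption: an unconstrained variable is lazy, or takes the rest
        # of the line when nothing follows it in the key.
        regex += "(" + (var.group(2) or (".*" if at_end else ".*?")) + ")"
        pos = var.end()
    regex += re.escape(key[pos:])
    return re.compile(regex), names, template

def apply_entry(text, entry):
    """Replace every match of the entry's key in `text` with its filled-in value."""
    regex, names, template = compile_entry(entry)
    def fill(match):
        out = template
        for name, captured in zip(names, match.groups()):
            out = out.replace("{" + name + "}", captured)
        return out
    return regex.sub(fill, text)

entries = [
    '[condition][]user has contact where {constraints}= u : User and exists (f: Person(where {constraints}) from u.contacts)',
    '[condition][]where {attr:[A-Za-z0-9]+} is "{value}"={attr} == "{value}"',
    '[condition][]and {attr:[A-Za-z0-9]+} is "{value}"=, {attr} == "{value}"',
]

rule = 'user has contact where firstName is "Edson" and country is "Brazil"'
for entry in entries:
    rule = apply_entry(rule, entry)
print(rule)
# u : User and exists (f: Person(firstName == "Edson" , country == "Brazil") from u.contacts)
```

Because {attr} is restricted to [A-Za-z0-9]+, the third entry skips the "and exists" in the expanded text and only rewrites "and country is ...".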

An even more user-friendly DSL can be built by the correct ordering of your DSL statements. For example, the addition of the following lines

[condition][]first name=firstName
[condition][]last name=lastName

would allow the user to rewrite the above rules as

user has contact where first name is "Edson" and country is "Brazil"

user has contact where last name is "Tirelli" and company is "Red Hat"

Take that approach, and create some simple token replacements like

[condition][]greater than = >
[condition][]less than = <
[condition][]is = ==
[condition][]where {attr:[A-Za-z0-9]+} is {value:[0-9]+}={attr} == {value}
[condition][]and {attr:[A-Za-z0-9]+} is {value:[0-9]+}=, {attr} == {value}

and a universe of flexibility opens up in front of you, as you can now accurately construct sentence fragments and phrases.

Note that attention to the ordering of such DSL entries is CRITICAL, as the expansion of one DSL entry affects the matching of subsequent entries. If line 3 above were to match, lines 4 and 5 would never match; you would have to change "is" to "==" in them to make them match.
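The ordering effect can be shown with two hand-translated regular expressions. This is a Python sketch under assumptions: the token replacement for "is" is modelled as a whole-word substitution, and the sample line is made up for illustration.

```python
import re

line = "where age is 42"

# Line 3 of the list above, modelled as a whole-word token replacement (assumed).
token_is = re.compile(r"\bis\b")
# Line 4 of the list above, hand-translated to a regular expression.
where_is = re.compile(r"where ([A-Za-z0-9]+) is ([0-9]+)")

# Line 4 applied on its own: it matches and produces the constraint.
direct = where_is.sub(r"\1 == \2", line)
print(direct)                      # age == 42

# Line 3 applied first: "is" is gone, so line 4 no longer matches.
rewritten = token_is.sub("==", line)
print(rewritten)                   # where age == 42
print(where_is.search(rewritten))  # None
```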

With the new ANTLR model, we were able to easily support the requirement to limit matches by user-defined patterns (on a variable-by-variable basis), as well as pave the way for future enhancements to DSL usage.

Finally, now that we are liberated from the usage of a few regular expressions that capture everything, we can move ahead with other features for Drools DSLs. What started as

{variable}
and progressed to
{variable:pattern}
could be extended to something like
{variable:[attributeName,attributeValue]}
Examples of possible attributes include a literal enumeration of allowable values for that variable, the name of a function to invoke to validate the captured value, or the name of a function that provides a list of allowable values (a feature that could be leveraged by both the Eclipse plugin and Guvnor).

Take the pattern matching for a test drive, let us know how it works for you (or doesn't, if that's the case), and tell us if there are any features you'd like implemented.

Caveat emptor: My rewrite of the DSL engine to use ANTLR was the result of just such a feature request. The need for tighter variable binding (basically "named capture groups") was acknowledged, and I was invited to make the necessary changes. I had no idea that it'd be so involved, or that the effort would be as rewarding as it has been.


Matt Geis