Friday, August 07, 2009

Loop detection ideas...

A common problem with writing systems in rules, is when you are mutating data (modifying working memory in any way) it is possible to have rules activate themselves over and over, and cause a non terminating loop (also known as an Infinite Loop ;).

Now we all know we can't know for sure (yet) that ANY program will terminate - but still, its troubling to have run away processes etc...

So to stop the simple case of a rule directly activating itself, we of course have the "no-loop" attribute: this instructs the engine to not let that same rule be activated as a result of its consequence. Of course this only works for simple cases: its possible for ruleA to triggle ruleB, and then back to ruleA and so on (or even more complex patterns).

In unit tests - its usually easy to spot non terminating cases, and its not that troubling as they are obvious. Buried in an app this is more troubling however.

In many cases the results of this looping is mostly harmless - you would rather it just get over itself and return something right ?!

There are a few ways to help with this issue:

1. Set a max fires value when executing - this pretty much ensures it will finish (this also assumes that you know what a reasonable upper limit is for the number of rule firings - not always the case).

2. Avoid modifying an existing fact (well - not always helpful to say this - its a bit like a doctor saying "don't go and get the flu !").

3. Cycle detection:
Well if you were to note down all the names of the rules firing, you would notice that a cycle forms (a repeating pattern) which may or may not be obvious to the eye. You could perhaps decide after a certain amount of time that you will just stop and say "enough is enough ! I think you are going in circles").

There are a few algorithms for detecting these cycles, such as the Tortoise and Hare:
(which I like because it has a cute diagram):

The basic idea is that the main phase of the algorithm is where the hare moves twice as fast as the tortoise, once they both sit on the same value, they have the potential start of a cycle/loop (and if they move in sync after, they can find out the length of a potential cycle - you can read more on the Wikipedia link above).

Now all this isn't really necessary for what we want. If we were to look the names of the rules executing in sequence, at a given point in time we can take a look at that list, and see if it ends in in a cycle/loop. Unlike the above, we aren't concerned with finding any cycle - but only one that starts at the end. If the cycle is repeated enough, then we could say it *might* be stuck in a loop (the cycle might be 1 rule firing over and over, or a sequence of rules that fire over and over...).

So a rough algorithm might be to flip the list of firings around, and anchor the "tortoise" to the start, and then look for a potential repetition, check if it is a repetition, if it is, check its length. We can then decide if the repetition carries on for long enough, to terminate (this can be done by Listener implementations).

To try this out, I wrote the following (Scala) code, and applied it as a Listener. Every 1000 or so firings I would check the log, and see if a cycle had formed. If it was, and it was a certain number of iterations (but could be variable in size) I would return:

* Take a list, return a list of the pattern that is repeated, and the total number of items that are
* detected to be in a repeated pattern. Loosely based on the tortoise and hare algorithm, only in this case
* I break the tortoise legs as I don't want him to leave the start line.
def locateLoop(list: java.util.List[String]) : (Array[String], Int) = {
var i = 1;

val endVal = inv(list,0)
var pattern = Array(endVal)
var cycleSuspected = false
var confirmed = false
var cycleY = 0
var cycleX = 0
while (i < list.size - 1 && !confirmed) {
val v = inv(list, i)
if (v == endVal) {
cycleSuspected = true
cycleX = 0
cycleY = i
var repeatingItems = Array(endVal)
while (cycleSuspected) {
cycleX = cycleX + 1
cycleY = cycleY + 1
if (cycleY < list.size) {
if (inv(list,cycleX) != inv(list,cycleY)) {
if (cycleX >= i) {
pattern = repeatingItems
// println("Terminated With Loop Found")
cycleSuspected = false
} else {
// println(list(cycleX - 1))
if (cycleX == i) {
confirmed = true
// println("Found !")
if (!confirmed) {
repeatingItems = repeatingItems ++ Array(inv(list,cycleX))
} else {
pattern = repeatingItems
cycleSuspected = false
i = i + 1

(pattern.reverse, cycleY) //this is the return value - the pattern plus size !


def inv(a: java.util.List[String], i: Int) = {
a.get(a.size - i - 1)

Certainly seemed to work well. There are many cases where a deep recursion does occur, where this might detect false positives - so its not a general solution, BUT, for many business rules scenarios (for example executing on a web server as the result of a request) it can work as a safeguard quite nicely (in other words, another optional tool).

Some other suggestions I have had were to look at the cached hashcodes of the tuples causing the activations - the idea being that if there is looping AND no data mutation - then the looping is clearly useless and should be stopped. I also tried this, but it can't really work with the hashcodes as hashcodes do NOT have to change as a result of data mutation. (Sadly, this means that deep-ish copies would need to be made of the data to watch for mutation - a cumbersome and expensive option and still not fool proof).

Good times....

1 comment:

  1. Nice post --
    With regards to Drools 4.0.7:

    workingMemory.fireAllRules(fireLimit); does work greatly, limiting the number of rule firings to the preset value.

    Now how to you get back the knowledge that the rule engine execution stopped because of this setting?
    One way is to check the AgendaSize, but then this may be > 0 due to a halt() operation.

    Any ideas?