Sunday, June 10, 2012

Just in time selectors in Planner

As part of the Selector rewrite for Drools Planner 5.5.0.Beta1, I am introducing the option to have the selectors create each move just in time. When scaling out, this reduces the memory footprint and increases score calculation per second.

When a selector is set to just in time, instead of creating each Move as part of a move list, the selector only creates a Move just before it's evaluated. So Moves that aren't evaluated aren't created. And after a Move is evaluated it might be forgotten immediately.

The old selectors had 2 options in random mode:
  • AbstractMoveFactory: all the Moves are created at the start of the step and shuffled.
  • CachedMoveFactory: all the Moves are created at the start of the phase. They are shuffled at the beginning of the step.
The new selectors will have 4 options in random mode:
  • cacheType = JUST_IN_TIME: a move is randomly created just before it's evaluated.
  • cacheType = STEP: all the Moves are created at the start of the step and shuffled.
  • cacheType = PHASE: all the Moves are created at the start of the phase. They are shuffled at the beginning of the step.
  • cacheType = SOLVER: all the Moves are created at the start of the solver. They are shuffled at the beginning of the step.
So let's take a look at the difference between the old CachedMoveFactory approach and the new JUST_IN_TIME cacheType approach. I ran a prototype for this locally and it's looking good.

With the old selectors (phase cached and step shuffled):


With the new selectors (configured with cacheType JUST_IN_TIME and random):


The new selector's result looks much better, doesn't it?
Especially in the case with acceptedSelection 100, it's much faster because it doesn't have to shuffle the big moveList between every 100 score calculations.
However, before you get your hopes up to much: this only results in a slight increase of the best score (in this use case in this scale) - and that's because of the catch. The real gain - I suspect - is memory consumption and GC activity. I 'll be benchmarking that later.


What's the catch?
The catch is that a just-in-time random selector does not guarantee to select distinct moves. Unlike a shuffled cached selector, which does guarantee that every Move will be selected exactly once (after which it is depleted), a just-in-time random selector might select the same Move multiple times and is never depleted. In practice, this means that some scaled down use case will prefer shuffling, so Planner will still implement that when cacheType != JUST_IN_TIME. In fact, Planner might be able to deduce the best setting automatically (based on the selector size and acceptedSelection setting).


The new code is currently masked out on master. Over the next few weekends I 'll be stabilizing this new code and increase unit test coverage. Once it's stable, I'll push the changes that replace the old code with the new code. The use of a custom MoveFactory will still be supported too.
As part of the selector rewrite, the selector configuration will be smaller and easier (with good defaults), but also more customizable and powerful (for tweaking or research).

3 comments:

  1. Hi,
    a good idea to make all moves is to add them in a "tabu" like memory before evaluating them. The list will grow up to a certain percentage and then you generate the rest of them with the old method and you then pick and use on of the new generated moves. I am not sure if this is better
    performance wise but it might work

    ReplyDelete
  2. @ComputerExplorer I believe you're describing a lazy form of "cacheType = PHASE". As far as I see, it would have the same benefits and drawbacks of cacheType phase, but it would need more orchestration to determine if a Move has already been cached. So I suspect that it's better to just use cacheType phase?

    ReplyDelete
  3. Good one mate!
    Looking forward to try it out on my large data sets.

    ReplyDelete