Wednesday, January 16, 2013

Scaling construction heuristics part 1: multiple variables

For Drools Planner 6.0, I am working on highly scalable construction heuristics. Furthermore, they 'll be far more flexible and optionally configurable. This post focuses on how multiple planning variables affect scalability.

In the curriculum course use case, the planning entity Lecture has a planning variable Period and a planning variable Room. That makes 2 planning variables per entity. For example, dataset comp07 has 434 lectures, 25 periods and 20 rooms.



Suppose we use the construction heuristic First Fit Decreasing (FFD) on that. Simply explained: FFD sorts the lectures in a queue, takes the first lecture from that queue and assigns it the best remaining Period and Room. It repeats this until all lectures are assigned.

What's wrong with the old FFD implementation?


If there are multiple variables, the old implementation will, for every lecture, try a combination of every period and every room. So, for every lecture, it will try a cartesian product of the period value set and the room value set. In comp07 it tries 25 periods * 20 rooms = 500 moves for each of the 434 lectures. Suppose I have 5 variables of 1 000 values each: that would make 1 000 000 000 000 000 moves per entity!

The new implementation now also supports an alternative: do a union instead: for every lecture, try every period and assign the period, then try every room and assign the room. In comp07 it tries 25 period + 20 rooms = 45 moves for each of the 434 lectures. With 5 variables of 1 000 values each, that would make only 5 000 moves per entity.

So how does this affect score results and performance?

Benchmark cartesianProduct vs union of multiple variables


Unsurprisingly, cartesianProduct has consistently better results than union (because it investigates much more combinations):
Union breaks more hard and soft constraints. But this is an unfair comparison: union took less time. So that leaves more time for local search to optimize those constraints.

If we look at the number of milliseconds to run both algorithms, we can clearly see that cartesianProduct takes much longer than union.
Especially if we look at the scalability graph, we can see a clear scalability problem with cartesianProduct, unlike with union:
Here, there are only with 2 planning variables (period and room). More planning variables has even worse scalability.

So what can Local Search do with that extra time? Let's give both configurations 2 minutes, so local search gets whatever time is left after the construction heuristic has finished:
For this dataset scale, there's no clear winner in this case: they are competitive. If we scale out, I imagine we 'll see union as the clear winner. So if my dataset is small/medium and I have an execution time of at least 1 minute, I might go for cartesianProduct. Otherwise, I 'd prefer union. In any case, I 'd use the Drools Planner's benchmarker toolkit to help me decide that.

Interested in the benchmark's details? Take a look at the full benchmark report here.