The benchmarker lets you play different configurations against each other and ranks them, but it's easy to misjudge scalability: just because config A is better than config B when given 5 minutes doesn't mean that A will be better than B when given 60 minutes. Sometimes the opposite is true.
It's not only interesting to see what best score a configuration has after the given time, but also how it got there. And that's why the benchmarker now supports outputting a graph with the best score over time.
Above we see that the green configuration wins. But less than 50 seconds (= 50 000 milliseconds) earlier, the red configuration would have won.
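To make the "who wins depends on when you stop" point concrete, here is a minimal Java sketch. It is not part of the benchmarker: the class `BestScoreOverTime`, its methods and the sample numbers are all made up for illustration. It assumes a higher score is better and that each configuration's best score never gets worse over time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not a Drools Planner class. */
final class BestScoreOverTime {

    // Per configuration: time spent in milliseconds -> best score reached at that time.
    private final Map<String, TreeMap<Long, Integer>> curves = new LinkedHashMap<>();

    /** Records that the given configuration reached bestScore after timeMillis. */
    void record(String config, long timeMillis, int bestScore) {
        curves.computeIfAbsent(config, k -> new TreeMap<>()).put(timeMillis, bestScore);
    }

    /** Best score recorded at or before timeMillis, or null if nothing was recorded yet. */
    Integer bestScoreAt(String config, long timeMillis) {
        TreeMap<Long, Integer> curve = curves.get(config);
        if (curve == null) {
            return null;
        }
        Map.Entry<Long, Integer> entry = curve.floorEntry(timeMillis);
        return entry == null ? null : entry.getValue();
    }

    /** Configuration with the highest best score at the cutoff; the first recorded one wins ties. */
    String winnerAt(long timeMillis) {
        String winner = null;
        Integer winningScore = null;
        for (String config : curves.keySet()) {
            Integer score = bestScoreAt(config, timeMillis);
            if (score != null && (winningScore == null || score > winningScore)) {
                winner = config;
                winningScore = score;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        BestScoreOverTime timeline = new BestScoreOverTime();
        // Made-up sample data, not measurements from the graphs in this post.
        timeline.record("red", 0L, -500);
        timeline.record("red", 100_000L, -200);
        timeline.record("red", 300_000L, -150);
        timeline.record("green", 0L, -600);
        timeline.record("green", 200_000L, -300);
        timeline.record("green", 350_000L, -100);
        System.out.println(timeline.winnerAt(300_000L)); // prints "red"
        System.out.println(timeline.winnerAt(400_000L)); // prints "green"
    }
}
```

With the same data, the ranking flips just by moving the cutoff, which is exactly the trap a single end-of-run ranking can hide.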
Not all datasets have the same curve. Some datasets are harder because they are more compact (not necessarily bigger), making it harder to move resources around. Harder datasets can separate the weak from the strong configurations more clearly.
Above we can see that the green and yellow configurations take less than half the time of the other configurations to achieve the same score.
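Reading "time to reach the same score" off a graph can also be done on the raw data. The sketch below is only an illustration under assumptions: `TimeToScore` and `timeToReach` are hypothetical names, the curve is a map from milliseconds spent to best score, a higher score is better, and the numbers are invented.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not a Drools Planner class. */
final class TimeToScore {

    /**
     * Returns the earliest recorded time (in milliseconds) at which the curve's
     * best score is at least targetScore, or -1 if the target is never reached.
     */
    static long timeToReach(TreeMap<Long, Integer> curve, int targetScore) {
        // TreeMap iterates in ascending key order, so the first hit is the earliest.
        for (Map.Entry<Long, Integer> entry : curve.entrySet()) {
            if (entry.getValue() >= targetScore) {
                return entry.getKey();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Made-up sample data, not measurements from the graphs in this post.
        TreeMap<Long, Integer> green = new TreeMap<>();
        green.put(0L, -600);
        green.put(60_000L, -200);
        green.put(150_000L, -100);

        TreeMap<Long, Integer> red = new TreeMap<>();
        red.put(0L, -600);
        red.put(200_000L, -200);
        red.put(380_000L, -100);

        System.out.println(timeToReach(green, -100)); // prints 150000
        System.out.println(timeToReach(red, -100));   // prints 380000
    }
}
```

In this invented example green needs less than half of red's time to reach a score of -100, which is the kind of gap a harder dataset can expose.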
Lies, damned lies and statistics
Now, before you start drawing conclusions from the above diagrams, take these things into consideration:
- The 4 configurations shown are almost identical and differ only in tabu type. They don't differ in the important things, such as the move factory implementations or absoluteSelection/relativeSelection size. They are already the best out of many inferior configurations.
- 400 000 milliseconds is less than 7 minutes (6 minutes and 40 seconds). Many of the algorithms don't "flatline" yet in that time. Their ROI time (return on invested time) is still high.
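One rough way to check whether a configuration has "flatlined" is to look at how much its best score still improved in the last stretch of the run. This is just a sketch under assumptions: `FlatlineCheck` and `improvementInWindow` are hypothetical names, the curve maps milliseconds spent to best score, a higher score is better, and the data is invented.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not a Drools Planner class. */
final class FlatlineCheck {

    /**
     * Returns how much the best score improved between fromMillis and toMillis.
     * A result of 0 means the curve was flat in that window.
     */
    static int improvementInWindow(TreeMap<Long, Integer> curve, long fromMillis, long toMillis) {
        Map.Entry<Long, Integer> end = curve.floorEntry(toMillis);
        if (end == null) {
            return 0; // nothing recorded yet at toMillis
        }
        Map.Entry<Long, Integer> start = curve.floorEntry(fromMillis);
        if (start == null) {
            start = curve.firstEntry(); // window starts before the first sample
        }
        return end.getValue() - start.getValue();
    }

    public static void main(String[] args) {
        // Made-up sample data, not measurements from the graphs in this post.
        TreeMap<Long, Integer> stillImproving = new TreeMap<>();
        stillImproving.put(0L, -600);
        stillImproving.put(100_000L, -300);
        stillImproving.put(350_000L, -250);

        TreeMap<Long, Integer> flat = new TreeMap<>();
        flat.put(0L, -600);
        flat.put(100_000L, -300);

        // Improvement during the last 100 000 ms of a 400 000 ms run.
        System.out.println(improvementInWindow(stillImproving, 300_000L, 400_000L)); // prints 50
        System.out.println(improvementInWindow(flat, 300_000L, 400_000L));           // prints 0
    }
}
```

A configuration that still gains score in the final window hasn't shown its end result yet, so ranking it at that point says little about how it does with more time.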
In an upcoming blog/article I'll demonstrate the effect of giving Drools Planner more time and the importance of using a real-world time limitation.
Thanks to the excellent JFreeChart library for making it so easy to turn data into a graph. It's a pity though that out-of-the-box it uses yellow as the 4th line's color, making it hard to see on the default white background.