Monday, May 29, 2017

New KIE persistence API on 7.0

This post introduce the upcoming drools and jBPM persistence api. The motivation for creating a persistence api that is to not be bound to JPA, as persistence in Drools and jBPM was until the 7.0.0 release is to allow a clean integration of alternative persistence mechanisms to JPA. While JPA is a great api it is tightly bound to a traditional RDBMS model with the drawbacks inherited from there - being hard to scale and difficult to get good performance from on ever scaling systems. With the new api we open up for integration of various general NoSQL databases as well as the creation of tightly tailor-made persistence mechanisms to achieve optimal performance and scalability.
At the time of this writing several implementations has been made - the default JPA mechanism, two generic NoSQL implementations backend by Inifinispan and MapDB which will be available as contributions, and a single tailor made NoSQL implementation discussed shortly in this post.

The changes done in the Drools and jBPM persistence mechanisms, its new features, and how it allows to build clean new implementations of persistence for KIE components is the basis for a new soon to be added MapDB integration experimental module. The existing Infinispan adaptation has been changed to accommodate to the new structure.
Because of this refactor, we can now have other implementations of persistence for KIE without depending on JPA, unless our specific persistence implementation is JPA based. It has implied, however, a set of changes:

Creation of drools-persistence-api and jbpm-persistence-api

In version 6, most of the persistence components and interfaces were only present in the JPA projects, where they had to be reused from other persistencies. We had to refactor these projects to reuse these interfaces without having the JPA dependencies added each time we did so. Here's the new set of dependencies:
<dependency>
 <groupId>org.drools</groupId>
 <artifactId>drools-persistence-api</artifactId>
 <version>7.0.0-SNAPSHOT</version>
</dependency>
<dependency>
 <groupId>org.jbpm</groupId>
 <artifactId>jbpm-persistence-api</artifactId>
 <version>7.0.0-SNAPSHOT</version>
</dependency>

The first thing to mention about the classes in this refactor is that the persistence model used by KIE components for KieSessions, WorkItems, ProcessInstances and CorrelationKeys is no longer a JPA class, but an interface. These interfaces are:
  • PersistentSession: For the JPA implementation, this interface is implemented by SessionInfo. For the upcoming MapDB implementation, MapDBSession is used.
  • PersistentWorkItem: For the JPA implementation, this interface is implemented by WorkItemInfo, and MapDBWorkItem for MapDB
  • PersistentProcessInstance: For the JPA implementation, this interface is implemented by ProcessInstanceInfo, and MapDBProcessInstance for MapDB
The important part is that, if you were using the JPA implementation and wish to continue doing so with the same classes as before. All interfaces are prepared to work with these interfaces. Which brings us to our next point

PersistenceContext, ProcessPersistenceContext and TaskPersistenceContext refactors

Interfaces of persistence contexts in version 6 were dependent on the JPA implementations of the model. In order to work with other persistence mechanisms, they had to be refactored to work with the runtime model (ProcessInstance, KieSession, and WorkItem, respectively), build the implementations locally, and be able to return the right element if requested by other components (ProcessInstanceManager, SignalManager, etc)
Also, for components like TaskPersistenceContext there were multiple dynamic HQL queries used in the task service code which wouldn’t be implementable in another persistence model. To avoid it, they were changed to use specific mechanisms more related to a Criteria. This way, the different filtering objects can be used in a different manner by other persistence mechanisms to create the queries required.

Task model refactor

The way the current task model relates tasks and content, comment, attachment and deadline objects was also dependent on the way JPA stores that information, or more precisely, the way ORMs related those types. So a refactor of the task persistence context interface was introduced to do the relation between components for us, if desired. Most of the methods are still there, and the different tables can still be used, but if we just want to use a Task to bind everything together as an object (the way a NoSQL implementation would do it) we now can. For the JPA implementation, it still relates object by ID. For other persistence mechanisms like MapDB, it justs add the sub-object to the task object, which it can fetch from internal indexes.
Another thing that was changed for the task model is that, before, we had different interfaces to represent a Task (Task, InternalTask, TaskSummary, etc) that were incompatible with each other. For JPA, this was ok, because they would represent different views of the same data.
But in general the motivation behind this mix of interfaces is to allow optimizations towards table based stores - by no means a bad thing. For non table based stores however these optimizations might not make sense. Making these interfaces compatible allows implementations where the runtime objects retrieved from the store to implement a multitude of the interfaces without breaking any runtime behavior. Making these interfaces compatible could be viewed as a first step, a further refinement would be to let these interfaces extending each other to underline the model  and make the implementations simpler
(But for other types of implementation like MapDB, where it would always be cheaper to get the Task object directly than creating a different object, we needed to be able to return a Task and make it work as a TaskSummary if the interface requested so. All interfaces now match for the same method names to allow for this.)

Extensible TimerJobFactoryManager / TimerService

On version 6, the only possible implementations of a TimerJobFactoryManager were bound in the construction by the values of theTimeJobFactoryType enum. A refactor was done to extend the existing types, to allow other types of timer job factories to be dynamically added

Creating your own persistence. The MapDB case

All these interfaces can be implemented anew to create a completely different persistence model, if desired. For MapDB, this is exactly what was done. In the case of the MapDB implementation that is still under review, there are three new modules:
  • org.kie:drools-persistence-mapdb
  • org.kie:jbpm-persistence-mapdb
  • org.kie:jbpm-human-task-mapdb
That are meant to implement all the Task model using MapDB implementation classes. Anyone with a wish to have another type of implementation for the KIE components can just follow these steps to get an implementation going:
  1. Create modules for mixing the persistence API projects with a persistence implementation mechanism dependencies
  2. Create a model implementation based on the given interfaces with all necessary configurations and annotations
  3. Create your own (Process|Task)PersistenceContext(Manager) classes, to implement how to store persistent objects
  4. Create your own managers (WorkItemManager, ProcessInstanceManager, SignalManager) and factories with all the necessary extra steps to persist your model.
  5. Create your own KieStoreServices implementation, that creates a session with the required configuration, and adding it to the classpath

You’re not alone: The MultiSupport case

MultiSupport is a Denmark based company that has used this refactor to create its own persistence implementation. They provide an archiving product that is focused on creating a O(1) archive retrieval system, and had a strong interest in getting their internal processes to work using the same persistence mechanism they used for their archives.
We worked on an implementation that allowed for an increase in the response time for large databases. Given their internal mechanism for lookup and retrieval of data, they were able to create an implementation with millions of active tasks which had virtually no degradation in response time.
In MultiSupport we have used the persistence api to create a tailored store, based on our in house storage engine - our motivation has been to provide unlimited scalability, extended search capabilities, simple distribution and a performance we struggled to achieve with the JPA implementation. We think this can be used as a showcase of just how far you can go with the new persistence api. With the current JPA implementation and a dedicated SQL server we have achieved an initial performance of less than 10 ‘start process’ operations per second, now with the upcoming release we on a single application server have a performance more than 10 fold.

Share/Bookmark