Wednesday, August 04, 2010

Drools Grid (version 2) – #1 Modules Introduction

Hi there, I'm right now committing/merging the new version of the Drools Grid module into the JBoss Drools trunk (5.2.0.SNAPSHOT). The idea of this module and all its submodules is to provide the ability to execute knowledge sessions across a distributed grid of machines/nodes.

To achieve this big goal we can set up different components that will let us transparently distribute our knowledge sessions based on the requirements of our applications.
In this post I will give a quick overview of each of these components, and in the next few posts I will try to show how we can use this project in real-life scenarios.

Remember that this is a work in progress, so community feedback is appreciated!

Inside the drools-grid directory you will find the following sub modules:

Drools Grid API (drools-grid-api - Low level API)

This module contains all the low level APIs to interact with nodes across the grid. You will find here core concepts that will be used in the grid internals to define different types of services.
Some of the core interfaces that you will find here are:

ExecutionNodeService: this interface represents, across the grid, the nodes that are able to host and execute knowledge sessions.
DirectoryNodeService: this interface represents, across the grid, the nodes that are in charge of hosting a directory with information about what's living inside the grid. Inside these nodes we can find all the ExecutionNodeServices and HumanTaskNodeServices that are currently running inside our distributed nodes, and also the knowledge sessions that are running inside them.
HumanTaskNodeService: this interface represents the nodes that are in charge of hosting and executing human tasks for business processes. (Work in progress, so expect changes)

(note: In the future expect to see more of these interfaces representing new types of services running inside the grid.)

These services will be running in different places, and we will use a simple API to connect to them in order to use them. For handling these connections we have a class called GridConnection. This GridConnection class will let us add new connectors to our different services. We can add ExecutionNode, DirectoryNode or HumanTaskNode connectors to a GridConnection. Based on these connectors, when we ask for a specific service (executionNode, directoryNode or humanTaskNode) the GridConnection will choose one of the registered connectors and give us a connection to that service.

If we want to create a new knowledge session, we need to ask the GridConnection for an ExecutionNode. The GridConnection will take one of the available connectors and create an ExecutionNode (client) for us to start using. Internally the ExecutionNode will contain a set of low level services that, based on the connector type, will be configured to provide an execution environment that runs locally, remotely or in a real distributed environment.
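To make the connector idea a bit more concrete, here is a minimal sketch in plain Java. It is not the real drools-grid-api code: every type and method name in it (ExecutionNodeConnector, LocalConnector, RemoteConnector, addExecutionNode, getExecutionNode) is hypothetical and only illustrates the shape of the idea. The round-robin selection is also an assumption, since this post doesn't say how the real GridConnection chooses a connector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are the real drools-grid-api classes.
final class GridConnectionSketch {

    /** Stand-in for an execution node client, i.e. something that can host sessions. */
    interface ExecutionNode {
        String describe();
    }

    /** Stand-in for a connector: it knows how to reach one execution node service. */
    interface ExecutionNodeConnector {
        ExecutionNode connect();
    }

    /** Connector for a node living in the same JVM. */
    static final class LocalConnector implements ExecutionNodeConnector {
        @Override
        public ExecutionNode connect() {
            return () -> "local node (same JVM)";
        }
    }

    /** Connector for a node in another JVM; this sketch performs no real network I/O. */
    static final class RemoteConnector implements ExecutionNodeConnector {
        private final String host;
        private final int port;

        RemoteConnector(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public ExecutionNode connect() {
            return () -> "remote node at " + host + ":" + port;
        }
    }

    /** Keeps the registered connectors and hands out execution nodes on request. */
    static final class GridConnection {
        private final List<ExecutionNodeConnector> connectors = new ArrayList<>();
        private int next = 0;

        void addExecutionNode(ExecutionNodeConnector connector) {
            connectors.add(connector);
        }

        /** Picks a connector round-robin (an assumption made for this sketch). */
        ExecutionNode getExecutionNode() {
            if (connectors.isEmpty()) {
                throw new IllegalStateException("no execution node connectors registered");
            }
            ExecutionNodeConnector chosen = connectors.get(next);
            next = (next + 1) % connectors.size();
            return chosen.connect();
        }
    }

    public static void main(String[] args) {
        GridConnection connection = new GridConnection();
        connection.addExecutionNode(new LocalConnector());
        connection.addExecutionNode(new RemoteConnector("192.168.0.10", 9123));

        // The calling code is the same whichever connector ends up being chosen.
        System.out.println(connection.getExecutionNode().describe());
        System.out.println(connection.getExecutionNode().describe());
    }
}
```

The point of the sketch is that the client code only talks to the connection and the node interface, so swapping a local connector for a remote one should not change the application code.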

As you can imagine, these interfaces need to be implemented in order to provide the functionality. That's why we have different modules that provide different implementations of these services.
It's also important to note that these APIs are extended for each particular type of environment. You will find two extensions right now: drools-grid-remote-api and drools-grid-distributed-api. Both contain a set of specific classes and interfaces that extend the core functionality provided by the drools-grid-api project.

Let's take a look at the different environment types and the sub-modules that we need to use in each of them.

Local Environments

This is a pretty straightforward environment. It will let us execute Drools in the way we are already used to. The only difference from the common Drools APIs is that we will use the Drools Grid APIs, which will give us the power to move our application to a different type of environment in the future.

Drools Grid Local Impl (drools-grid-local):

This module provides a Local implementation of the previously described services. By Local I mean in the same JVM instance. This implementation behaves in the same way as if we were using the common Drools APIs. The idea behind it is to provide the ability to run Drools Grid locally using the same APIs that we can use in distributed environments. This will give us the possibility to move our implementations from one environment (Local) to more distributed ones (Remote or Distributed).

Inside this project you will find the local implementation of the services that will be included inside the Execution Nodes and Directory Nodes. Note that we didn't include the HumanTask node here because we don't have a local implementation for the Human Task service.

Remote Environments

Remote Environments will let us run our knowledge sessions in different JVM instances distributed across a network of computers. Based on the requirements of each situation we will be able to choose the underlying implementation that is used for communication between the different runtimes hosted in different JVMs/Machines/Nodes.

Drools Grid Remote API (drools-grid-remote-api):

This module provides the API that needs to be implemented by Remote Environment providers. Right now the two planned implementations of these APIs are HornetQ and Apache Mina. The idea behind these two implementations is to provide guidelines for creating new and more robust implementations that suit different situations/requirements.

Drools Grid Remote Node Mina (drools-grid-remote-mina):

This module provides the implementation of the internal services required to establish a remote connection. This module can also be executed from the command line to start a new Mina Remote Server that can host and execute remote knowledge sessions. It also provides the specific connector required by a client that wants to create remote sessions that will be hosted inside a Mina Execution Node Server.

Drools Grid Remote Directory Mina (drools-grid-remote-dir-mina):

This module provides the implementation of the internal services required to establish a connection with a remote directory service. This module can also be executed from the console to start a new directory node that will keep track of the Execution Nodes, Knowledge Sessions, Knowledge Bases and other Directory Nodes that are running inside our grid.
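As a rough illustration of what "keeping track" means for a directory node, the following plain Java sketch registers execution nodes and knowledge sessions and resolves where a session lives. It is only an in-memory toy under my own assumptions: the Directory class and its registerExecutionNode, registerSession and lookupSessionAddress methods are hypothetical names, not the real directory service API, and nothing here talks to Mina.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: an in-memory stand-in for a directory node, not the real service.
final class DirectorySketch {

    static final class Directory {
        // execution node id -> address where that node can be reached
        private final Map<String, String> executionNodes = new LinkedHashMap<>();
        // knowledge session id -> id of the execution node hosting it
        private final Map<String, String> sessions = new LinkedHashMap<>();

        void registerExecutionNode(String nodeId, String address) {
            executionNodes.put(nodeId, address);
        }

        /** Records which node hosts a session; the node must be registered first. */
        void registerSession(String sessionId, String nodeId) {
            if (!executionNodes.containsKey(nodeId)) {
                throw new IllegalArgumentException("unknown execution node: " + nodeId);
            }
            sessions.put(sessionId, nodeId);
        }

        /** Resolves the address of the node hosting a session, if the session is known. */
        Optional<String> lookupSessionAddress(String sessionId) {
            return Optional.ofNullable(sessions.get(sessionId)).map(executionNodes::get);
        }
    }

    public static void main(String[] args) {
        Directory directory = new Directory();
        directory.registerExecutionNode("node-1", "192.168.0.10:9123");
        directory.registerSession("ksession-1", "node-1");

        // A client that only knows the session id can find out where to connect.
        System.out.println(directory.lookupSessionAddress("ksession-1"));
        System.out.println(directory.lookupSessionAddress("unknown-session"));
    }
}
```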

Distributed Environments

Distributed environments provide a more robust solution and more services around the topology of machines that we have in our network. In distributed environments we will have services that let us automatically deploy, fork and manage all the services across the grid. We will not need to manage or start different services on different machines; we will have a fully distributed environment that is in charge of these tasks. One of the main characteristics of this kind of environment is that the environment itself will know when and how to create new service instances because the demand is too high.

Drools Grid Distributed API (drools-grid-distributed-api)

This module provides some of the extensions needed for Distributed environments. It only adds some internal classes that are used by the services that run in this kind of environment.

Drools Grid Distributed Node Rio (drools-grid-distributed-rio)

This module provides the implementation of a Rio service that will be capable of hosting knowledge sessions. When we compile and package this module we will get an OAR (Rio deployable archive) that we can distribute/deploy in a Rio environment. Take a look at this post to see how you can configure and deploy this Rio Service (I will add this soon).

Drools Grid Distributed Directory Rio (drools-grid-distributed-dir-rio)

This module provides the implementation of a Rio service that will be capable of hosting information about the grid environment. It will store information related to our knowledge sessions, kbases and other services running across the grid. It's important to note that Rio itself stores and maintains low level information about the grid usage, and this information will not be part of the directory service.

Drools Grid Tasks (drools-grid-task) (Work in progress, need refactoring)

This module will be split into the following sub modules: drools-grid-task-api, drools-grid-remote-task-mina, drools-grid-remote-task-hornetQ and probably drools-grid-distributed-task-rio. Right now, the project only contains the interfaces to hook up the two currently supported implementations, Apache Mina and HornetQ. But to move forward with this refactoring, we first need to do some core refactoring in drools-process/drools-process-task to split implementations and interfaces.

Drools Grid Service (drools-grid-services)

This module gives the user the APIs to build applications. The main idea behind this project is to provide a High Level API that abstracts the low level details required to build a Grid Environment.
Using this module you can describe your grid topology and then use this definition to run your application on top of it. Inside the Drools Grid Services APIs you will have the following concepts to describe and use your Grid Topology:

GridTopology: a GridTopology will represent the topology itself. It will be composed of ExecutionEnvironments, DirectoryInstances and TaskServerInstances. You, as the client user, will define your topology (where your ExecutionEnvironments, DirectoryInstances and TaskServerInstances are) and then we will create a new GridTopology instance using this definition. Once we get the GridTopology object we can start using it to execute our applications.

ExecutionEnvironment: it will represent a Node/Machine that will be able to host more than one knowledge session. Inside this node/machine the ksession will run and we can interact with it remotely (or locally).
DirectoryInstance: it will represent a node that will keep track of the other nodes in the grid, and it will let us register and look up these services and their contents.
TaskServerInstance: it will represent a human task server node that will be able to execute and maintain all the information about human tasks for business processes.
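The concepts above can be pictured with a small plain Java sketch of a topology description. Again, this is not the drools-grid-services API, which I will cover in future posts: the GridTopology, NodeDefinition and Kind types, the add and namesOf methods, and the addresses used are all hypothetical and only show how a topology could be declared first and queried afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a toy topology description, not the real drools-grid-services API.
final class GridTopologySketch {

    /** The three kinds of node named in the post. */
    enum Kind { EXECUTION_ENVIRONMENT, DIRECTORY_INSTANCE, TASK_SERVER_INSTANCE }

    /** One entry of the topology: what kind of node it is and where it lives. */
    static final class NodeDefinition {
        final Kind kind;
        final String name;
        final String address;

        NodeDefinition(Kind kind, String name, String address) {
            this.kind = kind;
            this.name = name;
            this.address = address;
        }
    }

    /** The topology itself: just the list of node definitions supplied by the user. */
    static final class GridTopology {
        private final List<NodeDefinition> nodes = new ArrayList<>();

        /** Adds a node definition and returns this topology so calls can be chained. */
        GridTopology add(Kind kind, String name, String address) {
            nodes.add(new NodeDefinition(kind, name, address));
            return this;
        }

        /** Names of all defined nodes of the given kind, in definition order. */
        List<String> namesOf(Kind kind) {
            return nodes.stream()
                    .filter(node -> node.kind == kind)
                    .map(node -> node.name)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        GridTopology topology = new GridTopology()
                .add(Kind.EXECUTION_ENVIRONMENT, "exec-local", "local")
                .add(Kind.EXECUTION_ENVIRONMENT, "exec-remote", "192.168.0.10:9123")
                .add(Kind.DIRECTORY_INSTANCE, "dir-1", "192.168.0.11:9124");

        System.out.println(topology.namesOf(Kind.EXECUTION_ENVIRONMENT));
        System.out.println(topology.namesOf(Kind.TASK_SERVER_INSTANCE));
    }
}
```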

If you want to create an application that uses Drools Grid, this is the module that you want to use. We will be analyzing how to use this module in future posts.

In brief

Basically I've introduced the modules inside Drools Grid. I will be working hard on some refactorings during the next two weeks, so feedback is really appreciated. I will publish my current TODO list in another blog post; if you want to help, I will be here trying to answer questions.

Stay tuned!

Original post: http://salaboy.wordpress.com/2010/08/04/drools-grid-version-2-1-modules-introduction/

5 comments:

  1. It would be nice if we could do the score calculation in Drools Planner in the cloud with Drools Grid :)

    Is it possible to distribute a DRL across the grid? For example, a DRL like this one: http://fisheye.jboss.org/browse/~raw,r=34195/JBossRules/trunk/drools-planner/drools-planner-examples/src/main/resources/org/drools/planner/examples/nurserostering/solver/nurseRosteringScoreRules.drl

  2. Salaboy, how about reliability? Can one node take over the work of another failed node?

  3. I am using Drools with HornetQ to load balance the traffic across the nodes, which gives me the same behavior: if one node fails then the other node can take over. But I like the RIO API. Good work Salaboy

  4. Sounds interesting - I wondered what the projects in trunk represented. I am however unsure whether this is a distributed RETE network, distributed execution environments (load balancing and failover) or distributed Fusion. Can you please enlighten me?

  5. Hi guys, sorry for the long delay in my response. (I didn't get any notification from Blogger about these comments.)

    @Greoffrey: hmm good question. The idea behind Drools Grid is to manage sessions across the grid (different nodes/machines), so it changes a little bit the way that you think about distribution.

    @mantis and @Greoffrey: this is not a distributed RETE implementation. This will let you distribute and execute (different) knowledge assets in a simple way.
    I was thinking a lot about planning problems, or more specifically how to improve the grid topology using planner.

    @dienaya: yes you can do that. Remember that stateful sessions maintain status, so you can create an active-active environment easily.

    @Abdel: we have almost 80% of an implementation using hornetQ for remote interactions, so you will get the same features that you have right now in your implementation using drools-grid-services.

    @all: you can check my personal blog for more up-to-date posts. I'm planning to put all the links here when I can complete a set of related posts that I'm working on.
