|Thesis title:||Combining Expressiveness and Efficiency in a Complex Event Processing Middleware|
|Research area:||Advanced software architectures and methodologies|
Several complex systems operate by observing a set of primitive events that happen in the external environments, interpreting and combining them to identify higher level composite events, and finally sending the notifications about these events to the components in charge of reacting to them, thus determining the overall system’s behavior. Examples of systems that operate this way are sensor networks for environmental monitoring, financial applications, fraud detection tools, and RFID-based inventory management. More in general, the information system of every complex company can and should be organized around an event-based core that realizes a sort of nervous system to guide and control the operation of the other sub-systems. The task of identifying composite events from primitive ones is performed by the Complex Event Processing (CEP) Engine. It operates by interpreting a set of event definition rules that describe how composite events are defined from primitive ones. The CEP engine is usually part of a CEP system or middleware which also handles the communication with local and remote clients. To capture all the requirements of the aforementioned applications, a CEP engine has to face several challenges. First, it has to provide a suitable language for rule specification, explicitly tailored to model complex temporal relationships that join together primitive events in composite ones. Second, it has to implement efficient processing algorithms, to detect composite events and deliver notifications to interested parties with the lowest possible delay. Finally, it has to support distributed scenarios, in which the communication parties may be deployed over a wide geographical area. This thesis first proposes a modelling framework to compare and analyze not only existing CEP systems, but all the systems developed with the aim of processing continuous flows of information according to pre-deployed processing rules. This allows us to identify the main advantages and limitations of existing approaches, by looking at a wide range of proposals. Moreover, our modelling framework draws a common ground for comparing efforts coming from different research communities, with different background, expertise, and vocabulary. We believe that our work can bridge the gap between different worlds, promoting the communication and reducing the effort required to compare and merge the results produced so far. Moving from the issues identified while analyzing existing works, we introduce T-Rex, a new CEP system explicitly designed to combine expressiveness and efficiency. In particular, we first present TESLA, the new event definition language used by T-Rex. TESLA is explicitly designed to model in an easy and natural way the complex relationships that join primitive events and the actions required to aggregate them to obtain composite events. Then we discuss in details the implementation of T-Rex, studied to efficiently process TESLA rules. First of all we focus on the problem of matching, i.e., selecting relevant (primitive) events based on their content, which is one of the fundamental actions present in every event-based system. We propose a novel matching algorithm explicitly designed to take advantage of parallel hardware, including modern Graphical Processing Units (GPUs). This is the first solution that analyzed the adoption of parallel hardware to speed up matching and our evaluation shows impressive results with respect to existing sequential solutions. Afterward, we focus on complete TESLA rules, and we discuss and compare two different processing algorithms that take two opposite approaches to process incoming events. A comparison with existing products shows the effectiveness of both our proposals and the differences among them. Independently from the adopted algorithm, T-Rex leverages the presence of multiple processing cores to efficiently evaluate different rules in parallel. To further reduce the time required to handle the most complex rules, i.e., those involving a large number of primitive events, we present and evaluate a third algorithm to process TESLA rules on GPUs. Our contribution goes beyond the implementation of T-Rex, indeed this is the first work that describes in details how CEP can leverage off-the-shelf parallel hardware: multi-core CPUs and GPUs. Since our analysis is organized around the basic language constructs provided by TESLA, but also present in most of existing CEP languages, our work represents an important contribution to determine how CEP can take advantage of currently available parallel hardware architectures and which processing algorithms are best suited to exploit their processing power. The last aspect examined by this thesis is how to take advantage of the availability of multiple processing nodes, distributing the processing load over different machines, to better support large-scale distributed scenarios, reducing the delay required to receive results, or the occupation of network resources. To this extent, we present and compare different solutions for a distributed T-Rex, extracting the advantages and limitations of each of them. They include the protocols to organize available nodes into an overlay network, to partition and distribute event definition rules, and to cooperatively handle event processing and delivery.