TTO Grant Catalogue | Page 14

Design and Implementation of a Data Stream Management System with Complex Event Processing Capabilities
2010 National Grants Computer Science

ABSTRACT

The world has seen a proliferation of data stream applications over the last decade. These applications include network or traffic monitoring, online trading or transaction monitoring, supply chain management with Radio Frequency Identification (RFID), health monitoring, data center automation, web click-streams, other military and civilian applications using sensor networks, and many more. All of these applications are considered mission-critical by the organizations concerned and require real-time processing, so that strategic decisions can be made quickly.

The analysis needs of these emerging applications call for design and implementation specifications that are inherently different from those of existing Database Management Systems (DBMS), which are sometimes used in an ad-hoc fashion to address the processing needs of the listed applications. An emerging system architecture called the Data Stream Management System (DSMS) is better suited to this purpose. The main differences between a DSMS and a DBMS are as follows.

First, a DSMS runs queries over unbounded, fast-moving, and dynamic data streams, usually while the data is in memory and before it is ever stored (persistence is optional in a DSMS). In a DBMS, ad-hoc queries are run over stationary data that has already been saved to the database, which is a problematic assumption for data streams that are unbounded and possibly bursty. In a DSMS, queries are first "registered" with the system and become Continuous Queries (CQ). CQ are relatively static, while the data they process is dynamic. As a result, query plan optimization is still an open research field for DSMS, because it is extremely challenging to build, optimize, and adapt query plans for unbounded and unpredictable datasets.
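The registered-query model described above can be illustrated with a minimal Python sketch. It is not the project's design: the names `StreamEngine`, `ContinuousQuery`, `register`, and `push`, the `(timestamp, value)` tuple layout, and the 10-second sliding window are all assumptions made for illustration. The sketch also assumes tuples arrive in timestamp order, which real streams do not guarantee.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

# Hypothetical tuple layout: (timestamp in seconds, measured value).
Event = Tuple[float, float]


@dataclass
class ContinuousQuery:
    """A registered query: an aggregate over a sliding time window."""

    window_seconds: float
    aggregate: Callable[[List[float]], float]
    _window: Deque[Event] = field(default_factory=deque, init=False)

    def on_event(self, event: Event) -> float:
        timestamp, _ = event
        self._window.append(event)
        # Evict tuples that have fallen out of the window.
        # Assumes in-order arrival (non-decreasing timestamps).
        while self._window[0][0] <= timestamp - self.window_seconds:
            self._window.popleft()
        return self.aggregate([value for _, value in self._window])


class StreamEngine:
    """Holds registered queries and pushes each arriving tuple through them."""

    def __init__(self) -> None:
        self._queries: List[ContinuousQuery] = []

    def register(self, query: ContinuousQuery) -> None:
        # The query is static once registered; only the data changes.
        self._queries.append(query)

    def push(self, event: Event) -> List[float]:
        # Nothing is persisted: each tuple is processed in memory on arrival.
        return [query.on_event(event) for query in self._queries]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


engine = StreamEngine()
engine.register(ContinuousQuery(window_seconds=10.0, aggregate=mean))

results = [engine.push(event) for event in [(0.0, 10.0), (5.0, 20.0), (12.0, 30.0)]]
print(results)  # [[10.0], [15.0], [25.0]]
```

The third result drops the tuple stamped 0.0 because it is more than 10 seconds older than the newest arrival. This shows the inversion relative to a DBMS: the query stays fixed while the data it sees keeps changing.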
The unpredictability stems from stream characteristics such as unknown and varying arrival rates, missing tuples, out-of-order arrivals, and ad-hoc dependence on external data.

Second, because of the time-window constraints on data streams, the semantics of a Continuous Query Language (CQL) need richer temporal and spatial clauses than its DBMS counterpart, the Structured Query Language (SQL).

Finally, real-time Complex Event Processing (CEP) requires complex joins for correlation, mixing in-flight data with data stored elsewhere (e.g. databases, data warehouses), and possibly r [