The Doppler Quarterly Spring 2019 - Page 24

The data pipelines may look complicated, fusing together a series of steps to bring data to its ultimate end point: the desks of business intelligence professionals. To simplify things, data movement in the pipelines can be divided into two categories: batch and real-time.

Batch Movement

Batch movement is applied to large data sets, usually for mainframe processing or for more cost-effective business operations. An example of batch data movement and processing occurs when a credit card transaction is made with a merchant and the issuing bank sends an authorization code to the merchant. To settle, or close, the transactions, the merchant usually compiles all the authorization codes it has received at the end of the day and sends them in a batch, since that is more cost-effective, to the payment processor, which sorts them and forwards them to the issuing banks. At settlement, the issuing banks release the funds to the processor, which deposits them into the merchant account, paying for the transaction. The issuing bank then shows the transaction as a purchase on the cardholder's next statement. For the issuing bank, batch data movements result from asynchronous batch processing, in which payloads are published at different times, and the path to account statement generation is the result of stepwise processing, which includes refining, enriching, formatting and joining the data.

Real-Time Movement

Companies such as LinkedIn, Twitter and Facebook have event-driven business models and systems, which address the need to generate, process and analyze streaming real-time data. Every tweet, click and profile edit is a real-time update that is captured and recorded as it streams (e.g., by Kafka at LinkedIn); together these updates can total more than a billion events per day. These streaming data platforms, developed to capture real-time updates, enable businesses to respond to events in real time when they are combined with historical views of the data.
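To make the batch versus real-time contrast concrete, here is a minimal Python sketch under stated assumptions. The `Authorization` record, the `settle_end_of_day` function and the `RealTimeCounter` class are hypothetical names invented for illustration; they do not model any real payment processor, card network or Kafka API. The batch half waits for the whole day's authorizations and groups them by issuing bank in one pass; the real-time half updates a live count on every event and answers queries by adding it to a precomputed historical count.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    # Hypothetical record: one authorization code received by a merchant.
    auth_code: str
    issuing_bank: str
    amount_cents: int


def settle_end_of_day(authorizations):
    """Batch movement (hypothetical sketch): process the full day's
    authorizations at once, sorting and grouping them by issuing bank,
    and return the total amount per bank for the settlement batch."""
    by_issuer = defaultdict(list)
    # Sort first, mirroring the processor's "sort and forward" step.
    for auth in sorted(authorizations, key=lambda a: (a.issuing_bank, a.auth_code)):
        by_issuer[auth.issuing_bank].append(auth)
    return {bank: sum(a.amount_cents for a in auths) for bank, auths in by_issuer.items()}


class RealTimeCounter:
    """Real-time movement (hypothetical sketch): update state as each
    event arrives and combine it with a historical view when queried."""

    def __init__(self, historical_counts):
        self.historical = dict(historical_counts)  # precomputed, e.g. by a batch job
        self.live = defaultdict(int)               # counts since the historical snapshot

    def on_event(self, event_type):
        # Small amount of work per event, so answers are always current.
        self.live[event_type] += 1

    def total(self, event_type):
        # Use .get so that querying does not create empty entries.
        return self.historical.get(event_type, 0) + self.live.get(event_type, 0)


if __name__ == "__main__":
    day = [
        Authorization("A1", "BankX", 2500),
        Authorization("A2", "BankY", 1000),
        Authorization("A3", "BankX", 500),
    ]
    print(settle_end_of_day(day))  # {'BankX': 3000, 'BankY': 1000}

    counter = RealTimeCounter({"click": 1000})
    for event in ["click", "tweet", "click"]:
        counter.on_event(event)
    print(counter.total("click"), counter.total("tweet"))  # 1002 1
```

The difference lies in when the work happens: the batch function does nothing until the whole data set is available, while the counter does a little work per event, so its answer reflects both the historical view and the latest updates.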
To give you an idea of how a PaaS data system operates and how data travels from place to place, Figure 1 offers a look inside Microsoft Azure's cloud-based big data movement pattern.