Data pipelines may look complicated, fusing together a series of steps to bring data
to its ultimate end point: the desks of business intelligence professionals. To simplify
things, data movement through a pipeline can be segmented into two categories: batch
and real-time.
Batch Movement
Batch movement is applied to large data sets, usually for mainframe processing or more
cost-effective business operations. An example of batch data movement and processing
begins when a credit card transaction is made with a merchant, and the issuing bank
sends the merchant an authorization code. To settle, or close, the day's transactions,
the merchant usually compiles all the authorization codes it has received at the end of
the day and, since it is more cost-effective, sends them in a single batch to the payment
processor for sorting and forwarding to the issuing banks. At settlement, the issuing
banks release the funds to the processor, which deposits them into the merchant account,
paying for the transactions. The issuing bank shows each one as a purchase on the credit
card account holder’s next statement. For the issuing bank, batch data movements result
from asynchronous batch processing, where payloads are published at different times, and
the journey to account statement generation results from stepwise processing, which
includes refining, enriching, formatting and joining the data.
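The end-of-day settlement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Authorization record, the settle_batch function, the bank identifiers and the amounts are all hypothetical, and a real payment processor does far more than this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    """Hypothetical record of one authorized purchase."""
    auth_code: str      # code returned by the issuing bank at purchase time
    issuing_bank: str   # made-up bank identifier
    amount_cents: int   # amount in cents, to avoid floating-point rounding


def settle_batch(authorizations):
    """Sort one day's authorizations by issuing bank and total each group."""
    by_bank = defaultdict(list)
    for auth in authorizations:
        by_bank[auth.issuing_bank].append(auth)
    # One settlement request per bank: the codes to settle and the funds owed.
    return {
        bank: {
            "auth_codes": [a.auth_code for a in auths],
            "total_cents": sum(a.amount_cents for a in auths),
        }
        for bank, auths in by_bank.items()
    }


# The merchant accumulates authorizations during the day ...
days_authorizations = [
    Authorization("A1001", "bank_a", 2500),
    Authorization("A1002", "bank_b", 1200),
    Authorization("A1003", "bank_a", 4000),
]
# ... and sends them to the processor as a single batch at close of business.
settlement = settle_batch(days_authorizations)
# Funds released by all banks, deposited into the merchant account.
merchant_deposit_cents = sum(s["total_cents"] for s in settlement.values())
```

Nothing moves until the whole day's list is handed to settle_batch at once, which is the trade-off batch movement makes: higher latency in exchange for cheaper, bulk processing.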
Real-Time Movement
Companies such as LinkedIn, Twitter and Facebook have event-driven business models
and systems built to generate, process and analyze streaming real-time data. Every
tweet, click and profile edit is a real-time update that is captured and recorded as it
streams in (e.g., by Kafka at LinkedIn), and these updates can total more than a billion
events per day. These streaming data platforms, developed to capture real-time
updates, enable businesses to respond to events in real time when they are combined
with historical views of the data.
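As a rough sketch of that last point, the Python below handles events one at a time as they arrive and compares each user's live activity against a historical baseline. It uses a plain in-memory list in place of a real streaming platform such as Kafka, and the function name respond_to_events, the event fields, the baseline figures and the spike_factor threshold are all hypothetical.

```python
from collections import Counter


def respond_to_events(events, historical_daily_avg, spike_factor=2):
    """Yield one alert per user whose live activity exceeds
    spike_factor times their historical daily average."""
    live_counts = Counter()
    alerted = set()
    for event in events:  # each event is handled as soon as it arrives
        user = event["user"]
        live_counts[user] += 1
        baseline = historical_daily_avg.get(user, 0)
        if (
            baseline
            and user not in alerted
            and live_counts[user] > spike_factor * baseline
        ):
            alerted.add(user)
            yield {"user": user, "live": live_counts[user], "baseline": baseline}


# Historical view of the data: average events per user per day (made-up numbers).
historical_daily_avg = {"ana": 1, "raj": 5}

# Stand-in for a live stream of tweets, clicks and profile edits.
stream = [
    {"user": "ana", "type": "click"},
    {"user": "raj", "type": "tweet"},
    {"user": "ana", "type": "click"},
    {"user": "ana", "type": "profile_edit"},
    {"user": "ana", "type": "click"},
]

alerts = list(respond_to_events(stream, historical_daily_avg))
```

Unlike a batch job, the generator reacts to the event that crosses the threshold without waiting for the rest of the day's data.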
To give you an idea of how a PaaS data system operates and how data travels from place
to place, Figure 1 offers a look inside Microsoft Azure’s cloud-based big data movement
pattern.
22 | THE DOPPLER |
SPRING 2019