The Doppler Quarterly Spring 2019 - Page 25

[Figure 1 diagram. Labeled components include Sqoop, Azure IoT Hub, Azure Functions, Azure Databricks, HDInsight, Cosmos DB, SQL DB, Azure Analysis Services and Power BI.]

Figure 1: Azure-Based Data Movement Pipelines (Batch and Real-Time Process)

Flowing Through Different Pipelines

Data follows one of two distinct pipelines, depending on how it is captured. Existing business processes usually rely on batch movements and the associated extract, transform and load (ETL) processes to ensure that the data is cleaned and de-duplicated before it feeds on-premises capabilities and products. Batch movements can bring in bundles of structured, semi-structured and external data. Big data systems handle the variety, velocity and volume of the data that needs to be collected, processed, transformed and managed to derive relevant, meaningful insights.

Then there are the new streams of data that need to move through the system in bursts. Event-based, stream-based and IoT-based data capture and processing are exploding in the data ecosystem, along with the associated architectures and cloud services. Insights can be derived from live streams, interactive sessions and website clickstream logs, and processed in real time.

Cloud-native warehouses are a breed of products that take advantage of decoupled storage and compute in the cloud, which delivers scalability, elasticity and cost effectiveness. The decoupled storage (e.g., Amazon S3 in AWS or Blob storage in Azure) can persist and grow independently, while the compute can autoscale, or be paused and resumed.
Cloud-native warehouses replicate across regions, providing reliability and availability.
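
To make the difference between the batch and real-time paths described above more concrete, here is a minimal Python sketch. It is an illustration only, not the pipeline from Figure 1: the record shape (dictionaries with `id` and `source` keys) and the function names `batch_etl` and `stream_counts` are assumptions introduced for this example, and a production system would use services such as those named in the figure rather than in-memory lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Hypothetical record shape for this sketch: {"id": ..., "source": ...}
Record = Dict[str, object]


def batch_etl(records: Iterable[Record]) -> List[Record]:
    """Batch path: clean and de-duplicate a whole bundle before loading."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec.get("id") is None:   # cleaning step: drop rows with no key
            continue
        if rec["id"] in seen:       # de-dup step: keep the first occurrence
            continue
        seen.add(rec["id"])
        cleaned.append(rec)
    return cleaned


def stream_counts(events: Iterable[Record],
                  burst_size: int) -> Iterator[Dict[str, int]]:
    """Real-time path: emit per-source event counts for each burst."""
    counts = defaultdict(int)
    n = 0
    for ev in events:
        counts[ev["source"]] += 1
        n += 1
        if n == burst_size:         # burst complete: emit and reset
            yield dict(counts)
            counts.clear()
            n = 0
    if n:                           # flush a final, partial burst
        yield dict(counts)
```

The batch function sees the whole bundle at once, so it can de-duplicate across all rows before loading. The streaming function only ever holds the current burst, which is why it yields results incrementally instead of returning a single cleaned set.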