
Hybrid Cloud

Even for organizations that have wholeheartedly committed to adopting the cloud, we recommend a hybrid approach to start. The first step is to clearly segment the current system into a set of well-defined workloads mapped to specific constituents. It is not advisable to move all workloads to the cloud in one phase, even for simple “lift and shift” operations. At CTP, our Cloud Adoption Program provides a prescriptive roadmap that details how to take your workloads to the cloud systematically. We recommend staying deliberately hybrid at first as you learn, educate and manage the change.

Enterprise Data Lake

Our earlier discussion of schema on read (with multiple analytics engines imposing a schema of their choice at the time the data is read) naturally leads to the concept of building an enterprise data lake: a place to collect and store enterprise data, both structured and unstructured, without worrying about imposing further structure on it up front. The enterprise data lake is typically built on the Hadoop Distributed File System (HDFS), which enables parallel and distributed computation on massive data sets and scales with the growth of the enterprise and its data assets. (A short schema-on-read sketch appears at the end of this section.)

Figure 4: Microsoft Azure Data Lake (with HDInsights)

Ephemeral Clusters

When migrating from large on-premises clusters built on big MPP machines to cloud-based infrastructure, we should not default to long-running, always-on clusters unless we absolutely need them. For most advanced uses of enterprise data, especially data science workloads, we are interested only in the end results of the analysis. The cloud offers convenience and cost savings by letting you automatically spin up a massive cluster, compute the result set and shut the cluster down when the job is done. The result set can then be consumed by reporting or dashboarding tools for further analysis or executive reporting.
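To make the schema-on-read idea from the Enterprise Data Lake discussion concrete, here is a minimal PySpark sketch. The lake path, column names and types are illustrative assumptions, not part of the article: two different consumers read the same raw files in the lake, and each imposes its own schema only at read time.

    # Minimal schema-on-read sketch (PySpark). The lake path, file layout and
    # column names below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

    # Raw events land in the lake as-is; no schema was imposed when written.
    RAW_EVENTS = "hdfs:///datalake/raw/events/"  # hypothetical path

    # Consumer A: a clickstream analyst only needs user and page fields.
    clickstream_schema = StructType([
        StructField("user_id", StringType()),
        StructField("page", StringType()),
        StructField("ts", TimestampType()),
    ])
    clicks = spark.read.schema(clickstream_schema).json(RAW_EVENTS)

    # Consumer B: a revenue analyst reads the same files with a different schema.
    revenue_schema = StructType([
        StructField("user_id", StringType()),
        StructField("order_total", DoubleType()),
        StructField("ts", TimestampType()),
    ])
    orders = spark.read.schema(revenue_schema).json(RAW_EVENTS)

    # Each consumer shapes the data at read time; the stored files never change.
    clicks.groupBy("page").count().show()
    orders.groupBy("user_id").sum("order_total").show()

The design point is that the lake stores data once, in raw form, while every analytics engine decides for itself how to interpret it.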
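The ephemeral-cluster pattern can be expressed directly against most cloud providers' APIs. As one hedged example, the sketch below uses Amazon EMR via boto3 (the figure above shows Azure HDInsights, which supports the same pattern): the cluster runs a single Spark step and terminates itself when the step finishes. Instance counts, the job script location and role names are assumptions for illustration only.

    # Ephemeral cluster sketch using boto3 / Amazon EMR: start a cluster, run
    # one job, and let the cluster terminate itself when the job completes.
    # Instance counts, the script location and IAM role names are assumptions.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="nightly-analytics-ephemeral",
        ReleaseLabel="emr-5.30.0",
        Applications=[{"Name": "Spark"}],
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 20,
            # Key to the ephemeral pattern: do not keep the cluster alive
            # once there are no more steps to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[
            {
                "Name": "compute-result-set",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "s3://example-bucket/jobs/build_result_set.py",  # hypothetical job script
                    ],
                },
            }
        ],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )

    print("Started ephemeral cluster:", response["JobFlowId"])
    # After the step completes, EMR shuts the cluster down; the result set the
    # job writes (for example, to object storage) is what reporting tools consume.

Because the cluster exists only for the duration of the job, you pay only for the compute used to produce the result set rather than for idle capacity.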