Hybrid Cloud
Even for organizations that have wholeheartedly committed to adopting the
cloud, we recommend a hybrid approach to start. The first step is to clearly
segment the current system into a set of well-defined workloads mapped to
specific constituents. It is not advisable to move all workloads to the cloud in
one phase, even if you are considering simple “lift and shift” operations. At CTP,
our Cloud Adoption Program provides a prescriptive roadmap that details how
to take your workloads to the cloud systematically. We recommend staying
deliberately hybrid at first as you learn, educate your teams, and manage the change.
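The segmentation step above can be sketched as a simple workload inventory that tracks which constituents each workload serves and where it currently runs, so migration proceeds in small batches rather than all at once. This is an illustrative sketch only; the field names and batch logic are assumptions, not part of any CTP tooling.

```python
# A minimal sketch of a workload inventory for phased, hybrid migration.
# All names and fields here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    constituent: str   # the business group the workload serves
    location: str      # "cloud" or "on-prem" in the current phase

def next_migration_batch(inventory, batch_size=2):
    """Pick a deliberately small batch of workloads still on-premises."""
    pending = [w for w in inventory if w.location == "on-prem"]
    return pending[:batch_size]

inventory = [
    Workload("reporting", "finance", "cloud"),
    Workload("crm", "sales", "on-prem"),
    Workload("etl", "data-eng", "on-prem"),
    Workload("hr-portal", "hr", "on-prem"),
]

batch = next_migration_batch(inventory)
print([w.name for w in batch])  # -> ['crm', 'etl']
```

Keeping the batch size small is what makes the adoption "deliberately hybrid": most workloads stay on-premises while each phase is validated.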
Enterprise Data Lake
Our earlier discussion of schema on read (with multiple analytics engines
imposing a schema of their choice at the time of reading the data) naturally
leads to the concept of building an enterprise data lake. This is a place to col-
lect and store enterprise data, both structured and unstructured, without
worrying about further structuring it in some fashion.
The enterprise data lake is typically built on a Hadoop Distributed File System
(HDFS) that enables parallel and distributed computation on massive data sets,
and scales with the growth of the enterprise and its data assets.
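The schema-on-read idea can be shown in miniature: the raw bytes land in the lake untouched, and each consumer imposes its own schema only at read time. The following is a plain-Python illustration, not HDFS or a real engine; the sample records and field names are invented for the example.

```python
import csv
import io

# Raw records sit in the "lake" exactly as they arrived; no schema is
# applied on write. The sample data and field names are illustrative.
RAW = "2017-03-01,web,153.20\n2017-03-02,store,98.50\n"

def read_as_sales(raw):
    """Engine A imposes one schema: (date, channel, amount)."""
    rows = csv.reader(io.StringIO(raw))
    return [{"date": d, "channel": c, "amount": float(a)} for d, c, a in rows]

def read_as_events(raw):
    """Engine B reads the same bytes with a different schema: (date, source)."""
    rows = csv.reader(io.StringIO(raw))
    return [{"date": d, "source": c} for d, c, _ in rows]

print(read_as_sales(RAW)[0]["amount"])   # -> 153.2
print(read_as_events(RAW)[1]["source"])  # -> store
```

The point is that neither reader needed the data restructured up front, which is exactly what lets a data lake collect structured and unstructured data without worrying about downstream consumers in advance.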
[Figure 4: Microsoft Azure Data Lake (with HDInsight)]
Ephemeral Clusters
When migrating from large on-premises clusters with big MPP machines to a
cloud-based infrastructure, we should not think about long-running, always-on
clusters, unless we absolutely need them. For most advanced usages of enter-
prise data, especially data science related workloads, we are only interested in
the end results of the analysis. The cloud offers convenience, and the associated
cost savings, by letting you spin up a massive cluster automatically, compute the
result set, and shut it down as soon as the job is done. The result set can be consumed by
reporting or dashboarding tools for further analysis or executive reporting.
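The start-compute-shut-down pattern can be sketched as a cluster request for a managed Hadoop service. The parameter names below follow the shape of AWS EMR's `run_job_flow` API as one concrete example; the job script path, cluster sizing, and release label are hypothetical, and the request is only built here, not submitted.

```python
# A sketch of the ephemeral-cluster pattern: build a cluster request that
# terminates itself once its steps finish. Parameter names follow AWS EMR's
# run_job_flow request shape; all concrete values are illustrative.
def build_ephemeral_cluster_request(job_script, workers=50):
    return {
        "Name": "nightly-analysis",
        "ReleaseLabel": "emr-5.4.0",
        "Instances": {
            "InstanceCount": workers,
            "MasterInstanceType": "m4.xlarge",
            "SlaveInstanceType": "m4.xlarge",
            # The key to the pattern: the cluster shuts itself down as soon
            # as its steps finish, so you pay only for the computation.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "compute-result-set",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", job_script],
            },
        }],
    }

request = build_ephemeral_cluster_request("s3://example-bucket/analysis.py")
```

With credentials configured, a request of this shape could be submitted via `boto3.client("emr").run_job_flow(**request)`; the job writes its result set to durable storage, where reporting and dashboarding tools pick it up after the cluster is gone.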
SPRING 2017 | THE DOPPLER | 27