The Doppler Quarterly Summer 2017 - Page 21

[Figure 4: Data Lake Layers and Consumption Patterns. Layers: Raw Data; Processed, Standardized, Use Case Specific Data. Consumption patterns: Machine Learning, Ad-hoc Analysis, Reports, Dashboard, Enterprise Search, Interactive Fast Queries.]

lake into a column store platform. Examples of tools to accomplish this would be Google BigQuery, Amazon Redshift or Azure SQL Data Warehouse.

Interactive Query and Reporting

Many use cases still require support for regular SQL query tools to analyze these massive data stores. Apache Hive, Presto, Amazon Athena, and Impala were all developed to support these use cases by creating or using a SQL-friendly schema on top of the raw data.

Data Exploration and Machine Learning

Finally, data scientists are among the biggest beneficiaries of the data lake. They now have access to enterprise-wide data, unfettered by various schemas, and can explore and mine that data for high-value business insights. Many data scientists' tools are either based on, or can work alongside, Hadoop-based platforms that access the data lake.

Conclusion

When designed and built well, a data lake removes data silos and opens up flexible enterprise-level exploration and mining of results. The data lake is one of the most essential elements needed to harvest enterprise big data as a core asset, to extract model-based insights from data, and to nurture a culture of data-driven decision making.

EDITOR’S NOTE

This is the second article in a multi-part series discussing the strategic considerations and crucial technical details that senior managers and CxOs need to consider in an enterprise-wide analytics infrastructure modernization strategy. We share observations and insights we’ve developed in our role as a partner in these journeys with multiple clients.
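
For readers who want to see what "a SQL-friendly schema on top of the raw data" (Interactive Query and Reporting) can look like in practice, here is a toy sketch in Python. It is not how Hive, Presto, Athena, or Impala are implemented. It uses only the standard library, with SQLite standing in for the query engine, and every name in it (RAW_LINES, SCHEMA, load_with_schema, buys_per_user, the events table and its columns) is hypothetical. The point it tries to show is that the raw records stay schema-free, and that column names and types are chosen only when the data is read.

```python
import json
import sqlite3

# Hypothetical raw events as they might sit in the lake: one JSON document
# per line, with no enforced schema (fields may be missing or extra).
RAW_LINES = [
    '{"user_id": "ana", "action": "view", "ms": 120}',
    '{"user_id": "bo", "action": "buy", "sku": "A-1"}',
    '{"user_id": "ana", "action": "buy", "ms": 95}',
]

# The "SQL-friendly schema": column names and types decided at read time.
SCHEMA = [("user_id", "TEXT"), ("action", "TEXT"), ("ms", "INTEGER")]


def load_with_schema(lines, schema):
    """Project raw JSON lines onto the given columns in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in schema)
    conn.execute(f"CREATE TABLE events ({columns})")
    placeholders = ", ".join("?" for _ in schema)
    for line in lines:
        record = json.loads(line)
        # Fields absent from a record become NULL; extra fields are ignored.
        row = [record.get(name) for name, _ in schema]
        conn.execute(f"INSERT INTO events VALUES ({placeholders})", row)
    return conn


def buys_per_user(conn):
    """Run an ordinary SQL aggregate over the projected table."""
    query = (
        "SELECT user_id, COUNT(*) FROM events "
        "WHERE action = 'buy' GROUP BY user_id ORDER BY user_id"
    )
    return conn.execute(query).fetchall()


conn = load_with_schema(RAW_LINES, SCHEMA)
print(buys_per_user(conn))
```

A different team could apply a different SCHEMA to the same RAW_LINES without rewriting the stored data, which is the flexibility the article attributes to querying the lake directly.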