The Doppler Quarterly Summer 2017 | Page 14

TECHNICAL GUIDE

How to Guide: Architecture Patterns to Consider When Designing an Enterprise Data Lake

Sudi Bhattacharya and Neal Matthews

This article focuses on the business value of enterprise Data Lakes, designing for storage, security & governance and how to utilize your big data as a core asset to extract valuable insights.

The Business Case

Let's start with the standard definition of a data lake:

"A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed."

...and a question: Why should you care?

Innovation

In a large enterprise, perhaps the most powerful impact of a data lake is the enablement of innovation. We have seen many multi-billion dollar organizations struggling to establish a culture of data-driven insight and innovation. They get bogged down by the structural silos that isolate departmental or divisionally-divided data stores, and which are mirrored by massive organizational politics around data ownership. While far from trivial to implement, an enterprise data lake provides the necessary foundation to clear away the enterprise-wide data access problem at its roots. The door to previously unavailable exploratory analysis and data mining opens up, enabling completely new possibilities.

Speed

In today's dynamic business environment, new data consumption requirements and use cases emerge extremely rapidly. By the time a requirements document is prepared reflecting requested changes to data stores or schemas, users have often moved on to a different or even contradictory set of schema changes. In contrast, the entire philosophy of a data lake revolves around being ready for an unknown use case. When the source data is in one central lake, with no single controlling structure or schema embedded within it, supporting a new additional use case can be much more straightforward.

Self Service

What is the average time between a request made to IT for a report and eventual delivery of a robust working report in your organization? In far too many cases, the answer is measured in weeks or even months. With a properly designed data lake and well-trained business community, one can truly enable self-service Business Intelligence. Allow the business people access to whatever slice of the data they need, letting them develop the reports that they want, using any of a wide range of tools. IT becomes the custodian of the infrastructure