The Doppler Quarterly Summer 2016 | Page 41

of tools for accessing data through SQL interfaces, tools for storing data as JSON objects, platforms optimized for read-only workloads, and tools for batch processing of unstructured data. These tools should be considered when designing a data lake, including the necessary interfaces for data ingestion and processing. Later in the paper we discuss specific technologies from AWS and Google for data access and retrieval. A common metadata platform should also be designated to streamline data access.
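To make the SQL-over-JSON idea concrete, the sketch below uses Python's built-in sqlite3 as a stand-in for a cloud SQL-on-object-store engine; the table, field names, and records are illustrative assumptions, not examples from the paper.

```python
# Sketch: querying semi-structured lake data through a SQL interface.
# sqlite3 stands in here for a cloud SQL engine over object storage;
# the table, fields, and records are illustrative assumptions.
import json
import sqlite3

# Raw JSON lines, as they might land in an object store.
raw = """{"user": "alice", "event": "login", "ms": 120}
{"user": "bob", "event": "query", "ms": 340}
{"user": "alice", "event": "query", "ms": 95}"""
records = [json.loads(line) for line in raw.splitlines()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, event TEXT, ms INTEGER)")
conn.executemany("INSERT INTO events VALUES (:user, :event, :ms)", records)

# Aggregate with plain SQL, as a lake SQL engine would over raw JSON.
rows = conn.execute(
    "SELECT user, COUNT(*) FROM events GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # [('alice', 2), ('bob', 1)]
```

The same pattern applies at scale: JSON objects land in cheap storage first, and a SQL layer is pointed at them afterward, rather than forcing a schema at ingestion time.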
• Security Controls, Logging & Auditing – Security is a key element of a data lake; identity management, auditing and access controls should be designed to meet the organization's risk tolerance as well as its compliance requirements. Access controls should be enforced consistently across all access methods.
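One way to keep controls consistent across access methods is a single enforcement point that every path (SQL, API, batch) calls before touching data, with each decision audited. A minimal sketch, assuming illustrative roles and an audit record format of our own invention:

```python
# Sketch: one policy table consulted by every access method, so SQL,
# API, and batch paths enforce identical controls. The roles and the
# audit record format are illustrative assumptions.
from datetime import datetime, timezone

POLICY = {
    ("analyst", "read"): True,
    ("analyst", "write"): False,
    ("pipeline", "read"): True,
    ("pipeline", "write"): True,
}

audit_log = []

def check_access(role, action, dataset):
    """Single enforcement point shared by all access methods."""
    allowed = POLICY.get((role, action), False)  # deny by default
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role, "action": action,
        "dataset": dataset, "allowed": allowed,
    })
    return allowed

print(check_access("analyst", "read", "sales"))   # True
print(check_access("analyst", "write", "sales"))  # False
```

Because every decision flows through one function, the audit trail is complete by construction and a policy change takes effect everywhere at once.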
• Deployment & Automation – Tremendous operational value comes from the ability to automate deployment and recovery in the cloud. All data lake functionality should be deployable and recoverable through automation, lowering the operational burden on the IT team when making changes and responding to incidents.
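The automation the bullet describes is typically declarative: the desired state is written down once, and the same reconciliation logic handles both first deployment and recovery. A minimal sketch, with component names that are illustrative assumptions:

```python
# Sketch: idempotent deployment of data lake components from a
# declarative desired state, so one routine covers first deployment
# and incident recovery alike. Component names are illustrative.

DESIRED = {"object-store", "etl-engine", "metadata-catalog"}

def reconcile(running):
    """Return the actions needed to bring `running` to the desired state."""
    actions = []
    for comp in sorted(DESIRED - running):
        actions.append(("create", comp))   # missing components
    for comp in sorted(running - DESIRED):
        actions.append(("delete", comp))   # stray components
    return actions

# First deployment: nothing running yet.
print(reconcile(set()))
# Recovery after an incident: one component lost, one stray left over.
print(reconcile({"object-store", "stale-cluster"}))
```

Running the reconciler twice is harmless (it returns no actions once the states match), which is exactly the property that lowers the operational burden during incident response.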
• Advanced Capabilities – Advanced capabilities include APIs for data analysis, and development toolkits that let teams quickly mock up new analyses and reports.
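The kind of analysis API the bullet has in mind can be as small as one reusable summary function that a team calls against lake data to mock up a report. A sketch, with the function and field names being illustrative assumptions rather than any specific vendor toolkit:

```python
# Sketch: a minimal analysis API for quickly mocking up a report over
# lake data. Function and field names are illustrative assumptions.

def summarize(rows, group_key, value_key):
    """Group rows by `group_key`; report count and mean of `value_key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {
        k: {"count": len(v), "mean": sum(v) / len(v)}
        for k, v in groups.items()
    }

sample = [
    {"region": "east", "revenue": 100},
    {"region": "east", "revenue": 300},
    {"region": "west", "revenue": 200},
]
print(summarize(sample, "region", "revenue"))
```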
Figure 5 shows the recommended design pattern for a cloud-based data lake , including connectivity to traditional enterprise systems .
[Figure 5: Data Lake Functional Architecture – a cloud data lake comprising Data Processing (Predictive Analytics & Machine Learning, Streaming Analytics, ETL Engine, Batch Processing, In-Memory Analytics, Search & Indexing, Rules/Matching Engine), Metadata and Governance Policies, Data Storage & Retrieval (Object Store, Long Term Archive, Data Integration), and Data Consumers (Dashboards, Analytical Reporting, Online Transaction Processing, Ecommerce, Data Science, BI, Mobile Apps)]