The Doppler Quarterly Summer 2016 | Page 44

Figure 6: Google Hosted Data Lake (diagram of a Google-based data lake with panels for Data Consumers, Data Lake Data Processing, and Data Lake Data Storage & Retrieval)
Key Google data lake technologies and capabilities include:
Operational Aspects
Pub/Sub – Pub/Sub, a managed publish/subscribe messaging service, provides a seamless developer experience for sharing data between systems and tools.
Scalability & Performance
BigQuery – BigQuery provides a highly scalable platform for analyzing data sets that are typically read-heavy. Because BigQuery is a PaaS offering, it keeps operational overhead low for the IT organization.
Data Access & Retrieval
Google Cloud Storage – Google Cloud Storage provides an object storage interface for historical and archive data.
Hadoop on Google Compute Engine – Google supports multiple vendor solutions for running Hadoop on Google Compute Engine. In a data lake, this can serve as a scalable batch processing environment that feeds processed, prepared data to other systems, including BigQuery.
Advanced Capabilities
Google Machine Learning – Google Machine Learning capabilities let developers use pre-trained models, or train their own, for rapid analysis of data.
Prediction API – Google Prediction API identifies patterns in data quickly, without requiring additional servers or services to be stood up.
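The Pub/Sub pattern described above can be illustrated with a minimal in-memory sketch. It is not the Google Cloud Pub/Sub client library: the `InMemoryPubSub` class, its methods, and the `orders` topic are hypothetical names chosen for illustration. It shows only the core idea that a publisher writes a message to a named topic once and every subscriber to that topic receives it, so producers and consumers in the data lake do not need to know about each other.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-memory model of the publish/subscribe pattern.
# This is NOT the Google Cloud Pub/Sub API; it only shows how a topic
# decouples a producer from any number of subscribers.
class InMemoryPubSub:
    def __init__(self) -> None:
        # topic name -> list of subscriber callbacks
        self._subscribers: Dict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        """Register a callback that receives every message published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, data: bytes) -> int:
        """Deliver `data` to every subscriber of `topic`; return how many received it."""
        callbacks = self._subscribers.get(topic, [])  # .get avoids creating empty topics
        for callback in callbacks:
            callback(data)
        return len(callbacks)


# Usage: two independent consumers (e.g. lake ingestion and streaming analytics)
# subscribe to the same hypothetical "orders" topic.
bus = InMemoryPubSub()
lake_ingest: List[bytes] = []
stream_alerts: List[bytes] = []
bus.subscribe("orders", lake_ingest.append)
bus.subscribe("orders", stream_alerts.append)
delivered = bus.publish("orders", b'{"order_id": 1}')
print(delivered)  # 2
```

In the managed service, delivery is asynchronous and at-least-once, and subscribers acknowledge each message; this sketch delivers synchronously and omits acknowledgment to keep the pattern visible.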
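The batch role described for Hadoop on Google Compute Engine can be sketched as a single-process MapReduce job. This is an illustration under stated assumptions, not Hadoop code: the click-record format, the function names, and the output columns are hypothetical. The job maps raw records to key/count pairs, groups and sums them, and serializes the prepared rows as newline-delimited JSON, one of the formats BigQuery can load.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical raw input: one "user_id,page" click record per line.
RAW_CLICKS = [
    "u1,/home",
    "u2,/cart",
    "u1,/cart",
    "u1,/home",
]

Key = Tuple[str, str]

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((user, page), 1) for each click record."""
    for line in lines:
        user, page = line.strip().split(",", 1)
        yield (user, page), 1

def shuffle_and_reduce(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Shuffle + reduce steps: group pairs by key and sum the counts."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def to_ndjson(totals: Dict[Key, int]) -> str:
    """Serialize prepared rows as newline-delimited JSON (one object per line)."""
    rows = [
        {"user_id": user, "page": page, "clicks": clicks}
        for (user, page), clicks in sorted(totals.items())
    ]
    return "\n".join(json.dumps(row) for row in rows)

prepared = to_ndjson(shuffle_and_reduce(map_phase(RAW_CLICKS)))
print(prepared)
```

On a real cluster, Hadoop distributes the map and reduce phases across many machines and performs the shuffle over the network; the per-record logic stays the same, which is what makes the batch layer scale.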
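The train-then-predict idea behind the Prediction API and Google Machine Learning entries can be approximated locally to show what "identifying patterns in data" means in practice. The sketch below uses a nearest-centroid classifier, a deliberately simple technique swapped in for illustration; it is not how Google's services work internally, and the session features and labels are hypothetical.

```python
from math import dist
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

# A labeled example: (feature vector, label). Features here are hypothetical
# [session_minutes, pages_viewed] values for an ecommerce site.
Example = Tuple[Sequence[float], str]

def train_centroids(examples: List[Example]) -> Dict[str, List[float]]:
    """Training: average the feature vectors of each label into one centroid."""
    grouped: Dict[str, List[Sequence[float]]] = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Prediction: return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

training = [
    ([2.0, 3.0], "browser"),
    ([3.0, 2.0], "browser"),
    ([12.0, 15.0], "buyer"),
    ([10.0, 13.0], "buyer"),
]
centroids = train_centroids(training)
print(predict(centroids, [11.0, 12.0]))  # buyer
```

A hosted service hides the training infrastructure behind an API call; the local version makes explicit that a model is fitted once from labeled data and then reused for fast predictions.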