Why is “export to Excel” the most popular button in a BI tool?
A typical list of analytics activity in a large enterprise may look like this:
• Monthly data mining computation that involves running large-scale neural
networks on a twenty-node cluster
• Filtering, joining and summarizing terabytes of data over the weekend
for Monday’s CxO dashboard
• Nightly fuzzy de-duplication and record linkage process crawling
through multiple data feeds, connecting and grouping such data
• Full-text searches against terabytes of text that require sub-second
response time
It is simply not possible to standardize on a small set of tools that gracefully
serves all these masters without running into performance issues. If we
constrain users with enterprise standards, they start generating hundreds of
feeds out of the data warehouse to run specific workloads, mostly in Excel.
We've seen a large enterprise use Business Objects mainly as a data feeder to
Excel. Dependence on IT grows, self-service business intelligence remains an
aspiration, and Excel worksheets proliferate at all levels of the organization.
To enable innovation across the organization, analytics infrastructure should
support a variety of front-end analysis patterns and a range of tools.
Polyglot Persistence Rather than Relational Models
James Serra defines polyglot persistence in one of his blog posts as follows:
“Polyglot Persistence is a fancy term to mean that when storing data, it is best
to use multiple data storage technologies, chosen based upon the way data is
being used by individual applications or components of a single application.”
Speculative Retailers Web Application

Component             Data store
User Sessions         Redis
Financial Data        RDBMS
Shopping Cart         Riak
Recommendations       Neo4J
Product Catalog       MongoDB
Reporting             RDBMS
Analytics             Cassandra
User Activity Logs    Cassandra
Figure 3: An Example E-commerce Application with Polyglot Persistence
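The component-to-store assignments in Figure 3 can be sketched as a simple lookup. This is only an illustration of the idea, not how any particular application is built: the dictionary keys and the helper names store_for and components_by_store are hypothetical, and the code names each storage technology without connecting to it.

```python
# Component-to-store assignments copied from Figure 3.
# The snake_case keys are illustrative names, not identifiers from a real system.
STORE_FOR_COMPONENT = {
    "user_sessions": "Redis",
    "financial_data": "RDBMS",
    "shopping_cart": "Riak",
    "recommendations": "Neo4J",
    "product_catalog": "MongoDB",
    "reporting": "RDBMS",
    "analytics": "Cassandra",
    "user_activity_logs": "Cassandra",
}


def store_for(component: str) -> str:
    """Return the storage technology that Figure 3 assigns to a component."""
    try:
        return STORE_FOR_COMPONENT[component]
    except KeyError:
        raise ValueError(f"no store assigned for component {component!r}") from None


def components_by_store() -> dict[str, list[str]]:
    """Invert the mapping to show which components share a store."""
    grouped: dict[str, list[str]] = {}
    for component, store in STORE_FOR_COMPONENT.items():
        grouped.setdefault(store, []).append(component)
    return grouped


if __name__ == "__main__":
    print(store_for("shopping_cart"))
    for store, components in components_by_store().items():
        print(f"{store}: {', '.join(components)}")
```

Inverting the mapping makes the point of the figure visible: eight components are spread over six storage technologies, each chosen for how that component uses its data, and only the RDBMS and Cassandra serve more than one component.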
24 | THE DOPPLER | SPRING 2017