The Doppler Quarterly Spring 2019 - Page 57

The term “data gravity” refers to the growing “weight” of data collected in an ever-growing mass, to the additional data and services pulled closer to that mass, and to the inertia that must be overcome in order to move such a mass. To be useful, data must of course be accessed (and often updated): e.g., transactions, queries, reports, analyses and data science insights. The applications used to do this usually work best when located close to the data. Why? Because the greater the network distance between data and application, the longer each interaction takes, i.e., the greater the latency. Application performance is why we care about latency.

Principles, Conflicts and Tradeoffs

While technology changes constantly, certain principles about data remain consistent. We would propose the following principles as examples:

• Data from more sources is better than data from fewer sources
• Consolidated data is better than siloed data
• Low latency (close network proximity between data and applications) is better than high latency (greater network distance between data and applications)
• The more knowledge you have of your data assets, the better
• No single data structure can optimally serve all consumption patterns

These principles guide our data architectures. Under some circumstances, they also reveal tradeoffs that must be considered. Why? Because core data principles can conflict with each other. As an example, the first principle (data from more sources is better than data from fewer sources) and the second (consolidated data is better than siloed data) are both generally accepted as true, yet in architectural terms they represent a potential conflict. If all your data is consolidated into a single location and structure, it cannot at the same time be stored in different places and forms to serve all your disparate applications and consumption use cases.
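The latency principle above can be made concrete with a back-of-the-envelope sketch. The model and all of its numbers (per-query latency, per-query work time, query counts) are illustrative assumptions, not measurements, but they show why sequential round trips make network distance dominate application performance:

```python
# Back-of-the-envelope model of how network latency compounds across
# many sequential application-to-database round trips.
# All figures below are illustrative assumptions, not measurements.

def total_time_ms(round_trips: int, latency_ms: float, work_ms: float = 1.0) -> float:
    """Total elapsed time for a workload of sequential queries:
    each round trip pays the network latency plus the database work."""
    return round_trips * (latency_ms + work_ms)

# An application issuing 1,000 sequential queries:
co_located = total_time_ms(1000, latency_ms=0.5)   # app close to the data
remote     = total_time_ms(1000, latency_ms=50.0)  # app far from the data

print(f"co-located: {co_located / 1000:.1f} s")  # 1.5 s
print(f"remote:     {remote / 1000:.1f} s")      # 51.0 s
```

The database does the same 1 ms of work per query in both cases; the 34x difference in elapsed time comes entirely from network distance, which is why applications gravitate toward the data.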