The Doppler Quarterly Fall 2017 | Page 21

Why Data Catalogs Should Be the Linchpin in Your Cloud Data Strategy

Joey Jablonski and Neal Matthews

Effective data-driven organizations are using data catalogs to provide total visibility into available data in an easily consumable and centrally managed location.

Data lakes have become a foundation for many organizations’ data environments. While these data lakes provide new capabilities, many enterprises are struggling to derive full value from them because of the operational overhead of managing multiple new interfaces, tools, data sets and integration points. These data lakes often become “data swamps”: large amounts of data are ingested with no clear method to find data sets, separate them as needed and identify the core elements of value to the business.

The Value of Data Catalogs

Why should you care about deploying a data catalog capability? Because while many organizations now grasp the importance of centralizing their enterprise data, they often have not yet grappled with how difficult it is to access that data efficiently and securely. This difficulty arises because the data is ingested from many different places, with varying amounts of structure.

Data catalogs are a critical element of every data lake deployment, ensuring that data sets are tracked, identifiable by business terms, governed and managed. Forbes contributor Dan Woods cautions organizations against using tribal knowledge as a strategy, because it cannot scale [1]. Data catalogs crystallize corporate data governance policies into practice, becoming the engine for enforcement and the tool for auditing compliance. The inclusive nature of the data catalog enables it to be used for collaboration and centralized sharing of information in a known location, accessible across the organization.
Data catalogs become the entry point for data scientists and other analytical users across the organization, by way of the data engineers (Figure 1) who focus on creating enriched data sets for analytical use. Data catalogs ensure these dispersed teams can collaborate on data set quality, usage, and business descriptions.
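
To make the idea of data sets that are tracked and identifiable by business terms concrete, here is a minimal sketch of an in-memory catalog in Python. It is an illustration only, not a description of any particular catalog product: the `CatalogEntry` and `DataCatalog` names, their fields, and the example locations are all hypothetical assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set registered in the catalog (all fields are illustrative)."""
    name: str            # technical name of the data set in the lake
    location: str        # where the data physically lives
    owner: str           # team accountable for the data set
    business_terms: set[str] = field(default_factory=set)
    description: str = ""


class DataCatalog:
    """Hypothetical central registry: one known place to look up data sets."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Tracking: every data set gets exactly one entry.
        if entry.name in self._entries:
            raise ValueError(f"data set already registered: {entry.name}")
        self._entries[entry.name] = entry

    def find_by_term(self, term: str) -> list[str]:
        # Discovery by business vocabulary rather than by file path.
        wanted = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if wanted in {t.lower() for t in e.business_terms}
        )

    def undocumented(self) -> list[str]:
        # Simple audit: entries missing a description or business terms.
        return sorted(
            e.name
            for e in self._entries.values()
            if not e.description or not e.business_terms
        )


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="sales_orders_raw",
    location="s3://example-lake/raw/orders/",   # made-up location
    owner="data-engineering",
    business_terms={"Revenue", "Orders"},
    description="Raw order events from the storefront.",
))
catalog.register(CatalogEntry(
    name="clickstream_2017",
    location="s3://example-lake/raw/clicks/",   # made-up location
    owner="data-engineering",
))

print(catalog.find_by_term("revenue"))
print(catalog.undocumented())
```

In this sketch an analyst searches by the business term "revenue" instead of needing to know where the order data lives, and the `undocumented` check surfaces the data set that was ingested without a description, which is the kind of gap that lets a lake drift toward a "data swamp."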