The Doppler Quarterly Spring 2019 | Page 20

The ingredients vary from company to company. While smaller companies may pull most of their data out of CMDBs, larger enterprises can have a multitude of data sources. Even if data is coming from a small data set – a server list, for example – there can be a large number of applications and databases hosted on those servers, resulting in a rich and complex information set.

To gather the right ingredients, scanning tools should run for an extended period of time (around two months), so they can capture infrequent business processes. Along the way, they will capture significant amounts of time series data on application and OS processes, network connections and a myriad of other events happening in an enterprise information system.

Plan Your Processes

Before we get to tooling selection, it makes sense to look at how to prepare the data for consumption. Just as a chef does not throw raw ingredients into a pot and turn on a burner, a team needs to proceed carefully and strategically to create a quality output. Here are a few steps to follow.

• Data collection and normalization. Collect the input data and make it consumable by putting it in a format that analytics tools accept.

• Data manipulation and aggregation. This is the intermediate processing of the data. Here you massage the data, apply statistical functions, process time series data and summarize it into totals, counters, averages and other statistical aggregates. You can later run queries to make sense of the aggregated data.

• Apply analysis. This is where you create queries that correlate different data sets and apply analytical techniques to gain insights. These insights help us plan the migration.

• Refresh and repeat. Data center environments do not stay static.
Servers will be provisioned or decommissioned, applications will be deployed and undeployed, and networks will change – all while migration is being planned or even conducted. The updated data sets will need to be reingested and reprocessed to update the results.

Put Your Tools to Use

Now let’s get to the cooking! What tools shall we use? Since you are migrating into the cloud, why not take advantage of the rich set of data analytics services and tools from the leading cloud providers? Why reinvent the wheel?

For data prep and data processing, cloud services such as AWS Glue and EMR, GCP Cloud Dataprep and Cloud Dataproc or Azure Databricks do the job. You may not need a significant amount of data prep and processing. It will depend on whether you get well-normalized data in the first place, and how easily the data sets can be related to one another. One area that is particularly important to prepare is time series data, which may be difficult to query in raw form. You may have to preprocess this data and create some aggregations out of the raw input.

Quite a few tools perform analysis and visualization. Some data processing tools have data analysis capabilities, while others are more specifically designed for data analysis. Amazon Athena can be used to query data directly from storage while applying a just-in-time schema to it. If you have minimal requirements for preprocessing, you can skip that step altogether and use Athena to query your unstructured data from Amazon S3.

You can visualize your assets and their relationships by using analytics services such as Amazon QuickSight, GCP Cloud Datalab or Microsoft Power BI. These tools can generate graphs and charts, draw relationships, illuminate patterns and generate insights into your data.

Produce Quality Outputs

Now we are getting to the good stuff – the finished product. The meal a migration team prepares is an array of query results, reports and diagrams that dissect application and server data.
Here are some of the more interesting outputs that will help with migration planning:

• Asset groupings. What are the clusters of assets with a large number of interdependencies?

• Communication patterns. What are the protocols that represent these dependencies?

• Shared infrastructure. Which applications share servers?
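
To make these three outputs concrete, here is a minimal Python sketch of how they might be derived from discovery data. It is an illustration under stated assumptions, not the method of any particular tool: the connection records, the hosting map and the function names (asset_groups, protocol_counts, shared_servers) are all hypothetical, and asset groupings are approximated simply as connected components of the server dependency graph, which is a cruder measure than "a large number of interdependencies."

```python
from collections import Counter, defaultdict

# Hypothetical observed connections: (source server, destination server, protocol).
connections = [
    ("web-01", "app-01", "https"),
    ("web-01", "app-01", "https"),
    ("app-01", "db-01", "postgres"),
    ("app-02", "db-01", "postgres"),
    ("batch-01", "ftp-01", "sftp"),
]

# Hypothetical map of applications to the servers that host them.
hosting = {
    "storefront": ["web-01", "app-01"],
    "billing": ["app-02", "db-01"],
    "inventory": ["app-01", "db-01"],
    "reports": ["batch-01"],
}


def asset_groups(conns):
    """Asset groupings: servers linked directly or indirectly by connections."""
    neighbors = defaultdict(set)
    for src, dst, _ in conns:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collecting every server reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def protocol_counts(conns):
    """Communication patterns: number of observed connections per protocol."""
    return Counter(proto for _, _, proto in conns)


def shared_servers(hosting_map):
    """Shared infrastructure: servers that host more than one application."""
    apps_by_server = defaultdict(set)
    for app, servers in hosting_map.items():
        for server in servers:
            apps_by_server[server].add(app)
    return {s: sorted(a) for s, a in apps_by_server.items() if len(a) > 1}


if __name__ == "__main__":
    print(asset_groups(connections))
    print(dict(protocol_counts(connections)))
    print(shared_servers(hosting))
```

With this sample data, the web, application and database servers fall into one group and the batch and FTP servers into another, while app-01 and db-01 are flagged as shared by two applications each. In practice the same questions would more likely be expressed as queries in the analytics services mentioned above; the sketch only shows the shape of the logic.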