The ingredients vary from company to company. While smaller companies may pull most of their data out of CMDBs, larger enterprises can have a multitude of data sources. Even if data is coming from a small data set – a server list, for example – there can be a large number of applications and databases hosted on those servers, resulting in a rich and complex information set.
To gather the right ingredients, scanning tools should run
for an extended period of time (around two months), so
they can capture infrequent business processes. Along the
way, they will capture significant amounts of time series
data on application and OS processes, network connections
and a myriad of other events happening in an enterprise
information system.
Plan Your Processes
Before we get to tool selection, it makes sense to look at how to prepare the data for consumption. Just as a chef does not simply throw raw ingredients into a pot and turn on a burner, a migration team needs to proceed carefully and strategically to produce a quality output. Preparation breaks down into four steps:
• Data collection and normalization. Collect the input data and convert it into a format that analytics tools can consume.
• Data manipulation and aggregation. This is where you massage the data, add statistical functions, process time series data and then summarize it into totals, counters, averages and other statistical aggregates. You can later run queries against the aggregated data to make sense of it. This is the intermediate processing of the data.
• Apply analysis. This is where you create queries that correlate different data sets and apply analytical techniques to gain the insights that drive migration planning.
• Refresh and repeat. Data center environments do not stay static. Servers will be provisioned or decommissioned, applications will be deployed and undeployed, and networks will change – all while the migration is being planned or even carried out. Updated data sets need to be reingested and reprocessed to keep the results current.
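To make the first and last steps concrete, here is a minimal sketch, assuming two hypothetical inputs: a CMDB export with "Name" and "OperatingSystem" columns and a plain-text server list with one "hostname,os" entry per line. The `Server` record, its fields and the helper functions are illustrative names, not part of any specific tool. Both sources are normalized into one schema, and two snapshots are compared to show what a refresh would pick up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical common schema for a server record."""
    hostname: str
    os: str
    source: str


def from_cmdb(row):
    # Assumed CMDB export columns: "Name" and "OperatingSystem".
    return Server(row["Name"].strip().lower(), row["OperatingSystem"].strip(), "cmdb")


def from_server_list(line):
    # Assumed plain-text server list with one "hostname,os" entry per line.
    hostname, os_name = (part.strip() for part in line.split(",", 1))
    return Server(hostname.lower(), os_name, "server_list")


def normalize(cmdb_rows, server_lines):
    """Merge both sources, keeping the first record seen per hostname."""
    records = [from_cmdb(row) for row in cmdb_rows]
    records += [from_server_list(line) for line in server_lines if line.strip()]
    merged = {}
    for server in records:
        merged.setdefault(server.hostname, server)
    return sorted(merged.values(), key=lambda s: s.hostname)


def diff_snapshots(old, new):
    """Return (added, removed) hostnames between two normalized snapshots."""
    old_hosts = {s.hostname for s in old}
    new_hosts = {s.hostname for s in new}
    return sorted(new_hosts - old_hosts), sorted(old_hosts - new_hosts)


# First ingestion, then a refresh after db01 is decommissioned and web01 is added.
cmdb = [{"Name": "APP01 ", "OperatingSystem": "RHEL 7"}]
first = normalize(cmdb, ["db01, Windows Server 2016", "app01, RHEL 7", ""])
second = normalize(cmdb, ["web01, Ubuntu 18.04", "app01, RHEL 7"])
added, removed = diff_snapshots(first, second)
```

Here the CMDB record wins when both sources describe the same host; a real pipeline might instead merge fields or flag conflicts for review.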
Put Your Tools to Use
Now let’s get to the cooking! What tools shall we use?
Since you are migrating into the cloud, why not take advantage of the rich set of data analytics services and tools from the leading cloud providers? Why reinvent the wheel?
For data prep and data processing, cloud services such as AWS Glue and Amazon EMR, GCP Cloud Dataprep and Cloud Dataproc, or Azure Databricks do the job. You may not need much data prep and processing; that depends on whether the data arrives well normalized and how easily the data sets can be related to one another. One area that is particularly important to prepare is time series data, which can be difficult to query in raw form. You may have to preprocess this data and create aggregations from the raw input.
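As an illustration of that preprocessing, the sketch below collapses raw network connection samples into one summary row per source, destination and port. The sample tuples and the chosen aggregates (count, first seen, last seen, active days) are assumptions for the example; a managed service such as the ones above would run the equivalent logic at scale.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw samples: (ISO timestamp, source host, destination host, port).
raw = [
    ("2019-01-07T02:00:00", "app01", "db01", 1433),
    ("2019-01-07T02:05:00", "app01", "db01", 1433),
    ("2019-02-01T23:00:00", "batch01", "ftp01", 21),
    ("2019-01-08T09:15:00", "app01", "db01", 1433),
]


def aggregate(samples):
    """Collapse raw connection samples into one summary row per (src, dst, port)."""
    stats = defaultdict(lambda: {"count": 0, "first": None, "last": None, "days": set()})
    for ts, src, dst, port in samples:
        t = datetime.fromisoformat(ts)
        row = stats[(src, dst, port)]
        row["count"] += 1
        row["first"] = t if row["first"] is None else min(row["first"], t)
        row["last"] = t if row["last"] is None else max(row["last"], t)
        row["days"].add(t.date())
    # Replace the set of dates with a simple count of active days.
    return {
        key: {
            "count": r["count"],
            "first": r["first"],
            "last": r["last"],
            "active_days": len(r["days"]),
        }
        for key, r in stats.items()
    }


summary = aggregate(raw)
```

The single batch01 to ftp01 sample is the kind of infrequent flow that a long collection window is meant to catch; after aggregation it is one row that is easy to query rather than a needle in millions of samples.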
Quite a few tools perform analysis and visualization. Some
data processing tools have data analysis capabilities, while
others are more specifically designed for data analysis.
Amazon Athena can query data directly in storage, applying a schema to it at query time. If you have minimal preprocessing requirements, you can skip that step altogether and use Athena to query your raw data files in S3.
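The "schema at query time" idea (often called schema-on-read) can be shown without any cloud service. The sketch below is plain Python and does not use Athena's API: the raw CSV text stays exactly as collected, and column types are applied only when a query reads it. The column names and sample values are hypothetical.

```python
import csv
import io

# Raw export kept exactly as collected (a stand-in for an object in S3).
RAW = """host,process,cpu_pct
app01,java,37.5
db01,sqlservr,62.0
app01,httpd,4.25
"""

# The schema lives with the query, not with the stored data.
SCHEMA = {"host": str, "process": str, "cpu_pct": float}


def read_with_schema(raw_text, schema):
    """Parse raw CSV text and cast each column only when it is read."""
    for record in csv.DictReader(io.StringIO(raw_text)):
        yield {col: cast(record[col]) for col, cast in schema.items()}


def total_cpu_by_host(raw_text, schema):
    """A simple 'query': sum CPU percentage per host."""
    totals = {}
    for row in read_with_schema(raw_text, schema):
        totals[row["host"]] = totals.get(row["host"], 0.0) + row["cpu_pct"]
    return totals


totals = total_cpu_by_host(RAW, SCHEMA)
```

Changing `SCHEMA` changes how the same stored bytes are interpreted, with no rewrite of the underlying data.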
You can visualize your assets and their relationships by
using analytics services such as Amazon QuickSight, GCP
Cloud Datalab or Azure Power BI. These tools can generate
graphs and charts, draw relationships, illuminate patterns
and generate insights into your data.
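Visualization tools generally work from tabular input, so one way to hand them relationship data is a flat edge list with one row per dependency. This is a minimal sketch, assuming CSV as the exchange format and hypothetical column names (`source`, `destination`, `protocol`, `connections`); check each tool's import options before settling on a layout.

```python
import csv
import io

# Hypothetical dependency edges: (source, destination, protocol, connection count).
EDGES = [
    ("web01", "app01", "tcp/8080", 1200),
    ("app01", "db01", "tcp/1433", 5400),
]


def edges_to_csv(edges):
    """Write an edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "destination", "protocol", "connections"])
    writer.writerows(edges)
    return buffer.getvalue()


csv_text = edges_to_csv(EDGES)
```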
Produce Quality Outputs
Now we are getting to the good stuff – the finished product. The meal a migration team prepares is an array of query results, reports and diagrams that dissect application and server data.
Here are some of the more interesting outputs that will help
with migration planning:
• Asset groupings. What are the clusters of assets
with a large number of interdependencies?
• Communication patterns. What are the protocols
that represent these dependencies?
• Shared infrastructure. Which applications share
servers?
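Each of these questions maps onto a simple computation over the dependency data. The sketch below assumes two hypothetical inputs: observed dependencies as (source, destination, protocol) tuples and a mapping of applications to the servers that host them. It approximates asset groupings with connected components of the dependency graph, a deliberately simple stand-in; weighting edges by connection volume would give tighter clusters.

```python
from collections import defaultdict

# Hypothetical inputs: observed dependencies and an app-to-server hosting map.
DEPENDENCIES = [
    ("web01", "app01", "tcp/8080"),
    ("app01", "db01", "tcp/1433"),
    ("app01", "db01", "tcp/445"),
    ("batch01", "ftp01", "tcp/21"),
]
HOSTING = {
    "billing": ["app01", "db01"],
    "crm": ["app01"],
    "reporting": ["batch01"],
}


def asset_groups(deps):
    """Asset groupings: connected components of the undirected dependency graph."""
    graph = defaultdict(set)
    for src, dst, _ in deps:
        graph[src].add(dst)
        graph[dst].add(src)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


def communication_patterns(deps):
    """Communication patterns: protocols observed for each source/destination pair."""
    patterns = defaultdict(set)
    for src, dst, proto in deps:
        patterns[(src, dst)].add(proto)
    return {pair: sorted(protos) for pair, protos in patterns.items()}


def shared_infrastructure(hosting):
    """Shared infrastructure: servers that host more than one application."""
    apps_by_server = defaultdict(set)
    for app, servers in hosting.items():
        for server in servers:
            apps_by_server[server].add(app)
    return {srv: sorted(apps) for srv, apps in apps_by_server.items() if len(apps) > 1}


groups = asset_groups(DEPENDENCIES)
patterns = communication_patterns(DEPENDENCIES)
shared = shared_infrastructure(HOSTING)
```

In this toy data, web01, app01 and db01 form one group that would likely need to move together, and app01 is shared by the billing and crm applications, so neither can be migrated without considering the other.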
18 | THE DOPPLER |
SPRING 2019