The recommended way to explore the Azure big data landscape on platform as a service (PaaS) is usually to start by implementing batch data movement patterns, which, once fully mature, can support a business-driven use case. Companies can build these big data platform pipelines incrementally, in proof of concept (PoC) or development (DEV) environments, with a team of dedicated developers. Figure 1 shows the two parallel flows: batch movement and real-time movement.
Prepping for a Data Project
A company setting up a big data services environment has several prerequisites to meet. First, you need people who understand the data stack. Bringing the technology in-house is a big step, but you will also need access to a strong developer community or a consulting service. The right components need to be mapped to the data. With the right level of knowledge in-house, you can deploy applications along with the automation scripts that support ongoing integration with the data pipelines.
Second, you need big data infrastructure. Infrastructure dependencies such as networked environments, private domains, and security and authentication services have traditionally been managed by a dedicated infrastructure team. In the cloud, you still need people who understand the infrastructure on which these data movement pipelines run. As the pipeline matures, however, you will not need a whole team to maintain it; a smaller group of cloud infrastructure experts can generally handle the job.
Third, you need tools. The primary tools for automating big data application deployments have been Azure Resource Manager (ARM) templates and PowerShell for cluster deployments. Azure DevOps can serve as the primary orchestration tool for continuous integration and continuous delivery (CI/CD) across all applications. It provides agile boards, a collaboration wiki, Git repositories, and build and release management with manually triggered and scheduled automation pipelines.
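As a rough illustration of template-based automation, the sketch below assembles the top-level skeleton of an ARM deployment template in Python and prints it as JSON. Python is used here only for illustration; in the workflow described above, a template like this would be deployed with PowerShell, possibly triggered from an Azure DevOps release pipeline. The `build_template` helper and the `environmentName` parameter are hypothetical, and the `resources` array is deliberately left empty because the properties of an actual cluster resource depend on the service being deployed.

```python
import json

# Schema URL for resource-group-level ARM deployment templates.
ARM_SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#"


def build_template(resources=None, parameters=None) -> dict:
    """Return the top-level sections of an ARM deployment template.

    Hypothetical helper: real cluster resources would be appended to
    `resources`; they are omitted here because their properties vary by service.
    """
    return {
        "$schema": ARM_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": parameters or {},
        "variables": {},
        "resources": resources or [],
        "outputs": {},
    }


if __name__ == "__main__":
    params = {
        # Hypothetical parameter: lets one template target PoC or DEV environments.
        "environmentName": {
            "type": "string",
            "allowedValues": ["poc", "dev"],
            "defaultValue": "dev",
        }
    }
    template = build_template(parameters=params)
    # Emit the JSON that would be committed to a Git repository and deployed.
    print(json.dumps(template, indent=2))
```

Keeping the template as a single versioned file is what makes the same deployment repeatable across PoC and DEV environments; only the parameter values change.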
Now you are ready to run the operation. Batch movement and real-time movement pipelines can run independently or in tandem, giving an organization the ability to generate insights from multiple data paths. The steps involved, and the opportunities an Azure data environment offers, are discussed here.
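To make the idea of running both paths in tandem concrete, here is a minimal, framework-free Python sketch. It is not Azure code: `Event`, `InsightStore`, `run_batch`, and `on_event` are hypothetical names, and the alert threshold of 100 is an arbitrary assumption. The real-time path inspects each record as it arrives, while the batch path processes an accumulated set of records in one scheduled run; both write to the same store, so insights from either path end up side by side.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Event:
    source: str
    value: float


@dataclass
class InsightStore:
    """Hypothetical shared sink that both data paths write to."""
    batch_totals: List[float] = field(default_factory=list)
    live_alerts: List[Event] = field(default_factory=list)


def run_batch(events: Iterable[Event], store: InsightStore) -> float:
    """Batch path: aggregate an accumulated set of records in a single run."""
    total = sum(e.value for e in events)
    store.batch_totals.append(total)
    return total


def on_event(event: Event, store: InsightStore, threshold: float = 100.0) -> bool:
    """Real-time path: inspect one record as it arrives and flag outliers."""
    if event.value > threshold:
        store.live_alerts.append(event)
        return True
    return False


if __name__ == "__main__":
    store = InsightStore()
    day = [Event("sensor-a", 40.0), Event("sensor-b", 150.0), Event("sensor-a", 10.0)]
    for e in day:            # real-time: each record handled on arrival
        on_event(e, store)
    run_batch(day, store)    # batch: the same records processed together later
    print(store.batch_totals, len(store.live_alerts))  # prints: [200.0] 1
```

In an actual deployment each function would be replaced by a managed service, but the split is the same: low latency for individual records on one path, complete aggregates over a period on the other.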
Conclusion
Data has come a long way. It is no longer a hard-to-get, hard-to-process resource confined to back rooms. Advanced technologies have put data in the hands of a wider array of people, giving organizations the ability to run more projects and rapidly generate insights that are useful to their business. Companies are no longer just theorizing; they are gathering data and acting on it. Cloud platforms such as Microsoft Azure are bringing data into the modern age.
24 | THE DOPPLER |
SPRING 2019