The Doppler Quarterly Spring 2019 - Page 26

The recommended approach for exploring the Azure big data landscape on PaaS is usually to begin by implementing batch data movement patterns, which at full maturity can support a business-driven use case. Companies can do this on big data platform pipelines incrementally, in proof of concept (PoC) or development (DEV) environments, with a team of dedicated developers. Figure 1 shows the two parallel flows: batch movement and real-time.

Prepping for a Data Project

There are several prerequisites for a company setting up a big data services environment.

First, you need people who understand the data stack. Having the technology in-house is a big step, but you will also need access to a strong developer community or a consulting service. The right components need to be mapped to the data. Having the right level of knowledge in-house will enable you to deploy applications with the automation scripts that support ongoing integration with the data pipelines.

Second, you need big data infrastructure. Infrastructure dependencies, such as networked environments, private domains, and security and authentication services, are traditionally managed by a dedicated infrastructure team. In the cloud, you will need people who understand the infrastructure setup on which these data movement pipelines run. But as the pipeline matures, you will not need a whole team to maintain it. The job can generally be handled by fewer cloud infrastructure experts.

Third, you need tools. The primary tools for automating big data applications have been ARM templates and PowerShell for cluster deployments. Azure DevOps can be the primary orchestration tool for continuous integration and delivery for all applications. It provides an agile board, a collaboration wiki, Git repositories, and build and release management with manual and scheduled automation pipelines.

Now you are ready to run the operation.
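
To make the tooling step more concrete, here is a minimal sketch of what an ARM template looks like when generated from a script. It is illustrative only: the article names no specific resources, so the storage account, the parameter name `storageAccountName`, the default region and the helper `build_arm_template` are all assumptions, standing in for whatever cluster or data service a real pipeline would deploy.

```python
import json

# Schema URL for resource-group-level ARM deployment templates.
ARM_SCHEMA = (
    "https://schema.management.azure.com/schemas/"
    "2019-04-01/deploymentTemplate.json#"
)


def build_arm_template(location="eastus"):
    """Return a minimal ARM template, as a dict, declaring one storage account.

    The storage account is a hypothetical stand-in for a real big data
    resource; the resource choice and parameter name are assumptions.
    """
    return {
        "$schema": ARM_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Supplied at deployment time so the template stays reusable.
            "storageAccountName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2019-06-01",
                "name": "[parameters('storageAccountName')]",
                "location": location,
                "sku": {"name": "Standard_LRS"},
                "kind": "StorageV2",
                "properties": {},
            }
        ],
    }


if __name__ == "__main__":
    # Print the template as JSON, ready to be saved and version-controlled.
    print(json.dumps(build_arm_template(), indent=2))
```

A file produced this way could be kept in a Git repository and deployed from a release pipeline by a PowerShell step, for example with `New-AzResourceGroupDeployment`. That is one plausible arrangement, not necessarily the setup the article has in mind.
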
Batch movement and real-time movement pipelines can be run independently or in tandem, giving an organization the ability to generate insights from multiple data paths. The steps involved, and the opportunities an Azure data environment offers, are laid out here.

Conclusion

Data has come a long way. It is no longer a hard-to-get, hard-to-process resource confined to back rooms. Advanced technologies have put data in the hands of a wider array of people, giving organizations the ability to run more projects and quickly generate more insights that are useful to their business. Companies are no longer theorizing; they are gathering data and acting on it. Cloud platforms such as Microsoft Azure are bringing data into the modern age.