In a few short years, containers have established themselves as indispensable tools for
managing portable, stateless applications like Web servers and microservices. But they
have taken time to catch on in the world of data science, where they have been viewed
as too lightweight to package and manage complex, stateful services dealing with big data.
This perception is changing. Users and vendors are starting to embrace containers and
Kubernetes, the most popular orchestration platform, as tools to facilitate deployments
of big data systems and applications. It is still early in the evolution, but experts see
Kubernetes as the foundation for a new generation of machine learning (ML), artificial
intelligence (AI), data management and distributed storage uses in cloud native
environments. Some are even saying containers and Kubernetes are paving the way for a whole new field with a
flashy new name – DataOps.
What happened? How did Kubernetes’ reputation build up so quickly in the data field?
And what has to happen for containers to really become the backbone of data science
going forward?
Kubernetes Gains Ground
Part of the story is the general market acceptance of Kubernetes. This orchestrator is
the fastest growing open source project ever in terms of sheer velocity, the pace of its
release cycle and the number of vendors and users adopting it. It is supported by all
three major cloud platforms – AWS, Microsoft Azure, Google Cloud Platform – and
WINTER 2019 | THE DOPPLER | 47