Those of us who are old enough can remember a day when software was delivered primarily by physical media. The spread of broadband internet and smartphones has led us to the age of the web service: software hosted in the cloud and accessed by user clients such as browsers and apps.
Not too long ago, web applications were run directly on physical machines in private data centers. For ease of management, these applications were usually monolithic—a single large server would contain all of the back-end code and database. Now, web hosting services like Amazon and the spread of hypervisor technology have changed all of that. Thanks to Amazon Web Services (AWS) and tools like VirtualBox, it has become easy to package an entire OS in a single file.
Services like EC2 then made it simple to build machine images and string together sets of virtual servers. Along came the microservices paradigm, an approach to software architecture wherein large monolithic apps are broken up into smaller, focused services that each do one thing well. In general, this approach allows for easier scaling and feature development, as bottlenecks are quicker to find and system changes are easier to isolate.
Pets to Livestock
I became an infrastructure engineer right at the height of this trend. I recall building my first production environment in Amazon using a series of bash scripts. The servers were like pets to me. I gave each of them cute names. I monitored them carefully. I responded to alerts quickly and kept them healthy. I treated those instances with love and affection because it was painful to try to replace them—much like a beloved pet.
Along came Chef, a configuration management tool, and almost immediately my life got easier. With tools like Chef and Puppet, you can take away most of the manual pain associated with managing a cloud system. You can use Chef’s “environments” construct to separate development, staging, and production servers. You can use its “data bags” and “roles” to define configuration parameters and push sets of changes. Now, all of my “pet” servers had graduated from obedience school.
Then in 2013, along came Docker, and a new era began: the age of software as livestock (apologies to any vegans in the audience). The container paradigm is one of orchestration, not configuration management. Tools like Kubernetes, Docker Compose, and Marathon focus on moving around predefined images instead of adjusting config values on running instances. Infrastructure is immutable; when a container goes bad, we don’t try to fix it—we shoot it in the head and replace it. We care more about the health of the herd than individual animals. We don’t give our servers cute names anymore.
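To make the "replace, don't repair" idea concrete, here is a minimal sketch of the kind of reconciliation loop an orchestrator runs. It is a pure-Python simulation, not the API of Kubernetes, Docker Compose, or Marathon. The `Container` class, the `reconcile` function, and the `web:1.0` image name are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ID generator so each simulated container is distinguishable.
_ids = count(1)


@dataclass
class Container:
    """A simulated container: just an image name, a health flag, and an ID."""
    image: str
    healthy: bool = True
    id: int = field(default_factory=lambda: next(_ids))


def reconcile(herd, image, desired):
    """Return a new herd holding `desired` healthy containers of `image`.

    Unhealthy containers are never repaired in place. They are dropped,
    and fresh containers are started from the same predefined image.
    """
    survivors = [c for c in herd if c.healthy and c.image == image][:desired]
    replacements = [Container(image) for _ in range(desired - len(survivors))]
    return survivors + replacements


# Usage: one member of the herd goes bad and is replaced, not fixed.
herd = [Container("web:1.0") for _ in range(3)]
herd[1].healthy = False
bad_id = herd[1].id
herd = reconcile(herd, "web:1.0", desired=3)
```

After the call, the herd is back to three healthy containers and the failed one is simply gone. Nothing ever logged in to adjust its configuration, which is what distinguishes orchestration from configuration management.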
Containers make a lot of things easier. They let businesses focus more on their own special sauce. Tech teams can worry less about infrastructure and configuration management and instead worry mostly about app code. Companies can go a step further and use managed services for things like MySQL, Cassandra, Kafka, or Redis so as not to have to deal with the data layer at all. There are several startups offering “plug and play” machine learning services as well, allowing companies to do sophisticated analytics without worrying about the infrastructure. These trends have culminated in the serverless model, a software architecture approach that allows teams to release software without managing a single VM or container. AWS services like S3, Lambda, Kinesis, and DynamoDB make this possible. So to extend the analogy, we have gone from pets to livestock to some sort of on-demand animal service.
All of this is very cool. It is crazy that we live in a time where a twelve-year-old kid can spin up a sophisticated software system with a few clicks. We should remember that, not very long ago, this was impossible. Just a few US presidents ago, physical media was the standard and only big companies had the means to manufacture and distribute software. Bug fixes were a luxury. Now, that twelve-year-old kid can create an AWS account and make his software available to the entire world. If there’s a bug, someone will bug him on Slack and, in a few minutes, a fix is out for all users.
Very, very cool, but not without its price—reliance on cloud providers like Amazon means reliance on big corporations and proprietary technologies. If Richard Stallman and Edward Snowden haven’t made you worry about such things, the recent debacle with Facebook certainly should have.
Greater abstraction away from hardware also brings with it the risk of less transparency and control. When something breaks in a system running hundreds of containers, we have to hope that the failure bubbles up somewhere we can detect. If the problem is with the host operating system or underlying hardware, it might be hard to determine. An outage that could have been resolved in 20 minutes using VMs may take hours or days to resolve with containers if you do not have the right instrumentation.
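As one illustration of the instrumentation this calls for, the sketch below groups container health reports by host, so that a cluster of failures on a single machine points at the host rather than at the app. It assumes health reports arrive as simple `(host, healthy)` pairs. The `suspect_hosts` function, the `0.5` threshold, and the host names are hypothetical and do not come from any particular monitoring tool.

```python
from collections import defaultdict


def suspect_hosts(reports, threshold=0.5):
    """Return hosts where at least `threshold` of the containers are unhealthy.

    `reports` is an iterable of (host, healthy) pairs, one per container.
    A high failure ratio on one host suggests a problem beneath the
    containers, such as the host operating system or the hardware.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for host, healthy in reports:
        totals[host] += 1
        if not healthy:
            failures[host] += 1
    return sorted(h for h in totals if failures[h] / totals[h] >= threshold)


# Usage: two of the three containers on host-b are failing.
reports = [
    ("host-a", True),
    ("host-a", True),
    ("host-b", False),
    ("host-b", False),
    ("host-b", True),
]
flagged = suspect_hosts(reports)
```

Here only `host-b` is flagged. Without a per-host view like this, the same outage would look like a scattering of unrelated container failures.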
Failures are not the only thing we need to worry about with tools like Docker. There is also the problem of security. Whatever container platform we use, we have to trust that there are no backdoors or undisclosed security vulnerabilities. Using open-source platforms is no guarantee of safety either. If we rely on third-party container images for parts of our system, we may be vulnerable.
The livestock paradigm is attractive for a number of reasons, but it is not without its downsides. Before rushing to containerize the entire stack, tech teams need to think about whether or not it is the right choice and ensure they can mitigate the negative effects.
Personally, I love working with containers. I’m excited to see where things go in the next ten years as new platforms and paradigms arise. However, as a former security consultant, I am cautious enough to know that everything comes with a price. It is up to engineers to remain vigilant to ensure that we don’t give up our autonomy as users and developers. Even the easiest CI/CD workflow in the world would not be worth that cost.
Understanding the Basics
What is a container in Docker?
A container in Docker is an isolated environment in which processes are restricted to their own set of operating system resources. Unlike a virtual machine, a container shares the host kernel and does not need to virtualize the hardware layer.