The growth of DevOps and containers has created some interesting split-responsibility scenarios. Not that this is necessarily a bad thing, but it is something that bears watching, and fixing when it becomes problematic.
Today’s topic is networking. Plenty of bits have been spewed about the situation, but it persists and is even getting more complex as architectures diversify.
Specifically, it’s the dual responsibility built into modern networking. Generally speaking, network operations staff still manage the complex hardware and software environments that deliver packets to your systems. What they do and how they do it are changing, but at a much slower pace than in areas fully pulled into DevOps. There are many reasons for this, and I won’t delve into them here, but the truth is that managing a production network and dealing with hardware vendors bring unique responsibilities and constraints that simply aren’t there in a software world.
That’s experience talking, from a developer who learned network administration. Software is easier and more adaptable, and that is directly reflected in the impact of DevOps.
But we have drifted into a world in which that infrastructure is still mission-critical and still carries those same constraints, while the software layer of networking, driven primarily by containers, has grown up in parallel, often with a different set of people responsible for it. And that software network can be mighty complex. I shudder to think what the iptables rules look like on some of your machines, maybe hundreds of them.
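If you want a feel for just how much of that complexity lives on a single host, a few lines of Python can tally the NAT rules the container runtime has written there. This is only a sketch: it assumes a Linux host with iptables-save available, root privileges, and the DOCKER/KUBE-* chain names that Docker and kube-proxy conventionally create; your environment may differ.

```python
# Rough sketch: tally the NAT rules a container runtime has written on one host.
# Assumes a Linux host, root privileges, and iptables-save on the path; the
# DOCKER and KUBE-* chain names are the ones Docker and kube-proxy typically create.
import subprocess
from collections import Counter

def nat_rule_counts():
    output = subprocess.run(
        ["iptables-save", "-t", "nat"],
        capture_output=True, text=True, check=True
    ).stdout
    counts = Counter()
    for line in output.splitlines():
        if line.startswith("-A "):          # each "-A CHAIN ..." line is one appended rule
            chain = line.split()[1]
            counts[chain] += 1
    return counts

if __name__ == "__main__":
    for chain, n in sorted(nat_rule_counts().items(), key=lambda kv: -kv[1]):
        tag = " (container-managed)" if chain.startswith(("DOCKER", "KUBE")) else ""
        print(f"{chain}: {n} rules{tag}")
```

Run something like this on a busy container host and the container-managed chains will very likely dominate the count, which is rather the point.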
Cloud has much the same effect, but since cloud instances are often publicly exposed, they tend to have networking people involved to make sure routing is correct and verifiable, and to help InfoSec people determine that the instance is as locked down as possible (and we can all name companies that wish this were true).
Still, the complexity of cloud spawned companies, including CloudCoreo (now part of VMware), that automate a raft of network verification and security checks simply because there are so many points that need attention. So cloud isn’t the nirvana of shared responsibility either. Yet.
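To make that concrete, here is a minimal sketch of one such automated check: flagging AWS security groups that allow inbound traffic from anywhere. It assumes boto3 is installed and that credentials and a region are already configured; the real tools run hundreds of checks like this across accounts and providers.

```python
# Minimal sketch of one automated cloud-network check: flag security groups
# open to the whole internet. Assumes boto3 is installed and AWS credentials
# are already configured; the region below is an assumption.
import boto3

def world_open_groups(region="us-east-1"):
    ec2 = boto3.client("ec2", region_name=region)
    findings = []
    for sg in ec2.describe_security_groups()["SecurityGroups"]:
        for perm in sg.get("IpPermissions", []):
            for ip_range in perm.get("IpRanges", []):
                if ip_range.get("CidrIp") == "0.0.0.0/0":
                    findings.append((sg["GroupId"], perm.get("FromPort", "all")))
    return findings

if __name__ == "__main__":
    for group_id, port in world_open_groups():
        print(f"{group_id} allows 0.0.0.0/0 on port {port}")
```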
As DevOps and container adoption have grown, it is natural that responsibility for container and container-management networking would spill over to the development or application-administration side. After all, once a subnet is configured for a cluster and software is managing that subnet, integrated with the rest of container management, it feels right to have the dev/deploy team manage it.
And it might be. Like all things DevOps, this is a continual-improvement exercise. Having someone who knows exactly what is going on between pods, clusters, and networks is important, though. Breaks can cause outages that cost money. Knowledge is the key to fixing breaks, and container networking is generally super-complex. Even if you are assigning standard IP addresses on a corporate subnet to each container, container-management networking is still complex and can still break.
So if you haven’t already, get the networking team involved in inter- and intra-container communications. Make certain you know what is going where and what should go where.
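One practical way to start that conversation is to hand both teams the same picture of the cluster. The sketch below, which assumes kubectl is installed and pointed at the cluster in question, maps every pod to its IP and the node it landed on, so the app side and the network side are at least arguing from the same data.

```python
# Sketch of a shared "what is going where" view: list every pod with its IP
# and the node it runs on. Assumes kubectl is installed and its current
# context points at the cluster you care about.
import json
import subprocess

def pod_network_map():
    raw = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True
    ).stdout
    for item in json.loads(raw)["items"]:
        meta, status, spec = item["metadata"], item["status"], item["spec"]
        yield (meta["namespace"], meta["name"],
               status.get("podIP", "pending"), spec.get("nodeName", "unscheduled"))

if __name__ == "__main__":
    for namespace, pod, ip, node in pod_network_map():
        print(f"{namespace}/{pod}  {ip}  on {node}")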
You won’t regret it when something inevitably breaks that can’t be repaired by dropping the instance and spawning a new one in its place.