We’ve all said it. We’ve all done it. We’ve all shaken our heads at it. PEBKAC. Problem exists between keyboard and chair.
User error.
While generally applied to the end-user community – those folks considered technical neophytes by IT professionals – it can and should also be applied to those of us who have, at least once (admit it, come on, I know you’ve done it), fat-fingered a configuration on a web server, a switch, a router, or some other network or application service.
It’s okay. We’ve all been there – head down on the keyboard, a litany of words we wouldn’t use in front of our mothers streaming from our lips between enumerations of how long we’ve sat at our desk looking for the problem. Mine was a misconfiguration of route metrics in the now long-gone Network Computing lab that sent traffic from one side of the lab over a simulated T1 and back over the 100 Mbps link. When you’re trying to simulate a WAN environment and put WAN optimization solutions on either end, this is highly problematic and tends to, well, not work. Not work at all. If it weren’t for the astute observation of my lab partner, I’d probably still be at my desk hunting for the problem.
The point is, we’ve all had our PEBKAC moments. We know they happen, and avoiding them is a very real goal of devops in general. We want consistent, repeatable, and successful deployment processes in part to avoid the time and effort it takes to troubleshoot some minute configuration error in the lengthy and often complicated data path that application traffic must traverse in our domain.
And it’s no longer just about downtime, which is costly and frustrating for everyone involved. It’s also about security: the security of the network, the applications, and the data.
Tufin Technologies, in a 2010 survey of 100 registered DEF CON 18 attendees, found that “73 percent of hackers came across a misconfigured network more than three quarters of the time – which, according to 76 percent of the sample, was the easiest IT resource to exploit.” [1]
With more and more layers of complexity added to the network in an attempt to simplify the network (yes, how’s THAT for irony?), there are more and more opportunities to experience the unique pain of PEBKAC every time a new service is provisioned. Consider the math:
A typical data center has over 500 servers, each running a hypervisor. Each hypervisor carries an average of 20 virtual workloads, and each workload requires at least 5 separate network attributes to be configured. That’s 500 × 20 × 5 = 50,000 opportunities to make a mistake. Even at an error rate of just 1 percent, that’s 500 problems you’ll need to track down.
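The back-of-the-envelope math above can be sketched in a few lines. Note that the 1 percent error rate is an illustrative figure from this article, not a measured industry statistic:

```python
# Illustrative math: configuration touch points in a typical data center.
servers = 500                 # hypervisor hosts
workloads_per_server = 20     # average virtual workloads per hypervisor
attributes_per_workload = 5   # network attributes configured per workload
error_rate = 0.01             # assumed 1 percent human error rate

opportunities = servers * workloads_per_server * attributes_per_workload
expected_errors = int(opportunities * error_rate)

print(opportunities)    # 50000 chances to fat-finger something
print(expected_errors)  # 500 misconfigurations to track down
```

Even a small per-attribute error rate compounds quickly at data-center scale, which is the point: the volume of manual configuration, not the difficulty of any single step, is what makes PEBKAC inevitable without automation.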
This is, in part, why devops exists: to enable the development and use of reliable processes and methods for provisioning services. That means configuration is happening under the covers, unless you’re already living in rainbow unicorn land where services configure themselves automatically. I didn’t think so.
There’s a reason why typists are measured not just on speed but on the number of errors made while typing. It’s not just about speed; it’s about accuracy, because eventually you’re going to miss the mistake you made and it’s going to be sent out in a corporate-wide e-mail and … well, you can peruse the Internet at your leisure to see how well those turn out. The same is true for deployments and provisioning. It’s not just about how fast you can get a service up and running, it’s about how fast you can get that service up and running correctly.
Devops should focus as much on accuracy as it does on speed. You’ll save more time avoiding misconfiguration and the associated long hours of troubleshooting by simply slowing down a few MPH (or WPM) and ensuring that you’ve got it right the first time.
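One concrete way to “get it right the first time” is to validate a workload’s configuration before it ever reaches a device. Here is a minimal sketch; the attribute names are hypothetical stand-ins for the five network attributes mentioned earlier, not a real provisioning API:

```python
# Minimal pre-provisioning sanity check (illustrative, not a real API).
# Catching a missing attribute here costs seconds; catching it in
# production costs the long troubleshooting hours described above.
REQUIRED_ATTRIBUTES = {"ip_address", "netmask", "gateway", "vlan", "dns"}

def validate_workload(config: dict) -> list:
    """Return a list of problems; an empty list means the config passes."""
    missing = sorted(REQUIRED_ATTRIBUTES - config.keys())
    return ["missing attribute: %s" % attr for attr in missing]

workload = {
    "ip_address": "10.0.0.5",
    "netmask": "255.255.255.0",
    "gateway": "10.0.0.1",
    "vlan": "100",
    # "dns" was fat-fingered out of the definition
}
print(validate_workload(workload))  # ['missing attribute: dns']
```

A check like this is the automated equivalent of the astute lab partner: it slows the deployment down by milliseconds and saves the hours spent hunting for the one attribute nobody typed.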
[1] http://www.continuitycentral.com/news05331.html, http://www.darkreading.com/perimeter/misconfigured-networks-are-easiest-prey/227200159