Welcome to my first installment in a series where I talk about my long transition as an engineer, first from a very traditional waterfall development methodology into a more Agile process, and finally into a true DevOps organization at my current gig at JumpCloud.
Who Am I?
First, a little background on me. I’m (almost) a lifelong coder, and by that I mean I started writing my first simple programs at 9 years old, on an HP 3000 MP/E host running MP/E BASIC. I got my first computer at home at 12, a Radio Shack TRS-80 with a whopping 16K of RAM and a cassette drive for loading and saving programs. I’ve had computers in my house since then, and probably never in my life since have I gone more than a week without being in front of a computer.
I love building software, but for a long time I was resistant to changing how I did that in a professional environment. Still, I try to be a guy who’s open to at least evaluating new stuff, even if I don’t dive into new things as a way of life. I tend to be one of those guys who does everything the same way over and over until I see a way that is demonstrably better. I’m a creature of habit, but I’m also a creature of continuous improvement (as long as I can see the improvement).
In college, I did a fair amount of system administration work on UNIX systems, including BSD and some System V R4 boxes. During that stint, I learned quite a bit about how not to destroy systems by running rm -rf /etc (every system admin does that on a production box at least once, don’t they? If you haven’t, I can’t really recommend the rush that comes from realizing you’ve just decided how you’re going to spend your night, instead of sleeping), and how to install, back up, maintain, and fix big (but by today’s standards laughably slow) minicomputers. I’ve always maintained UNIX/Linux system administration as a skill set, and it’s always come in handy.
After that, I jumped into software development, in databases, security, and systems programming.
You get the idea: I’ve been around a while, and I’ve done enough to be able to gauge whether a new technology or process is valuable or not. And, because I’ve done dev, operations, and DBA work, I think I can offer a solid perspective on the transition to DevOps.
The Waterfall Method, or, How Can We Fail Big?
Looking back, Waterfall as we practiced it then was like building a campfire, compared to the subtlety and refinement of the fireworks that Agile methodologies offered.
Waterfall is the place where we all started (for those of us joining the workforce in the early 90s). That was how software got built: no ifs, ands, or buts. We all “enjoyed” the many hours of building requirements with business analysts, the joint application design (JAD) sessions with clients, and the weeks of first writing functional specifications, then moving to the design phase and building design specifications. In the midst of all that, we iterated through many reviews with stakeholders under the watchful eye of our project manager.
Once we got to coding (on my teams, at least), the docs were treated as an afterthought for the most part. Sure, we’d open them up periodically to see if we’d missed a requirement, or if there was an argument about some previous decision we’d made. But most of the time, most of us didn’t work with a document open in one window and code in another. Sometimes we did, but inevitably we’d hit a section that literally or figuratively said, “magic happens here” (nerdspeak for “I have no idea how that’s going to work”) and then go off the rails, the specs never to be seen again.
When the coding was done, we’d package everything up and throw it over the wall to QA. QA would start cranking through their test plans, doing ad hoc testing, rarely doing load testing, and eventually (through internal iteration – sounding a little like Agile?) would make our software match the requirements spec, or at least what the spec had become after the dev team spent a lot of hours and countless emails asking the client detailed questions about how a particular thing should work. QA would then pronounce that version good, and either we (the dev team) or an ops team would actually deploy our software to production.
Oh, and by the way, I missed another step: the deployment documentation. This file goes here, reboot the system at this point, install this package, upgrade to this version of the database or OS. All that fun stuff got written into documents and executed by humans, over and over again. And did I mention that it had to be tested, too? And maintained? And that it inevitably resulted in a developer getting a call in the middle of the night, or during a weekend change window, to get things back on their feet?
Oh, the joys of writing documentation that was out of date almost as soon as it was finished, and of the deviations from it that occurred when you were building real software to handle real-world cases you never considered during the specification phase. It was almost as if all the docs were completely useless. But well-paid individuals spent countless hours building them. They did look impressive, and they showed that we were doing things “professionally”, not just jumping in and starting to code something, because that would be crazy.
I should probably mention at this point that I’ve never been involved in building software for applications that could have life-threatening consequences, or that are one-chance things, where you spend years building the software, and if you don’t get it right, human lives or millions or billions of dollars can be lost. I think the extreme levels of documentation, planning, and review cycles are still necessary for these types of applications, though I think both Agile and DevOps have benefits to offer parts of these projects; more on that later. In my view, these applications don’t lend themselves to lots of iteration, at least not externally: “Oh, I’m sorry, captain, the feature that lets you apply the landing brakes is not part of this release. The team says that’s coming in the next sprint.”
Microsoft Project: The Wrong Tool for the Job
All this work took a long time, and managers were still under the illusion that project managers could predict the future. Management would say: you have 18 months to get this project done, and we’d open Microsoft Project (as though we were about to build a building or something) and start putting in development tasks. We’d estimate, based on our prior experience, how long a particular task was likely to take, and schedule it all out. We’d nearly always end up reducing scope at that point to make all our work fit into that short(!) 18-month schedule. I remember spending hours in Project trying to make all the dependencies work, and then watching the schedule change over and over as we jumped from one task to another based on what was logical at the time, not what was on the schedule.
We’d go back to management and say: this is what we think we can do in 18 months. The more astute project managers would give best-case, likely-case, and worst-case estimates, all of which would be wildly different. We’d play games like ensuring that the weakest members of the team never became part of the “critical path” (the chain of tasks packed so tightly from beginning to end that missing any deadline would delay the whole project). To try to reduce risk, we’d build risk mitigation documents that listed everything that could go wrong, and what we were going to do to fix it and get the project back on track.
Management would take our project schedule, risk mitigation plans, and high-level functional spec, look it all over, ask some pointed and thoughtful questions, and send us away again to improve our plan. Nearly everyone in the process was competent, professional, and thoughtful. And, in the end, everyone in the process was horribly, brutally wrong, repeatedly.
Even if we met major milestones on schedule, it was very rare to actually hit a scheduled end date. Something new would pop up, something would go wrong, something would be harder than we thought, and these somethings popped up more frequently the longer a project ran, the larger it grew, and the more money we threw at it. So, as the risk level increased, so did the likelihood of failure. It was almost a foregone conclusion that we’d have to cut scope before the end of the project to try to meet our original deadlines. Or the team would end up in a death march for months on end to try to pull the schedule back. Or we’d cut into the time set aside for QA. Or all of the above.
We learned all too well the old project management axiom that you can have any two of time, quality, and cost. That is, you can get something fast and of high quality if you’re willing to spend a lot of money. Or you can go cheap and fast, at the cost of quality. Or, finally, you can get great quality at low cost, if you’re willing to take more time. That tradeoff bit especially hard when iterations were counted in years, rather than months or weeks.
The funny thing was that we all fought these problems daily, and we argued with management about how broken the process was, and that all the specification and up-front planning work (especially when building phone books or mailing lists) wasn’t useful. But in the end, we all just kind of continued down the same path, until the late 90s, when the ideas of Agile development first started taking root.
Next time, I’ll focus on Agile, my initial reactions to it, and how it started gaining traction.