Like many developers, I started my career in a waterfall-based development environment. I was at a very large telecom, working on a product that had been around literally for decades. We’d spend weeks or months in the requirements phase before any code was written. We’d throw the code over the wall after our development period, and QA would file defects that would take weeks to prioritize and triage. It took about a year to ship a new version of this product. All progress came at a crawl, and I felt so disconnected from the customers that I would forget they even existed — the work I did was to get the project manager off my back, not to please users.
Moving to a startup that practiced Agile development was a relief. Far less mature product, smart people contributing at every level, rapid development, and a rapid feedback loop, too. It felt so natural and right that I forgot there was ever another way of building software. We talked to real customers and heard about how they wanted the product to work. Our priorities were based on actual need, with estimates grounded in reality and a focus on iterative improvements to functionality. My teams designed and implemented features and had them tested and accepted quickly… software development was thrilling again.
However, it wasn’t perfect. While we could rapidly build features, and get those features right, we still had the problem of getting those features to customers rapidly. We’d design and implement and test great functionality in a matter of days, but that great stuff would sit on the shelf for weeks or even months before anyone would actually use it.
The problem was this: we feared disrupting a working production environment. Every time we’d push to working systems, we’d bring in the operations team and the senior developers on a weekend, sit tensely while the new code was fired up, and then spend hours fixing things that had somehow crept in. Sometimes interactions between different new features hadn’t been properly vetted. Months of changes had accumulated so the changes were huge, and deployment was a nightmare. And because it was a nightmare, we didn’t want to do it often… so releases got bigger and bigger over time. It’s a vicious circle.
I’m sure you know where this is headed: the answer to this problem is continuous deployment. Over time, we reduced our multi-month release cycles to weekly and finally daily releases. The things that were most painful, we did more often, addressing the problems and automating the process as we went. By releasing more often and in smaller chunks, we could see exactly what had changed when a problem occurred, and with that isolation of code changes, we started to fix problems much faster. Classic continuous deployment story.
It wasn’t easy, of course. We struggled, for instance, with distributing the necessary keys to remote systems automatically; without that, each deployment (and each individual machine affected by that deployment) required hands-on interaction. We also wrestled with building the ability to roll back, to prepare for the inevitable “bad release”. We tried redeploying the code from a tagged branch in the repository, but it was just too slow and led to several minutes of downtime. Ultimately we ended up duplicating almost the entire infrastructure in order to build a “Blue-Green” environment, letting us swap back to the last known good state as easily as re-pointing our load balancer.
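To make the Blue-Green idea concrete, here is a minimal sketch in Python. It is not the tooling we actually ran; `LoadBalancer`, `release`, `rollback`, and the `deploy` and `healthy` callbacks are all hypothetical names invented for illustration. The point it shows is that new code only ever goes to the idle environment, and a rollback is nothing more than pointing the load balancer back.

```python
from dataclasses import dataclass
from typing import Callable

ENVIRONMENTS = ("blue", "green")


@dataclass
class LoadBalancer:
    """Toy stand-in for a load balancer that routes all traffic to one environment."""
    live: str = "blue"

    def idle(self) -> str:
        # The environment that is currently NOT serving traffic.
        return "green" if self.live == "blue" else "blue"

    def point_to(self, env: str) -> None:
        if env not in ENVIRONMENTS:
            raise ValueError(f"unknown environment: {env}")
        self.live = env


def release(lb: LoadBalancer,
            deploy: Callable[[str], None],
            healthy: Callable[[str], bool]) -> bool:
    """Deploy to the idle environment and switch traffic only if it checks out."""
    target = lb.idle()
    deploy(target)            # hypothetical: install the new build on the idle side
    if not healthy(target):   # hypothetical: smoke tests against the idle side
        return False          # live traffic was never touched
    lb.point_to(target)
    return True


def rollback(lb: LoadBalancer) -> None:
    """Swap back to the previous environment, which still holds the last good build."""
    lb.point_to(lb.idle())


if __name__ == "__main__":
    lb = LoadBalancer()
    release(lb, deploy=lambda env: None, healthy=lambda env: True)
    print("live after release:", lb.live)
    rollback(lb)
    print("live after rollback:", lb.live)
```

The cost of this design is the one mentioned above: both environments have to exist at full size, which is why it meant duplicating almost the entire infrastructure.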
In making that transition, in bringing that deployment pain forward, our development team needed to do a lot of tasks that were traditionally operational. We set up servers that mirrored our production environment. We learned that they needed to be reproducible, so we worked with the operations experts to make servers that could be spun up from scripts and effortlessly deployed to any environment. We felt the pain of manual installs and updates of our product, so we simplified the process, hardened it, and automated it; we did everything we could to make that part of our life simple. We learned firsthand the ugliness of manually editing settings to point to the correct database and of rolling the logs via cron jobs, so we made these things configurable. We uncovered the engineering assumptions and design decisions that made our product difficult to maintain and manage in production. We organically evolved into a DevOps organization. We learned to manage the servers and take on operational tasks, and we improved our process in order to make those tasks simpler.
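As a rough illustration of what “configurable” can look like, the sketch below reads the database location and log rotation policy from environment variables instead of hand-edited files and cron jobs. It is an assumption-laden example, not our actual code: the `APP_*` variable names, the default values, and the `load_settings` and `build_logger` helpers are all hypothetical. The rotation itself uses Python’s standard `TimedRotatingFileHandler`.

```python
import logging
import os
from logging.handlers import TimedRotatingFileHandler
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read deployment-specific settings from the environment.

    The APP_* names and the defaults are hypothetical, chosen for illustration.
    """
    env = os.environ if env is None else env
    return {
        "database_url": env.get("APP_DATABASE_URL", "sqlite:///local.db"),
        "log_path": env.get("APP_LOG_PATH", "app.log"),
        "log_backups": int(env.get("APP_LOG_BACKUPS", "7")),
    }


def build_logger(settings: dict) -> logging.Logger:
    """Create a logger that rotates its own file at midnight, with no cron job."""
    handler = TimedRotatingFileHandler(
        settings["log_path"],
        when="midnight",
        backupCount=settings["log_backups"],
        delay=True,  # do not open the file until the first record is written
    )
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    settings = load_settings()
    print("database:", settings["database_url"])
    build_logger(settings).info("service started")
```

With something like this in place, the same build can be promoted from a test server to production by changing only the environment it runs in.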
In retrospect, it was natural – just like the transition to Agile from waterfall. To me, this is the way software should be created. Agile allows us to quickly discover and develop the products that our customers really want. DevOps extends that concept, and teaches us how to take that product and get it into the hands of our customers more quickly, and how to maintain and support that product most effectively.
I’m thrilled to be working in DevOps and can’t imagine ever going back.