Welcome to To Be Continuous, a show about continuous delivery and software development hosted by Paul Biggar, founder of CircleCI, and Edith Harbaugh, CEO of LaunchDarkly. In this episode, Edith and Paul discuss a blog post by Edith, in which she asserts that you should kill your staging servers so that continuous delivery can live.
Paul: So today we’re gonna talk about a post that Edith wrote. The post is about the death of staging servers. Can you give us a little bit of an introduction to what the post is, because I think we’re gonna talk about it quite a bit.
Edith: Sure Paul, happy to. By the way, did you read it yourself?
Paul: I confess, I may have skimmed it, a little more than read it.
Edith: Yeah, so I’ll give you the summary. And then we’ll talk about it. And then you should go read it, and then we’ll talk about it again.
So the article’s an outgrowth of what I was seeing at LaunchDarkly, my company. One of the comments I saw on it, and I liked this comment because they talked about me in the third person, said, “Harbaugh is biased.”
Paul: Right, I would say so.
Edith: But by the way, she does have a good point. So I’ll state upfront, yes, I’m very biased. My company makes a platform to manage feature flags. So my bias, though, is I’ve seen what our customers are doing.
And the reason I wrote the article is that this was a natural extension of the way I saw our customers using feature flagging.
Paul: So were your customers using your product wrong, or was there a thing that was happening, that you’re talking about?
Edith: So feature flagging at its most basic is very simple. It’s just an if-then statement. You don’t need LaunchDarkly or any sort of system to manage that; you can just put a conditional in your code. What happens after that is you get more sophisticated, and you want some sort of dashboard where you can see the different feature flags that are live and uplevel them. Not even just to the business side, but even among the various developers.
You want a central place where you can manage them. That was kind of our first version: just a place where you can manage feature flags.
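The if-then statement Edith describes can be sketched in a few lines of Python. The `FLAGS` dictionary and the flag names below are hypothetical stand-ins for the central registry she mentions, not LaunchDarkly’s actual API:

```python
# A feature flag at its most basic: a conditional keyed off a central store.
# FLAGS is an illustrative in-process registry; in practice this would live
# in a shared dashboard/service rather than a module-level dict.
FLAGS = {
    "new-checkout": True,
    "beta-search": False,
}

def is_enabled(flag_name: str) -> bool:
    """Look up a flag in the registry; unknown flags default to off."""
    return FLAGS.get(flag_name, False)

def render_checkout() -> str:
    # The if-then statement that feature flagging boils down to.
    if is_enabled("new-checkout"):
        return "new checkout flow"
    return "old checkout flow"
```

The value of a system on top of this is the roll-up view: every flag, and its current value, visible in one place instead of scattered through the code.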
The next thing our customers were asking for on our roadmap was support for environments. They wanted to be able to have feature flags on dev, QA, staging, and production, with visibility all in one place.
Paul: I’m a little confused when you say this, because surely an environment is just a flag?
Edith: Yes, go on.
Paul: It seems like you could have another flag that is a string, like staging or production or dev or whatever, that you flag features on.
Edith: And that’s basically what LaunchDarkly is doing, so for us, we just have another flag.
So each environment gets an API key.
Paul: So again, forgive my ignorance on this, why not use the existing flag?
What was missing from the existing feature flag infrastructure that meant environments had to be a separate top level, or first class feature?
Edith: So now we’re getting a little meta. The reason why people liked using LaunchDarkly is because they can get a consolidated view of all their feature flags.
So that instead of having these floating around in a config file on each machine, they have a roll-up of what flags are turned on and off in different environments. And then on top of that, the ability to manage the flags in different environments.
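One way to picture “each environment gets an API key” is a client whose key selects which environment’s flag values it reads. The key strings, flag names, and `FlagClient` class here are illustrative assumptions, not LaunchDarkly’s actual SDK:

```python
# Sketch: environments as another dimension of the flag store. Each
# environment has its own API key and its own flag values, so the same
# flag can be on in staging and off in production.
ENV_FLAGS = {
    "sdk-key-prod":    {"new-checkout": False, "beta-search": False},
    "sdk-key-staging": {"new-checkout": True,  "beta-search": True},
}

class FlagClient:
    def __init__(self, api_key: str):
        # The API key is the only thing that picks the environment;
        # application code stays identical across environments.
        self.flags = ENV_FLAGS[api_key]

    def variation(self, flag_name: str, default: bool = False) -> bool:
        return self.flags.get(flag_name, default)

staging = FlagClient("sdk-key-staging")
prod = FlagClient("sdk-key-prod")
```

This is why an environment can just be “another flag” under the hood: it is one more key into the same consolidated store.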
Paul: Got you, so they wanted a view that applied to just their environments?
Edith: Well, to see per environment what was turned on and off. So basically they were following a software life cycle, where you had dev people working on the developer boxes, pushing to QA, pushing to staging, and then pushing to production.
Paul: When features moved from one to the other, was there a human promoting these, or was there an API call promoting these?
Edith: At this point it’s mainly human. I could see down the line that it would just be an API. And even beyond that … The next thing people asked for, once they got more people on board LaunchDarkly and were using more feature flags, is the ability to lock down different flags in different environments.
Paul: Lockdown like people can’t change the flags?
Edith: So they wanted QA to have rights to change some flags in QA, but not in production.
Paul: I see, okay yeah.
Edith: Then after that the natural step is: okay, if you have a feature flag system where you can control, at any step, who has permissions and who gets to see what, why do you really have separate QA, staging, and production boxes? Why not just collapse all this and manage visibility with the feature flags themselves?
Paul: Sure. That makes perfect sense.
All the feature flags provide the primitives on which you can build the things that they want.
Edith: The point I made in the article is that
people use the abstraction of a QA and a staging environment to basically try to encapsulate a change. That was the original intent, and at the time it was very good, because the alternative was just to push everything to production and have everything break.
But if you’re actually doing—
Paul: If you’re doing feature flags properly, then the concept of really having a staging server doesn’t make that much sense.
Because you’re not pushing a feature to the staging server; you’re pushing it to production and then slow-rolling it out to people. Is that kind of the point?
Edith: Yeah, so that was the title of the article, Kill the Staging Server.
Paul: This might be a good point to actually stop and read the article.
Edith: Yeah, I’d appreciate that. Thanks, Paul.
Paul: We’re gonna take a, what’s gonna appear to be a 10-second break, but it’s actually going to be a five-minute break, while I read the article. I recommend you press pause and read the article yourselves right now.
Edith: So welcome back, Paul has been furiously whiteboarding. I’d love to hear his thoughts now.
Paul: Yeah, so it’s a good article, I really liked it. The first thing that came to mind was the concept of names.
So the idea of having multiple named environments is a weird one in particular. So imagine that you have a service-oriented architecture, and it’s got like six machines or something like that, or it’s got 20 machines or whatever. The idea that you can’t spin up a new instance of it is a little bit odd.
Paul: So what people have done historically, and this goes back to when things ran on machines, or particular ports, or whatever, was that everything had a name. It had a DNS name, or it had a staging name.
And if you look at Rails configuration files, there’s a production name, there’s a staging name, there’s a dev name, there are different environment variables for all of these. But it doesn’t actually make sense to have a name, because a name implies there is one of them.
Edith: Well this goes back, are you going down the whole pets vs. cattle thing?
Paul: I’m going down the pets vs. cattle thing, yeah. If you’re naming your pets, then you can’t just suddenly get six of them, you can’t suddenly kill them. But what you really want is cattle.
So a staging server is, I mean a staging server is a pet. A very, very important pet that everybody loves and everyone plays with, and it gets a little bit confused as a result. And that analogy went—
Edith: I’m not sure if you’re agreeing with me, or disagreeing with me, so. We’ll continue, it’d be great if you disagreed, but I would also enjoy it if you agreed.
Paul: If that analogy holds, then what you’re suggesting is that we killed the cherished family pet.
Edith: Well, you know, sometimes … aw, I can’t even go there.
Paul: Well the analogy breaks down, but let’s say it’s for the best if that pet spends some time on the farm.
Edith: Went for a long walk.
Paul: Right. So obviously production is an actual thing that needs a name, it is a unique environment. But nothing else is really unique environment.
Edith: Yeah, and to fast-forward: I got a lot of feedback on the article, which I loved. That’s kind of why I wrote it, for feedback. I think there are cases where you don’t want to go directly from a developer box to production.
I think there are many cases where you want to have other places to test them.
Paul: I do agree with this, yes.
Edith: I think in a lot of cases, though, the forced march of we-go-from-this-step-to-this-step-to-this-step-to-production is actually very harmful, when if you just pushed out much quicker, and perhaps skipped some of those steps, you’d get the feedback you want directly.
Paul: So at CircleCI, we initially had a staging environment. And we would occasionally use the staging environment if we wanted to test something that wasn’t that easy to write unit tests for, or something along those lines. And usually for us that was stuff around LXC, or stuff around starting Amazon boxes.
Things that you didn’t really do that often, and that either had an expense, or had a weird architectural thing, where you couldn’t just do it in software, and practice it in software. So I think there is that need. Occasionally you’ll have things where it’s not tested well enough, so you need to put it up somewhere where a human validates to the best of their knowledge that it actually works.
I was gonna say you could just run it somewhere, but the whole point of the staging server in that situation is that it’s something which isn’t really that easy to put into VMs, or whatever. The other kind of use case you see is where you don’t want to be running stuff on live production databases, and you can’t get a copy quickly. So you see people like Heroku trying to build products that avoid the need for that.
So with Heroku, you can take a copy of a database, or you can have a read-only view on a database that’s copy-on-write in some way or whatever. So that you don’t need a separate staging environment, or a separate copy, or a separate staging database.
But if everything is well-tested, if everything works in software, then there’s really no need for staging environments. So that says to me that a staging environment is sort of a yellow flag: it shouldn’t really exist, but sometimes you might need it.
Edith: I actually want to write a follow-up article now: the staging server is dead, long live the staging server. Because I do think that there are use cases where you don’t want to push to prod. I do however think people—
Paul: What are these cases where you don’t want to push to prod?
Edith: So what I heard from Sean Burns, our advisor, is if you’re testing a really deep infrastructure change, for example switching some batch processing.
Paul: Mm, no, I don’t believe that for a second. If you’re testing a really deep infrastructural change, there’s two ways to do that. One of them is to say, “We’re gonna have these machines over here, and these machines over there.” And that’s the quarterly release cycle thing. We’re taking a big, big risk, all hands on deck, whatever.
The way that you want to release that sort of thing is to have it in the code base. You want an if statement, a feature flag, that controls how much of the data goes one way or the other, and you duplicate the data, or you put 1 percent of it through.
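Paul’s rollout scheme, a flag that routes a fraction of the data through the new path while everything still flows through the old one, might look roughly like this. The pipeline functions are stubs standing in for real systems, and the hash-based bucketing is one common way to keep each record’s assignment stable:

```python
# Dual-write rollout sketch: every record goes through the old pipeline
# (the fail-safe), and a flag-controlled percentage is also duplicated
# into the new pipeline so it can be validated against live data.
import hashlib

OLD_WRITES: list = []   # stand-in for the existing batch pipeline
NEW_WRITES: list = []   # stand-in for the new pipeline under test

def write_old_pipeline(record_id: str, payload: dict) -> None:
    OLD_WRITES.append(record_id)

def write_new_pipeline(record_id: str, payload: dict) -> None:
    NEW_WRITES.append(record_id)

ROLLOUT_PERCENT = 1  # the feature-flag value: shadow 1% of traffic

def bucket(record_id: str) -> int:
    """Deterministically map a record ID to a bucket from 0 to 99."""
    digest = hashlib.sha256(record_id.encode()).hexdigest()
    return int(digest, 16) % 100

def process(record_id: str, payload: dict) -> None:
    write_old_pipeline(record_id, payload)       # always: no data loss
    if bucket(record_id) < ROLLOUT_PERCENT:
        write_new_pipeline(record_id, payload)   # shadow-write the sample

for i in range(1000):
    process(f"record-{i}", {})
```

Ramping the rollout is then just raising `ROLLOUT_PERCENT`, and reverting is setting it back to zero, with the old pipeline never having stopped.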
Edith: I think the key there is to duplicate or have some fail safe. I mean the biggest risk you run with doing this is data loss, which is awful. If you’re cavalier about how you do this—
Paul: Well, I don’t think data loss is the worst thing, the worst thing is your whole damn service goes down.
Edith: Well, the worst thing is your whole service goes down and you lose a chunk of somebody’s data.
I think people are actually more forgiving of a five-minute blip than of you losing a lot of their analytics. His point, from having been at Flurry, was that you cannot afford to lose people’s data.
Paul: So it would be ludicrous then, in my mind, to create a whole brand new infrastructural thing where you’re gonna do some kind of overnight or immediate change, or wait for downtime change or something. If the data is all live, you can’t afford to have, we’re gonna switch over and see if it works. Regardless of how well it’s tested, like it’s ludicrous.
Edith: Yeah, so that was, now I’m coming back around to my article. I mean that was the point I was making.
People think that they’re reducing risk by doing all this testing on staging.
Paul: Yeah, in fact they’re increasing risk.
Edith: Yeah, you would think you’re—
Paul: Well, the whole point of continuous delivery is that by having a harsh cross over between one thing and another thing, you increase risk even though you think you’re decreasing risk.
Edith: Yeah, so Kent Beck wrote a really good article about reversibility. He’s at Facebook now, and he said everything at Facebook is reversible.
That actually makes you much less risky, ’cause you’re like, okay, we make these risky changes—
Paul: But they’re reversible, so it’s, right.
Edith: Yeah, versus the big cutover. What I called in the article waterfall deployments.
Paul: I mean I think that’s a really good way of thinking about it. It ties it to a name that everybody knows is bad.
Edith: Well it was a deliberate—
Paul: Yeah, good choice. So the agile deployments then, are the ones where it happens seamlessly, and you can go back and forth and change the requirements and whatever else.
Edith: So that was Sean’s example No. 1: what if it’s a really risky back-end change? He had some good examples from his own career.
Paul: Did he have any examples that I would agree with?
Edith: Well, a priori, without hearing them, I don’t know if you’re going to agree or disagree. Another thing people brought up was that it just seems very risky to them. It seems to increase risk, because I think they’re used to thinking of the staging server as a safe harbor.
Paul: I think there’s some semantics around the term staging server. You very often want to have a complete copy of your environment that you can test against. So is that a staging server, or is that where you type docker up and get a new environment?
And then you run your testing on this whole brand new infrastructure that has never been touched by anything before?
Edith: Yeah, and the critique I made in the article is that I think people do that a lot, and they spend a lot of energy testing there, but those are artificial test cases.
Thanks for listening to this episode of To Be Continuous, brought to you by Heavybit.
For more on continuous delivery and to hear other Heavybit podcasts, visit Heavybit. You can also follow To Be Continuous on Twitter at @continuouscast.