Welcome to To Be Continuous, a show about Continuous Delivery and software development hosted by Paul Biggar, Founder of CircleCI, and Edith Harbaugh, CEO of LaunchDarkly. Edith Harbaugh and Paul Biggar talk with Peter van Hardenberg about continuous delivery in the context of organizational structure, databases and pricing.
Peter is a co-founder of the Heroku Postgres product team. For more than three years he has been responsible for designing, developing and operating the largest cluster of Postgres databases in the world. He knows how to get databases running, keep them safe and ticking fast.
This podcast is brought to you by Heavybit, a program that helps developer startups take their product to market.
Paul: So Peter, it’s traditional to ask you, what do you like best about continuous delivery?
Peter: Well, I’m increasingly old and I’ve done a lot of jobs, so I remember delivering Game Boy cartridges to market. A Game Boy cartridge is a physical device that goes into a Game Boy, hence the name, and that process involves physical manufacturing. You get one shot, so there’s a tremendous amount of basically wasted, un-fun energy that goes into making sure you never make any mistakes.
My favorite thing about continuous delivery is the ability to make mistakes and move on with them.
Edith: This would be a great time for you to introduce yourself.
Peter: Yeah, hi, my name is Peter van Hardenberg. I’m a Heroku old-timer, I’ve been with Heroku about six years now and I co-founded the Heroku Postgres team and have built a lot of products since then.
Edith: So it’s funny you’re talking about the Game Boy because Paul and I were just talking about that.
Peter: About Game Boys.
Edith: Because I said one of the things that had fueled the rise of Zynga was continuous delivery.
Paul: That’s the first thing we found about continuous delivery that was actually bad.
Peter: So I guess they’re probably still continuously delivering somewhere, in the great startup in the sky. They’re still around, aren’t they?
Paul: They’re still around, they’re still public and,
Peter: Farmville will never go away, I’m sure.
Paul: I think they’ve moved into other lesser things.
Peter: Developer tools. Gamified developers. Well actually, that’s how all developer tools are becoming now.
Edith: GitHub is gamified. How many check-ins do you have, can you make a nice pattern out of them?
Paul: New watches.
Peter: It’s a bit like a game except if you’re losing the game, you lose your job.
Paul: Or you just can’t get new jobs.
Peter: Well, that’s actually probably more accurate, yeah. Yeah it’d be a weird company if they let you go because you didn’t have a long enough commit streak.
Paul: On open source projects that weren’t your job.
Peter: I like that, actually, let’s make that a game.
Paul: We were talking in the last episode, as well, about open source funding models, or the anger over lack of open source funding models.
Peter: That’s a tough problem to solve.
Paul: That would be an interesting one. You get fired if you don’t have enough contributions.
Peter: Yeah, just put it on your quarterly review: current GitHub streak.
Paul: Let’s put this suggestion on Hacker News and see if people agree with it. I can pretty much picture how this is going to go.
Peter: I try not to read the comments.
Edith: So what do you think is liberating about the ability to make mistakes?
Peter: Well, more important than the ability to make mistakes is the ability to try things, right? Because of the SaaS model, I’m so accustomed now to being able to see in great detail the reality of what my product is in the market. And then I can turn that around and ask, basically live and in real time: if we put something out there, do people care?
Does it help with adoption? Is it suddenly throwing a bunch of errors? And if there’s a problem, you can roll back very quickly. You know, we run a database product, so availability is really, really important, but only for a very small core of the service. People are generally pretty understanding as long as your problems and outages are brief, small in blast radius, and quickly remediated. And I think at the periphery they prefer features over stability; at the core, they need stability over features. So if you’re a Stripe, right, you need to keep the payment processing online, but if some ancillary piece of your API has a problem, that’s probably not as big of a deal.
Paul: The dashboard part is less important than the payment going through.
Peter: Well, responsiveness and progress are so important, and I think, because we’re in such a changing market, the ability to drive forward is at least as important as the ability to keep things stable. But you can’t do both in the same place, basically, right?
I think continuous delivery is cool because it’s separated, sort of, a slow-moving core from a fast-moving periphery.
Paul: So when you’re creating a software product, you really want every single component of it to be continuously deliverable, which means you want to be able to migrate from the thing that you are doing now to the new thing.
And so it’s interesting, or there’s an interesting thing about databases because databases, I think, were fundamentally not built for this model.
Peter: They’re heavy.
Paul: Right. We had a problem recently where we deleted an index.
Peter: Ah, yeah.
Paul: And, whoops, that index was in use. And we thought it wasn’t in use but it was in use.
Peter: You should’ve been using Heroku Postgres, it would’ve told you.
Paul: Oh, so that’s interesting. The thing that I wanted was to be able to, you know, partially delete it, or run all the queries without that index, and then after an hour actually delete it. What does Heroku Postgres offer you?
Peter: Well, we have a number of diagnostic tools where you can see exactly how often an index is being used. In addition, next time you’re trying to delete an index, wrap it in a transaction: say begin, then drop the index, then do whatever, and then say rollback. That way you can actually test as though you didn’t have the index, because Postgres’s MVCC model means that all of that is fully transactional.
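A sketch of both things Peter mentions, assuming Postgres; the table, column and index names are made up:

```sql
-- How often is each index actually being scanned?
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Test life without an index before committing to its removal.
-- DDL is transactional in Postgres, so the drop is undone by the rollback.
BEGIN;
DROP INDEX users_email_idx;
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'someone@example.com';
ROLLBACK;
```

One caveat: the uncommitted DROP INDEX holds an exclusive lock on the table until the rollback, so keep the transaction short on a busy system.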
Paul: So you’re, when you say do all the stuff in between, what do you mean?
Peter: Whatever queries you want to run. Also, we’re getting into minutiae and some inside baseball here, but you can actually disable indexes, sort of at a session level, as well.
Paul: Oh, interesting. This is all Postgres or Heroku?
Peter: Yeah, that part’s just Postgres I mean, we built the tooling around it so that you can do these things, kind of, the easy way but, I’ll tell ya, if you read that 3,000-page Postgres manual, you’ll find some real gems.
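The session-level switch Peter alludes to is most likely the planner’s enable_* settings; stock Postgres has no per-index off switch, but these discourage whole scan types for the current session only:

```sql
-- Session-local planner settings: no other connection is affected.
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- The plan should now show a sequential scan instead of an index scan.
EXPLAIN SELECT * FROM users WHERE email = 'someone@example.com';

RESET enable_indexscan;
RESET enable_bitmapscan;
```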
Paul: Well so part of the problem might be using a thing that’s not a real database?
Peter: Oh, you’re not using a real database?
Paul: We’re not—
Peter: Oh, that’s right, I forgot.
Paul: I mean we are using a real database, we’re using many databases but that particular one was not in a real database.
Peter: Well you know, even baby databases grow up to be real databases someday.
Paul: Yeah, well, fingers crossed for that particular database that starts with an M and that we shall not discuss.
Peter: Oh, we don’t like to speak ill of anything.
Paul: No, no not at all, especially Mongo.
Edith: No, you said the word!
Peter: You said the M word! So coming back to, sort of continuous delivery of every part of the stack, yeah, it’s interesting, I mean we’ve had some interesting challenges with databases in particular and I actually gave a talk at Heavybit a while ago about, kind of, how we treat databases less as part of a service in sort of the classical sense but more like a factory.
So basically what happens is when people request databases from us, we stamp one out and then, you know, we try and keep it pretty low entropy from that point on so we have a lot of tooling and support services around the database but the actual database itself, once it’s given to a customer, we mostly try and keep our hands off it.
Now if there’s security issues or performance issues or whatever, we’ll go in and we’ll kind of warm it up and do some work on it. But even still we mostly prefer to actually replace the database. What we do is we create a replica, bring it up to speed and then we transition the load over to it. And that’s kind of a form of continuous delivery for data services, which has been surprisingly powerful as an abstraction.
Edith: So you’re basically, you’re treating them as cattle, not pets.
Peter: Oh absolutely, it’s all about the cattle model, and when you’re farming a very large herd of livestock, you can’t really afford to give each one its own personality.
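The replace-rather-than-repair move Peter describes (create a replica, let it catch up, shift load over) can be sketched for a streaming replica. Note pg_promote() arrived in Postgres 12, after this conversation; older versions used a trigger file, and replay_lag needs Postgres 10+:

```sql
-- On the primary: watch the replica catch up.
SELECT client_addr, state, replay_lag
FROM pg_stat_replication;

-- On the replica, once it has caught up:
SELECT pg_is_in_recovery();   -- true while it is still following the primary
SELECT pg_promote();          -- promote it, then point the application here
```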
Paul: This reminds me a lot of, I think, what started out as an IMVU way of doing data migrations, I’m sorry, of doing schema migrations without locking a whole table. For our people at home, I’ll describe this.
So let’s say you want to add a new value with a default to a particular table in your schema. If it’s a particularly large table, there’s a possibility of it being locked while that happens. At least this used to be the way; I think modern databases are much better at this now.
Peter: I believe for some default values it’s now the case that you don’t need a table lock, but that might be pushed until Postgres 9.6, at least. It’s a common and challenging operation.
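On the Postgres versions current at the time, the usual workaround was to split the change into steps so that no single statement rewrites or locks the whole table; the table and column names here are illustrative:

```sql
-- 1. Add the column with no default: a quick, metadata-only change.
ALTER TABLE orders ADD COLUMN status text;

-- 2. Give new rows a default (this does not touch existing rows).
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'new';

-- 3. Backfill old rows in small batches instead of one long, locking UPDATE.
UPDATE orders SET status = 'new'
WHERE status IS NULL AND id BETWEEN 1 AND 10000;
-- ...repeat for subsequent id ranges...
```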
Paul: And so what IMVU started doing was, instead of adding to a schema, they said that for every table the schema is immutable. Instead, they created a new table and migrated the data over, so every time you want to read a user, you’re reading from user table 38 at the moment, and then you start to read from user table 39. If the user isn’t in user table 39,
Peter: You fall back to 38.
Paul: fall back to 38, find it, migrate it, write it into 39 and then do the thing. Then you have a background process migrating the rest.
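The read path Paul describes, sketched in SQL with hypothetical versioned table names; in practice the application issues these in order:

```sql
-- Try the new table first.
SELECT * FROM users_39 WHERE id = 42;

-- On a miss, copy the row forward from the previous version...
INSERT INTO users_39 (id, name, email)
SELECT id, name, email FROM users_38 WHERE id = 42;

-- ...and read it again from the new table.
SELECT * FROM users_39 WHERE id = 42;
```

A background job then walks users_38 and migrates whatever rows the lazy path has not touched yet.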
Peter: If you have any listeners out there in the world of the Internet who would like to try implementing this technique themselves: Postgres has something called a schema search path, so you can basically make each schema its own version, and then you have a search path so that if the table isn’t in the newest version, it uses the previous version’s.
Paul: Oh, wow.
Peter: You could set a version on your code, and then, as you make new versions of the tables in the new schema, you can kind of move them forward. This actually sounds like it would probably work. Now, it’s really easy to say that sitting around a table here in beautiful San Francisco, but if any listeners out there do give this a try, have them send me a note.
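Peter’s search-path idea, sketched; the schema and table names are invented:

```sql
-- Each application version gets its own schema; v2 only redefines
-- the tables that actually changed.
CREATE SCHEMA v2;
CREATE TABLE v2.users (id bigint PRIMARY KEY, name text, email text);

-- Code at version 2 looks in v2 first and falls back to v1.
SET search_path TO v2, v1, public;

SELECT * FROM users;  -- resolves to v2.users if it exists, else v1.users
```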
Paul: I was not expecting this to be a session about continuous delivery databases and the intricacies of modern versions of Postgres.
Peter: I have to say, it’s one of the hardest things is basically, how do you deal with your data? I mean, data is sort of fundamentally heavy, I mean I think about it as having mass, which is to say, if you’ve got a few terabytes of data on a disk somewhere, just getting it out of that server and into another server is gonna take hours.
Edith: Right. Just querying the data, loading it all into memory, that’s gonna take time, so this is really a challenge for continuous delivery. People have tried all kinds of schema versioning and replication strategies and sharding strategies, but there’s the old question: okay, you deploy a new version of your code and it needs the new column, but the new column’s not there yet. Okay, so you move the data to the new column and then deploy your code, but then the old code doesn’t have it.
Paul: Right, right.
Peter: And these are hard problems, and no one’s really done a great job of answering them. So far the state of the art is: you continuously deploy a version of your code that supports both schemas, then you continuously migrate your database, and then you deploy yet another version. Now, without continuous deploy this is even harder...
Paul: This has to be tough on a Game Boy cartridge.
Peter: Oh yeah, let me tell you, I think we had 4K of RAM or something like that, so at least it was fast. But yeah, I think data and continuous delivery have a challenging interaction for sure.
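The deploy, migrate, deploy-again sequence Peter outlines is often called an expand/contract migration; a sketch with invented table and column names:

```sql
-- Expand: first deploy code that can read and write both shapes, then:
ALTER TABLE users ADD COLUMN full_name text;

-- Migrate: backfill while both code paths are live.
UPDATE users SET full_name = first_name || ' ' || last_name
WHERE full_name IS NULL;

-- Contract: deploy code that uses only full_name, then drop the old columns.
ALTER TABLE users DROP COLUMN first_name, DROP COLUMN last_name;
```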
Paul: We end up at this problem in the testing world when large customers, in particular large enterprises, use the old way of testing, which is to use a dump of your production database as part of your test suite. One of the major challenges we see when older companies come to the new world is: your server’s over here, my server’s over there, and I want to get a 5-gig database dump onto your server to run my tests. How do I do that?
Peter: Right, it’s like trying to move a bathtub through a straw. So how do you do that? You just wait, right?
Paul: We wait until they stop doing that and then they can become our customers.
Edith: So do you turn customers away then?
Paul: We don’t turn them away but we don’t really support that use case, they’re going to have a miserable experience.
Peter: I mean there’s data migration again, right, just coming back to the physics problems. There are things that are getting better so,
Paul: The speed of light is improving?
Peter: the speed of light has gone up 13 percent in the last year, did you know that?
Paul: I don’t know if you’re joking, I presume you’re not.
Peter: It absolutely has not, yeah. They call it a constant for a reason. I suppose it could.
Paul: I thought there might be something about, like,
Peter: Dark matter, no but it was very convincing the way I said it, wasn’t it?
Edith: Like this is Hindenburg’s Principle?
Peter: It’s uncertain. So anyway, the logical replication scene is getting better. Amazon’s just launched a new service, it’s three letters long.
Paul: Oh really?
Peter: Yeah, uh, got it, hang on. DMS, Data Migration Service. And so through black magic and probably by not working very well, it will move all your data from one database to another in sort of a streaming online way which is really cool if it works.
Peter: But, you know, moving from Oracle to MySQL
Paul: So you can talk to this service as if it’s MySQL and it will pull it from—
Peter: The way it works is you basically set up a migration job and if you think about it, it basically queries one database and writes to another and it’s a little bit of a hack job in the sense that you have to run, like, a desktop client if you’re moving from a local machine. It’s all very complicated and kind of baroque but,
Paul: So it’s an Amazon service? It sounds like you’ve used Amazon before, I gather.
Peter: But yeah, no, it’s what everybody’s always wanted which is to move their data from one place to another without having to think about it and, of course, it’s never gonna work.
Peter: But, you know, it’ll help, it can help, things could be less terrible. I think ultimately that’s what got me into databases was the understanding that no matter how impossible it would be to make things good, at least we could make them a little bit less bad.
Paul: The soul-crushing misery of writing software could at least be slightly less crushy.
Edith: I feel like writing software is one of the greatest joys in life because you get to create something from nothing.
Peter: So you don’t work in databases.
Paul: I’m not sure you work in software.
Edith: So what are the biggest changes you’ve seen in continuous delivery?
Peter: Oh, ubiquity, certainly. It’s rare to meet someone these days who isn’t practicing it, though maybe I’m just lucky; I live in San Francisco.
Edith: You hang out at Heavybit.
Peter: And I hang out at Heavybit with a bunch of other crazy people. But I guess I would say growth, then, right? We’re seeing large enterprises practicing continuous delivery, we’re seeing small shops practicing continuous delivery. It’s not a bleeding-edge concept that only a small cadre of avant-garde developers are doing. It’s really, I think, kind of the mainstream.
Listen to the full podcast to hear Peter talk about continuous delivery in the context of organizational structure, databases and pricing.