Spinnaker Summit 2019 Preview: Software Engineer Rainie Li played an essential role in implementing Spinnaker as part of Pinterest’s CI/CD pipeline. The result moved Pinterest from two scheduled deployments per day to continuous deployment, with more than 15 deployments during business hours.
This episode of DevOps Chats features a preview of Rainie’s talk, “How we introduced CI/CD for Pinterest’s largest monolith services (API and Web) to improve developer velocity, quality & reliability (Pinterest).” Topics include how Spinnaker was selected, important metrics, Hermez (Pinterest’s future CI/CD platform), canary analysis and lessons learned along the way.
Rainie’s talk is on Sunday, November 17th, 1:30 pm PT, at Spinnaker Summit 2019 in San Diego. Joining Rainie on the talk is Jasmine Qin, software engineer with Pinterest.
As usual, the streaming audio is immediately below, followed by the transcript of our conversation.
Mitch Ashley: Hi, everyone. This is Mitch Ashley with DevOps.com and you’re listening to another DevOps Chat podcast. Today, I’m joined by Rainie Li, who’s a Software Engineer at Pinterest.
Now, Rainie’s gonna be talking at the Spinnaker Summit 2019 coming up in San Diego, November 15th through the 19th, and the topic she’s presenting on is how Pinterest implemented a CI/CD pipeline for its large monolith services, API and Web, to help improve developer velocity, quality, reliability, et cetera. So, some real hands-on experience that she’s planning on sharing. And she’ll also have a co-presenter, Jasmine Qin, who’ll be joining her at that presentation.
Well, Rainie, welcome to DevOps Chat.
Rainie Li: Thank you, Mitch.
Ashley: Awesome to have you on. Thanks for joining us. Would you just introduce yourself, tell folks a little bit about you, the kind of development work that you do at Pinterest and also maybe a little bit of what you’ve done in your past to become a software engineer?
Li: Yeah, sure. So, hi, everyone, I’m Rainie Li. I work in Pinterest’s infrastructure organization, and our team is focused on building the continuous delivery platform for all of our internal engineers. Before I joined Pinterest, I worked at both Microsoft and Amazon, also as a software engineer, and I found I have a very strong interest in infrastructure. It turns out continuous delivery is a very important capability for software engineers, so that’s why I ended up here.
Ashley: Very interesting. Well, you’ve worked at some very large online services. I know that Pinterest has, what, about 250 million users, and of course, Microsoft and AWS are very large organizations, so you’re kinda used to working on some pretty big infrastructure projects, it sounds like.
Li: Yeah, I am. [Laughter]
Ashley: Good, good! Well, great experience. So, tell us a little bit about your talk. I guess maybe start with, you know, you came to Pinterest, maybe about where they were in this process of implementing continuous delivery and where you kinda picked up in the process. Were you at the beginning or they already started?
Li: Yeah. I joined after the project had already started, but I still drive the whole production readiness effort for Spinnaker at Pinterest, and I drove the design review for Spinnaker as well. So, the team had already picked Spinnaker for Pinterest when I joined, but I’m the main person continuing to deliver this project.
Ashley: Oh, so you kind of picked the project already in progress and then take it forward from there as one of the leads on it?
Ashley: Very nice. So, when you said—so, you do a review of Spinnaker. Is that your internal usage and changes in implementation to it, or do you also contribute things back to the open source? Any of that kind of thing?
Li: Currently, we only review how we use it internally, and we do some Pinterest-only customization. We haven’t contributed upstream yet, but in the long term we would like to.
Ashley: Yeah. Well, you know, just using it at the volume that you do is, I’m sure, a form of contributing back, too. It’s certainly gotta exercise Spinnaker and a lot of other tools, too.
So, tell us about—so, you picked up when they had already gone through the implementation and you’re kinda furthering it, or were they still in the process of implementing CI/CD continuous delivery?
Li: I picked up when they were still in the process of implementing a CI/CD platform. They had already decided to use Spinnaker, but we still had to do Pinterest-specific customization and figure out how to make it production ready at Pinterest, that kind of work.
Ashley: Ah, excellent. So, sounds like you had quite a bit to do with Spinnaker even when you joined of getting it ready for production. Why don’t you talk about some of the things—I know you’re gonna talk about this in your Spinnaker Summit talk. What are some of the things that you need to do to Spinnaker to get it production ready for that kind of an environment that you’re in?
Li: So, I think there are several key pieces. The first one is, we have to make sure authentication and authorization work as expected for Spinnaker, because that’s the most important thing. Like, we have to make sure users go through our auth process before they can—
Ashley: Your auth, mm-hmm.
Li: Yeah. The second key piece is monitoring, which includes metrics and alerts. So, basically, we exposed all the Spinnaker service metrics and created dashboards and alerts for them, so we get paged when there’s an issue, for example if Spinnaker is down. I think these are the two most important things for production readiness for us.
Ashley: That’s extremely important when you’re automating something like continuous delivery and continuous integration. It’s automated, but you have to know: is it working right?
Ashley: Are things happening and are there problems or are you meeting the kind of metrics that you were expecting?
Ashley: I know something you mentioned in the description of your talk is what kind of metrics are important to measure. Can you say a little bit about what are some of the metrics that you’ve learned, both while you’re implementing it and now that you’ve had it in production that are really important to watch?
Li: Yeah, definitely. Metrics like latency, ____ P90, and SLA-related metrics from all the Spinnaker components are very important. Especially Gate, which is Spinnaker’s API gateway. It’s the entry point to the rest of the Spinnaker components, so the metrics for Gate are extremely important.
Ashley: That’s kinda the integration component of Spinnaker, right?
Ashley: Everything talks to it. It’s sort of that—I don’t know if it’s a central hub, but it’s certainly where all the APIs tie in together.
Ashley: I’m sure if you’re having a problem with that, then you’re gonna know [Laughter] there are probably lots of failures happening in the system.
Ashley: Do you also measure how many builds you’re doing per hour or per day, or how many deliveries into production you’re doing? Are those metrics you capture through Spinnaker, or do you do that elsewhere?
Li: Those aren’t metrics we capture in Spinnaker, but we do measure them from the pipeline execution history.
Li: So, currently, we are using Spinnaker to do continuous delivery for two major services at Pinterest. We complete around 15 deploys per day for these two services, and each deploy contains roughly 10 commits. This pipeline has increased our productivity a lot, because previously we had to deploy manually.
Ashley: Do you also have, then—I know this is outside of Spinnaker, probably, but do you have most or all of your testing automated as well, or are there still some manual steps for testing before it goes into a final deployment?
Li: You mean testing for the service itself, or testing for Spinnaker service?
Ashley: Testing more of the software application part of it.
Li: I see. Yeah, so, currently, we use a separate stage that runs integration test jobs in Jenkins. That’s how we do the testing step. It’s already automated: nobody has to manually trigger the build or run the tests. It’s just a stage in the Spinnaker pipeline that runs the integration test job in Jenkins, yeah.
Ashley: Mm-hmm, excellent. Okay, good. We have kind of a feel for your workflow, your tool pipelines. It sounds like it’s very automated.
Ashley: So, going back to the topic of implementing something at that kind of scale: you’re a very large organization and you’re doing a number of deployments per day. I know there are some things in Spinnaker—well, you have a platform called Hermez. Is that your own central workflow engine, or is that part of Spinnaker?
Li: Oh, so, Hermez is our future CI/CD platform. We are in the development stage. Spinnaker will integrate with Hermez, and we will use Spinnaker as the back-end workflow engine.
Ashley: Hmm, okay.
Ashley: So, that’s really your—that’s Pinterest’s own central workflow engine and you’re integrating Spinnaker into it as part of that feature?
Li: Yeah, yeah.
Ashley: Pipeline—tool pipeline, if you will. Okay.
Li: Yeah, it’s like a CI/CD platform for Pinterest, and we are going to open source Hermez as well in the future.
Ashley: Oh! Well, that’s good news. I’ll look forward to that.
Li: Yeah. [Laughter]
Ashley: Do you know kinda time frame when that’s gonna be happening, or is that a little farther in the future, don’t know yet?
Li: I’m not sure of the exact timeline, but sometime around the end of this year or early next year.
Ashley: Okay. Well, that’s not too far away. That’s great. We’ll look forward and thank you for contributing that as open source. So, talk some more about what are some of the lessons that you learned about going from where you joined Pinterest and where things were with Spinnaker’s implementation, maybe some things you had to do, some of the things that you learned? Here’s a good way to do it, here’s a mistake I made and I fixed it by taking a different approach—what are some of those lessons learned?
Li: Hmm, I think the biggest lesson for me is around Kubernetes. We deploy the Spinnaker services to a Kubernetes platform, and I had never used Kubernetes before I joined Pinterest. So learning how to set up all the Spinnaker components on Kubernetes is the biggest lesson I learned here, yeah.
Ashley: Uh huh, good. So, learning how to do that was important. Were there any specific lessons that you learned about doing that, or was it really just kinda learning the how of doing it?
Li: One specific lesson is handling special situations where the Kubernetes platform decides to rotate a cluster or do some platform testing. On the Spinnaker service side, we have to handle that kind of scenario well instead of bringing down the site or creating an unhappy customer experience. That’s a lesson I learned there.
Ashley: Okay. I think you also mentioned in your show notes or your description in your talk that you did some canary analysis as part of that as well?
Ashley: I assume that was a new thing for you, or have you done that before?
Li: Yeah, that’s also a new thing, but we have a separate team, the configuration team, that is mainly implementing this feature. At Pinterest, we use OpenTSDB as our metrics source, but I think Spinnaker canary analysis only supports metrics sources like Prometheus or Datadog. So, I think the ____ team integrated OpenTSDB into the Spinnaker canary analysis component so that we can provide Pinterest ____ on the Spinnaker UI.
Ashley: Ah, interesting.
Ashley: Okay, great. What are some other things that you’re planning on talking about at your talk during Spinnaker Summit?
Li: I think how we use Spinnaker, how we customize Spinnaker at Pinterest, and the lessons we learned when we adopted Spinnaker here. These are the main topics I’m going to talk about, yeah.
Ashley: Mm-hmm. Great. Well, I know you weren’t at Pinterest when they decided to use Spinnaker for this project or for part of the toolset. But based on what you’ve seen and what you’ve learned, do you think Spinnaker accomplished the goals that they had for why they chose it?
Li: Definitely. Yeah, we like Spinnaker a lot. [Laughter]
Ashley: What were some of those goals, do you recall? Did they want to get to a certain number of deployments per day, or were there other things they were trying to achieve that you’re measuring against?
Li: I see. So, we don’t have a specific number for the goals, but Spinnaker definitely helped us use pipelines for deployment. Without Spinnaker, we had to go to each stage and do manual deployments one by one, which is not very efficient. Now we can use Spinnaker to manage deployments in a consistent and repeatable way. We really like pipelines, which provide a sequence of deployment stages, so we don’t need much manual interaction. Yeah, I think the reason we adopted Spinnaker is that we needed the pipeline feature, and it works well for us. Also, the canary analysis report is very useful for us as well.
Ashley: Yeah, I know a lot of people are very excited about using that.
Ashley: And you mentioned that you integrated OpenTSDB as part of your observ—I can’t say it [Laughter]—observability stack, there we go, got that out. Now, I think you also have a deployment system there that Pinterest uses to deploy into AWS, correct?
Li: Yeah. We have a deployment system, which is also open source, called Teletraan, which we have used for four years to deploy to AWS VMs. But this deployment system does not provide a good pipeline, so that’s why we integrated it with Spinnaker to get the nice pipeline feature, yeah.
Ashley: Excellent. Well, you’ve sure done a lot of work. What’s kinda next? What’s the next set of projects that you’re thinking about or you’re currently working on now that you’re at this point with deploying Spinnaker into your CI/CD pipeline?
Li: I think the next project is, our team will implement Hermez as Pinterest’s main CI/CD platform, with Spinnaker working as the back-end workflow engine. We are also working on migrating most of our services from AWS VMs to Kubernetes. Yeah, those are the two major projects that we are going to be working on.
Ashley: Ah. Well, certainly, the move to Kubernetes, I’m sure that’s a pretty substantial project, a pretty big one.
Li: Yeah, yeah. [Laughter]
Ashley: Well, good. Well, thank you so much for joining us on the podcast today. This is very interesting, and I think you’ve got a really compelling talk. I’m looking forward to hearing more about it and to others getting a chance to attend.
Li: Yeah, thanks for inviting me, Mitch. I’m glad to introduce how Pinterest uses Spinnaker at the Summit, and I’m very happy to talk more about it there.
Ashley: Absolutely. Well, thanks for contributing with your talk. So, you’ve listened to another DevOps Chat podcast. I’d like to thank my guest today, Rainie Li, who is a Software Engineer at Pinterest. Again, her talk during Spinnaker Summit is on Sunday, November 17th at 1:30 p.m. That’s according to the agenda currently on the website, so check the Spinnaker Summit site for updates. Her talk, again, is on how we introduced CI/CD for Pinterest’s largest monolith services, API and Web, to improve developer velocity, quality, and reliability. And as you’ve just heard, there’s a lot of information that Rainie’s gonna be sharing with you.
So, thanks for joining us, Rainie, and thank you, of course, listeners, for joining us here today. This is Mitch Ashley with DevOps.com, and you’ve listened to another DevOps Chat podcast. Be careful out there.