If you ever wanted to learn about Rugged DevOps (some call it DevSecOps), sit down for a spell with Shannon Lietz, Ian Allison and Scott Kennedy from Intuit. We discussed a number of important topics including internal war games, culture hacking, gamification of Rugged DevOps and starting as a small team. There are 100 gold nuggets in this conversation for novices and experts alike.
Derek Weeks: I have some of the Intuit DevSecOps team here with me today. We’re going to talk to them a little bit about Rugged DevOps and how things work over at Intuit. Let’s start with some introductions.
Allison: I’m Ian Allison. I help run the Red Team at Intuit, which is, I guess you’d say, an interesting way of taking control of security at our company. We try to get ahead of the attackers by basically being the attackers. We’re essentially ethical hackers. We go after all of our own stuff to make sure we can find where the deficiencies lie in all of our software.
Lietz: I’m Shannon Lietz. I’ve been working at Intuit 3 1/2 years and helped to found the 24×7 DevSecOps capability at Intuit, leading the Red Team, our security operations capability, our cyber SOC, and what we also consider “blue teaming”: being able to hunt for defects.
The organization has really had to transform how we do software development because we’re a 30-year-old software company. We are now seeing the traditional way of putting together software really embracing DevOps. For us, it’s been exciting to really work in the industry with Rugged DevOps, trying to help build security into the DevOps movement.
Kennedy: I’m Scott Kennedy. I run the forensics and threat intelligence part of cyber work.
Weeks: Shannon (@devsecops), tell me a little bit about software supply chains and how that vision of software development has impacted the way you see things at Intuit.
Lietz: That’s really a great question. It was interesting when Josh Corman and I first talked; we had a lot in common. One of those things was the software supply chain. What I really love about the concept is being able to have processes driven a certain way so that you can reduce defects.
Having worked for Toyota in the past and understanding the supply chain mentality, you get a sense of how you could put something together better, incrementing on it, figuring out how to share that process, and then really figuring out what things are important. Having that notion of fewer, better suppliers was really a core concept.
I love the idea of transparency, building things a certain way, and really getting into continuous improvement. You need to look at things from an opportunities perspective—making sure you’re not just looking to make things perfect. You’re looking for those opportunities to improve over time.
Weeks: As we think about Rugged DevOps within your security team, how do you measure the success of what you’re doing? What kind of metrics are you looking at that matter to the business?
Lietz: We measure everything. Take mean time to remediation (MTTR), for example. Once somebody finds a defect, we analyze that defect from the time it got into the supply chain to when it actually gets resolved. We track when the code got published, when the defect got found, when the ticket was created and when it was remediated, and then we work on those numbers over time. We really try to uplift.
We leverage JIRA [Software] just like a software development team does. We register our defects and figure out how to get development teams to take responsibility for those ideas. It goes through their process of release and regression testing. As part of that, we look back to see where our opportunities are.
As an example, we started out where things may have taken weeks. We then reduced it down to days and ultimately got it down to hours. We’ve seen defect resolution where it’s now minutes. When it’s something we’ve discovered that was just a mistake by an engineer, we realize “mistakes do happen.” We found that our cycle times also help us to find full-stack vulnerabilities in real time because we get to do end-to-end testing more aggressively utilizing this method.
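As an illustration of the kind of measurement described above, here is a minimal sketch of computing MTTR from defect timestamps. The record layout, field names and sample dates are assumptions made for the example; the interview does not describe how Intuit stores or queries this data.

```python
from datetime import datetime
from statistics import mean

# Hypothetical defect records: when the flawed code was published,
# when the defect was found, and when the fix was released.
defects = [
    {"published": datetime(2016, 1, 4, 9, 0),
     "found": datetime(2016, 1, 6, 9, 0),
     "resolved": datetime(2016, 1, 6, 15, 0)},
    {"published": datetime(2016, 1, 10, 12, 0),
     "found": datetime(2016, 1, 10, 18, 0),
     "resolved": datetime(2016, 1, 10, 20, 0)},
    {"published": datetime(2016, 1, 20, 8, 0),
     "found": datetime(2016, 1, 21, 8, 0),
     "resolved": datetime(2016, 1, 21, 9, 0)},
]


def mean_hours(records, start_key, end_key):
    """Average elapsed hours between two lifecycle timestamps."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 3600 for r in records
    )


# Time from discovery to fix (MTTR) and total time the defect was live.
mttr_hours = mean_hours(defects, "found", "resolved")
exposure_hours = mean_hours(defects, "published", "resolved")
```

Tracking both numbers separates how fast a team fixes a known defect from how long the defect sat in production before anyone noticed it.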
Weeks: How has consistency in your operations helped with Rugged DevOps, and has it reduced fragility within the organization?
Allison: One of the things we do is use a golden image for all of the AMIs (Amazon Machine Images) we use, for all of our customers, and we require everybody to use these AMIs. We’ve also built some really interesting automation around scanning these AMIs. One thing we realized quickly when we first started in AWS: when we try to do a full vulnerability scan against a system that is set up to autoscale, we all of a sudden have 50 systems. It’s really hard to do a full vulnerability scan against a system like that, so we came up with a way to share back all of the AMIs with a special account. Then we bring those up, scan them and grade them.
Based upon the vulnerabilities that are found, you’ll get a letter grade, like A through F, based upon the system you have. While we always strive to have our base image be an A, people continue to run on older images. But they get graded, and those grades get pushed up, so everybody in their org structure gets to see what the grade is for their account. I think being a little standardized with these images lets us know what’s in everything, and we have a grade for everyone. It gives everyone a really good idea of where they stand on security.
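To make the grading idea concrete, here is a minimal sketch of turning scan findings into a letter grade. The severity weights and score cutoffs are invented for illustration; the interview does not say how Intuit actually computes its grades.

```python
# Hypothetical severity weights and grade cutoffs. The real scoring
# formula is not described in the interview.
SEVERITY_WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}
GRADE_CUTOFFS = [(0, "A"), (5, "B"), (15, "C"), (30, "D")]  # max score per grade


def grade_image(findings):
    """Map a list of finding severities to a letter grade from A to F."""
    score = sum(SEVERITY_WEIGHTS[severity] for severity in findings)
    for max_score, letter in GRADE_CUTOFFS:
        if score <= max_score:
            return letter
    return "F"


clean_image = grade_image([])                    # no findings
newly_exposed = grade_image(["critical"])        # one new critical finding
old_image = grade_image(["critical"] * 4)        # long-unpatched image
```

Under these assumed cutoffs, a single newly disclosed critical vulnerability is enough to move a clean image several grades down, and an image that is never refreshed keeps sliding as findings accumulate.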
Weeks: That’s not only a grading but a policy enforcement governance kind of role that grading plays. How rapid is the feedback loop in that grading system for the teams that you’re working with?
Lietz: It’s really quick, and we’ve discovered through some science that having component-based resources like AMIs provides us with an advantage when doing things like remediating vulnerabilities. Using AMI-based resources, we have seen that when there’s a defect in it, we can find and remediate all of the defective AMIs quickly. That improves everyone’s security across the company.
So if you’re just picking out really good components, keeping track of those components and adding security into them, then you’ll actually see a different effect across our pipeline. A single change can actually have a dramatic effect on reducing the problems within the pipeline.
Allison: It’s really interesting. This morning I got an email from somebody that said, “Why did our baseline AMI go from an A to a C today?”
We had just received notice of a new vulnerability. Our stuff caught it, we scanned it, and we pushed the grade out to our portal where all our customers go to look at the grades. Our customers were able to see that change quickly.
They could now say, “Wow, it changed from an A to a C in less than 12 hours.” I think the feedback is really important. The other important thing is that we have people going and looking. I wouldn’t be getting emails asking why this has changed if people weren’t actually looking and wanting to make their grades better.
Weeks: You mentioned customers. Are these internal customers?
Lietz: Yeah, for our development teams, we as a security team really have changed how we think about things. It used to be that the security team would go out and govern. Basically, you got the fear of the security team coming in, descending upon you.
We’ve really changed how that happens within our organization. We grade our resource components, and we grade the way in which our applications come together. That changes how developers want to operate because they really want to figure out how to get better grades in security. And it creates a learning dynamic that incentivizes somebody to improve continuously.
Weeks: Does it create a competitiveness or gamification of who has better grades?
Lietz: Absolutely, which is why we did it in the first place. To your point there, gamification is something where when you start to grade components like that, you can actually start to leverage a leaderboard concept. We do have leaderboards as part of this. We have APIs where you can actually pull down your grades and include them in your automation. With these, you can make governance decisions.
If you sort of have that “game afoot,” right, your leaders can then ask for specific grades within their pipeline. That up-levels the system, and you just see a continuous improvement lifecycle come to bear. Ultimately, you see fewer defects, and ultimately, you get to the notion of Six Sigma in our way of thinking. DevOps is really about continuous improvement and embracing automation. Embracing that concept allows us to get to fewer defects faster.
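A minimal sketch of how grades pulled into automation could drive a governance decision and a leaderboard. The grades are a plain dictionary standing in for whatever the grade API returns; the account names, the minimum grade and the A, B, C, D, F scale are assumptions for the example.

```python
GRADE_ORDER = "ABCDF"  # best to worst; assumed scale


def meets_minimum(grade, minimum):
    """True if `grade` is at least as good as the required `minimum`."""
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(minimum)


def gate_and_rank(account_grades, minimum="B"):
    """Return (accounts failing the gate, leaderboard sorted best first)."""
    failing = sorted(
        account
        for account, grade in account_grades.items()
        if not meets_minimum(grade, minimum)
    )
    leaderboard = sorted(
        account_grades.items(),
        key=lambda item: (GRADE_ORDER.index(item[1]), item[0]),
    )
    return failing, leaderboard


# Hypothetical grades, as a pipeline step might receive them.
grades = {"payments": "A", "search": "C", "billing": "B", "reports": "F"}
failing, leaderboard = gate_and_rank(grades, minimum="B")
```

A pipeline step could fail the build when `failing` is non-empty, while the same data feeds the leaderboard that teams compete on.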
Weeks: As you embraced continuous practices and DevOps practices, were there points when you realized certain old ways of doing things weren’t going to enable you to move forward?
Kennedy: In looking at the progression of what we’ve been doing, one decision made at Intuit that I saw as really unique was the way they decided we were going to migrate into AWS. Our idea was to have the chaos team be the first people out, and that’s the security team. So the security team was the one going out, finding out how to use each of the products that AWS has and creating the concept of whitelisting. Each product was rated as to whether or not it met security’s requirements.
Therefore, no team can go ahead and pull down this new cool tool that AWS released yesterday and use it in production because it’s not been “whitelisted.” That can go into their scoring. Their scoring is not only used by the development teams but also is useful when reporting to the board. When the board asks, “How are we doing as a company across the entire organization?” we can say that product A got a lower score than product B, and then they turn to the VP in charge of it and say, “Well, why?”
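The whitelisting check described here can be sketched in a few lines. The approved set below is hypothetical: the names are real AWS products, but the interview does not say which ones Intuit approved or how the check is implemented.

```python
# Hypothetical allowlist of services that security has reviewed.
APPROVED_SERVICES = {"ec2", "s3", "iam", "cloudtrail"}


def unapproved_services(services_in_use, approved=APPROVED_SERVICES):
    """Return the services in use that are not on the approved list, sorted."""
    return sorted({name.lower() for name in services_in_use} - approved)


# A team using two approved services and one that has not been reviewed yet.
violations = unapproved_services(["EC2", "S3", "lambda"])
```

The resulting list could block a production deployment outright or, as the interview suggests, simply count against the team’s score.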
We decided to not rush into the cloud but to take a careful, considered approach and migrate in a very intelligent and well-thought-out way. At the same time, we gave the chaos team the ability to make the mistakes and grow and learn, so they can immediately turn around and share the mistakes with everyone else. They could say, “Hey, these are the things that didn’t work for us. We came across a lot of problems, especially when you look at things like accounts and account roles.”
How do you control when you have thousands of accounts and you need to have some sort of administrative control?
You can either mount a gigantic effort to force your namespace and your Active Directory to be the source of control, or you can use vendor-specific tools like IAM and let each account be its own separate island. With the concept of cross-account roles, you can then do remote administration from a centralized account. You have it locked down. You have the capability to have a restricted group and be able to remotely go in and do the necessary actions.
That also gives you an audit trail. That also gives you multifactor built in because the AWS products get those things added to them.
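For readers unfamiliar with cross-account roles, here is a minimal sketch that builds the trust policy a role in each “island” account would carry so that a central account can assume it. The account ID is a placeholder and the helper name is invented; the policy keys themselves (`sts:AssumeRole`, `aws:MultiFactorAuthPresent`) are standard IAM policy syntax.

```python
import json


def cross_account_trust_policy(admin_account_id, require_mfa=True):
    """Build an IAM trust policy letting a central account assume a role."""
    statement = {
        "Effect": "Allow",
        # Principals in the central administration account may assume the role.
        "Principal": {"AWS": f"arn:aws:iam::{admin_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        # Only allow the role to be assumed by callers who signed in with MFA.
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# 111111111111 is a placeholder account ID, not a real account.
policy = cross_account_trust_policy("111111111111")
policy_json = json.dumps(policy, indent=2)
```

Every use of such a role is an `AssumeRole` call, which is what makes a central audit trail possible when API activity is being logged.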
Lietz: When it comes down to it, I think culture-hacking your environment can have a profound effect, especially when you’re going through a DevOps transformation.
Weeks: What is culture hacking?
Lietz: That’s a great question. We use it when really trying to figure out how we as a security team can change and transform. A lot of the things that take place in a company are really based on traditional processes: What has worked before, and why would we change something that is working, right? If you’re really going to go into an innovative frame; if you’re really going to get into that next-generation innovation; if you are trying to figure out what’s going to work in that … it’s never going to be the thing that is working. It’s going to be the thing that you’re going to learn as you go to that next step.
Culture hacking is really about looking at the people who are operating right now and trying to figure out how you’re going to help them go from A to B, making that change. What is that experience going to be like?
What we have done, to Scott’s point, is we’ve forced our security team to have empathy for the DevOps teams. We go through the process of developing something in the cloud, utilizing it as a method of taking their paranoia and trying to balance the notion of getting something done within a specific time frame. We try to really wrangle what it takes to do those things securely and safely but, ultimately, still be able to deliver for the business.
I think that culture hacking really comes into play when you’re trying to figure out how to move somebody from the rock they’re on to the rock you need them to be on and trying to figure out what those mechanisms are.
Weeks: Part of your security practice is looking at open source and third-party components and your own binaries. Can you shed some light on how Intuit is using Sonatype solutions to better manage those vulnerabilities?
A lot of our DevOps practice is working together with it. We’ve put together our Nexus repositories to do code signing and figuring out how to really secure our pipelines a certain way. We are taking advantage of the fact that we can pick up components, track them and then scan them [for known vulnerabilities].
That’s allowed us to reduce the defect count that goes to production. Actually scanning and looking for vulnerabilities within our components and our open source libraries allows us to make better decisions about what we’re including in our software.
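A minimal sketch of the track-and-scan idea: keep an inventory of the components a build uses and compare it against known advisories before release. The component names and advisory IDs are made up, and this does not represent the Nexus API or any real vulnerability feed.

```python
# Hypothetical advisory data keyed by (component name, version).
KNOWN_VULNERABLE = {
    ("example-parser", "1.2.0"): "EXAMPLE-0001",
    ("example-http", "2.0.1"): "EXAMPLE-0002",
}


def scan_components(components, advisories=KNOWN_VULNERABLE):
    """Return {"name:version": advisory id} for components with a known issue."""
    return {
        f"{name}:{version}": advisories[(name, version)]
        for name, version in components
        if (name, version) in advisories
    }


# Inventory of what one build pulls in (invented for the example).
build_inventory = [
    ("example-parser", "1.2.0"),
    ("example-parser", "1.3.0"),
    ("example-json", "0.9.0"),
]
flagged = scan_components(build_inventory)
```

Because the inventory records exact versions, the same lookup can be rerun whenever a new advisory arrives, without rebuilding anything.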
Weeks: When you govern what open source, third-party or proprietary components are being used by developers, is there any kind of feedback from the teams saying, “Hey, you’re restricting my behavior, not improving my innovation”?
Lietz: What we’ve found is that the notion of security approvals, exceptions and gates really doesn’t work. Quite often, you just create a culture where developers are going to go out and do it, and then you’re going to find out about it. When it comes to really partnering and being boundary-less about how you think about security in your business, it’s all about transparency. It’s all about benefits. It’s creating things like a security markdown file within your repository manager. It’s about taking responsibility and accountability for the things that you’re doing from a security perspective in your development process. It’s ultimately having an attacks.md file, keeping track of what’s out there, keeping track of your open source, understanding what components you’re leveraging, and why you made the decisions that you made to bring those things into your project.
At a top level, all of those things work. But you really need tools that can help you understand the decisions made by the other open source programmers you’re getting contributions from. All of the things that they might be deciding are also part of your decision tree, and ultimately, you’re rolling all of that up and bundling it together. The attack surface is not just the decisions that your team is making, but the ones that you share across the code base that you’ve got.
Weeks: Your practices are very mature. You’ve clearly developed them over a long time, and some people watching this might think, “Well, Intuit’s a huge organization,” and it may be daunting to them if they haven’t started down the path of Rugged DevOps. Can you be a small team and have success in these kinds of practices?
Lietz: We’re not exactly a huge organization, but we are relatively large in size now. When we got started, I believe I was one of maybe three people that started this, only a couple of years ago. We have hired into our group pretty extensively to help grow it, and some of the things that we’ve done have really allowed us to operate differently, to bring in people and have them immediately be successful. Our practices allow someone on “day one” to work with the environment, to develop code and to contribute code that week.
We do things like weekly demos, where we actually do video demos. A person has to come in, program something, secure something, operate it and create a demo, all within their first week. So having the right bar for those folks is really important. But more importantly, our Red Team leader here (points to Ian) came in, is just amazing and has created a Red Team pretty much out of thin air. The same goes for having somebody from forensics, who has done an incredible job helping us build a life cycle where we can snapshot something and learn from it when it’s actually offline.
Those are the types of practices where you start to extend yourself past the normal baseline practices of processes today and really think past that about how you’re going to support innovation. You get into it very quickly. You get a learning culture. You get people who know that making mistakes—and figuring out how to learn from them—is okay. That’s a really important part of that actual culture that you’re putting in place.
Allison: Yeah, I was going to say, it’s all about iteration, right? We started small, and we just continually iterate on what we’re doing to try to get better and be better at what we all do.
When I first started this journey, I was a security guy, a pen tester. It was always the developer’s fault. Developers always made the mistakes. I always had to clean up after them. But after six months of developing Ruby APIs and working my butt off in code, the empathy was there.
I really understand what the developers are going through and why they make the choices they do. But I think by creating tooling that allows them to self-serve, we help them make themselves more secure without them having to become security professionals. I think that’s kind of our ultimate goal.
Lietz: Being friendly hackers, right? Basically going out and attacking them so their applications don’t get attacked by external attackers is really part of that frame.
Kennedy: The Red Team shift at the company has been profound because you see how people react. When the Red Team started, it was not as well shared, and a lot of people suddenly were very upset that they were attacked by the Red Team. But then it was pointed out: “Well, what would you rather have happen? Would you rather have somebody in China do this to you, someone who didn’t work with you, didn’t sit next to you and help you fix the product, or would you like to have a friend whose job, by the way, is to attack?”
When we went through several drills and actually practiced the muscle of defending the company against an attack, people were upset. “Oh, I had to do all this work.”
My response to them: “Well, you did the right work.”
“You did the right thing. You saw something bad. You did it. You did good. You practiced the muscle. Now when it happens again and it’s not the Red Team, I know that you’ll know what to do. You know that the process works, and we can actually defend the company faster and more securely.”
Weeks: Yeah. That’s an incredible story. Thank you for sharing it.
My final question: If you could pick a superpower in dev, security or ops that you would have in the organization, what would it be?
Allison: To me, they’re all alike; they’re the same, right? That’s what we do, DevSecOps, right? We try not to actually separate them out because I think once you start to separate them out, you start to lose perspective.
There’s a good thing about having them all be one thing, so I’d choose them all.
Kennedy: It’s been pretty consistent. DevSecOps is the answer. What was the question? (Laughter)
Lietz: I think the reason we went out and created DevSecOps was just simply to change how we thought about doing development and technology, and to really get ahead of it, to realize that attackers weren’t setting up appointments or meetings to help you figure out how they were going to attack your software, and so then why were we? Why were we operating at a fragile level?
I think that the superpower that I would like to have is DevSecOps because I know that we are going through the process of creating a less-fragile capability in security that will allow us to get ahead of attackers, make it much harder for them to go after the software that gets built, and we’re seeing those improvements. That’s actually a great thing.
Weeks: It sounds really exciting, and it’s very cool, so thank you all very much. I really appreciate it.
All: Thank you.
If you loved this interview and are looking for more great stuff on Rugged DevOps, I invite you to download this awesome research paper from Amy DeMartine at Forrester, “The 7 Habits of Rugged DevOps.”
As Amy notes, “DevOps practices can only increase speed and quality up to a point without security and risk (S&R) pros’ expertise. Old application security practices hinder speedy releases, and security vulnerabilities represent defects that can leave a company open to cyberattacks. But DevOps practitioners can leap forward with both increased speed and quality by including S&R pros in DevOps feedback loops and including security practices in the automated life cycle. These new practices are called Rugged DevOps.”