DevOps Chat: The Kubernetes and Multi-Cloud App Journey with Rafay

After Akamai acquired his company, Haseeb Budhani decided to take on his next challenge and start Rafay.

Rafay focuses on the complexity and repetitive aspects of deploying and managing Kubernetes applications. Rafay addresses complexity by providing application abstraction, cluster blueprinting and enterprise-ready integration, making Rafay a great candidate for multi-region, multi-cloud, hybrid and edge/MEC adoption.

Join us on this episode of DevOps Chat where Haseeb shares some of the experiences and challenges that led him to found Rafay, and how to accelerate your path to Kubernetes and multi-cloud applications.

As usual, the streaming audio is immediately below, followed by the transcript of our conversation.

Transcript

Mitch Ashley: Hi, everyone. This is Mitch Ashley with DevOps.com, and you’re listening to another DevOps Chat podcast. Today, I’m joined by Haseeb Budhani, CEO and co-founder of Rafay. Our topic today is lifecycle management for containerized apps. Haseeb, welcome to DevOps Chat.

Haseeb Budhani: Mitch, thanks for having me. Great to talk to you.

Ashley: Yeah, nice to talk with you. Always love talking with a fellow entrepreneur. Could you start by just introducing yourself, a little bit about your history, what you do and a little bit about Rafay?

Budhani: Absolutely. So, Rafay the company is slightly under two years old at this point. We have a product available on the market that a number of customers are using which, essentially, helps them simplify the ongoing operations and life cycle management for their containerized applications, which they may be running in public cloud, hybrid environments, or any combination thereof.

Fundamentally, there is a brunt of work that every company does as it relates to ongoing management of the containerized apps. You know, essentially, they build a platform internally on top of Kubernetes, and our pitch is, “Hey, why are you spending time rebuilding the—kind of reinventing the same wheel over and over again in every company? Let us help you with our solution which focuses on the repetitive effort that every company goes through, so that you, Mr. DevOps Engineer, can focus on the true value that your company is looking for out there. So, you focus on company value. We’ll help you solve for the repetitive work that you have to do, anyway.”

Ashley: Very good. Now, you started this company after selling your prior company, Soha Systems, to Akamai. I think you took about a year before you jumped out and started Rafay. So, why did you pick this particular problem? What was it about it? Was it something you experienced in Akamai, in your prior company, or just as you were kinda taking that time to really think about what you wanted to do next? Why did you pick this space?

Budhani: Yeah. You know, Akamai, it was, surprisingly, a lot of my friends who’ve kind of seen exits, you know, kind of, they talked about acquiring a company, it was tough, and I didn’t like the experience or whatever. I actually had a lot of fun at Akamai. It wasn’t as stressful as it used to be. I mean, you know how stressful startups are, so it was a pain-free experience, really. You know, an incredible sales organization. They really understood our product, took us to this whole new level.

But, you know, sometimes an idea gets stuck in your head and it’s just impossible just to walk away from it. So, in our prior startup, so pre-acquisition by Akamai, we were practitioners of the problem that we are trying to address. So, essentially, we were the customer.

Some of our colleagues at Soha were effectively building a platform like the one I just described, wherein we were kind of writing at this layer on top of container orchestration so that we could be running our application across many locations. So, at the time of exit, we ran—I don’t know, somewhere between 20 and 30 POPs globally across multiple cloud providers just so we could have our cloud based security platform at Soha running in a number of locations globally.

And it was very, very painful. And I think at that point, we didn’t realize that when you’re building a platform, when you’re so focused on the core value which is the security, you know, this DevOps thing is important, but you don’t spend as much time strategically thinking about it. And then, after the acquisition by Akamai, Hemanth, my co-founder at Soha and also here at Rafay, we spent a lot of time thinking about this. Like, why did we have the experience we did? Because it was not a good experience. And then we thought about, “Well, who else has this problem?”

It turns out everybody has this problem, where anybody building a SaaS platform or running applications in the cloud, particularly when it comes to containerized apps, because the community is so new, there is a whole lot of supporting cast, if you will—missing. And it’s gonna come. It’s just, right now, it’s missing because it’s so new. And that got us thinking, “Well, somebody’s gotta solve this problem.” You know, and when you kinda look around and say, “Somebody’s gotta solve this problem” and nobody raises their hand, maybe you should do it.

So, that got us thinking, “Well, we should—we need to go solve this problem.” So, we left. And I’m sure both of our wives were very unhappy when we said we were gonna leave Akamai. [Laughter]

Ashley: [Laughter]

Budhani: But—no, it was a worthy cause, and we’re very happy we left because we solved a really important problem for the community.

Ashley: Well, congratulations for starting another startup. I think it’s one of those kind of in your blood type of things. So, talk a little bit about what’s unique about Rafay. I’m assuming you took some of the same problems that you experienced of what you’ve built at Soha and probably along the way said, “Hey, you know what? This might be a good idea for a product for containers and containerized apps.”

And I think I would just add to the conversation by saying, as you start to build larger and larger containerized apps, you realize what some of the management challenges really are around it, and so that’s, you know, borne out of that is a new opportunity. I’m assuming you had a similar kind of experience.

Budhani: Exactly right. You know, you read some blogs about, you know, like, Kubernetes. And it makes things look very easy. You can get something up on your laptop and you’re like, “Oh, look! I have a Kubernetes cluster running on my laptop! Isn’t this easy?”

Ashley: [Laughter]

Budhani: No. When you run it in production with multiple applications, it’s hard, and everybody knows it, this is not new information. But the challenge is that, because there is as yet no manual for this, we only learn by doing. So, we have to kind of jump into this and we start building stuff, and then we go, “Oh, this is hard,” and then by that time, we already have pretty large teams in most companies who are working on this. And that’s the problem and, of course, that is the opportunity, also.

Here’s a better way of thinking about this. So, because of our experience at Soha, some things we recognized. Not everybody on our team we could train to be experts at the platform.

Ashley: Mm-hmm.

Budhani: Some people are experts and most people are not gonna be. And this is very typical in most companies, you have a DevOps team of five, seven, 10, 20 people who build a platform and then another hundreds or thousands of engineers, depending on the size of the company maybe consuming the platform. Because basically, they need to understand this—they need an abstraction layer. That was a key insight for us, having spent time talking to companies, you know, once we were out kind of looking at, you know, what do we do next?

So, everybody seems to need some notion of an abstraction layer so that I, as a developer, I check in code and just magically, my application shows up somewhere. I don’t want to think about this underlying complexity. And the second thing that we saw, which VMware, surprisingly, validated two weeks ago when they did a bunch of announcements at VMworld, was that nobody’s running a single cluster.

Ashley: Mm-hmm.

Budhani: They may start with a single cluster, because it’s easy to keep adding nodes to their cluster. But the practical approach around clusters is to run many small clusters. And VMware spent a lot of time talking about this idea of many small clusters.

So, that implies solving for multi-cluster operations, multi-cluster federation is a critical thing. And that continues to be a gap in the community right now. There are a number of approaches that have been taken to solve for multi-cluster federation in the community, but—and, you know, a lot of companies have talked about this, there are tools that exist out there to solve this problem, but fundamentally, they make certain assumptions that are not borne out in practice. And that was a unique thing that we saw as a gap and we addressed it very elegantly in our platform.

There’s a number of things we do, for example—I’ll give you another example. And anybody listening to this podcast, as you hear these examples, you may kinda look back to your experiences and go, “Yeah, I’ve faced this.” So, you bring up a cluster. Let’s say you are some e-commerce company. So, you need some clusters that need to be PCI ready, or PCI compliant. Some clusters, for marketing, because all they do is some experiments—I don’t need to be stringent for them. And then you have people who are just writing code, developers.

So, you need a blueprint for a cluster, because each cluster may need different things. How do you do that? It’s not easy. It’s not easy. You know, there’s no kind of Helm chart for Helm charts. What do I do then? These are some of the problems, right?

So, if you really spend time talking to the DevOps community, these are some of the things that they struggle with and today, they all take a bunch of open source tools and try to build something for themselves. Some do it very elegantly. Some just never get to that level of perfection because they are so busy. And this is my pitch to every customer. My entire team, all of us—we do one thing. We do one thing. We help you run your containerized applications.
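To make the cluster blueprint idea above concrete, here is a minimal Python sketch. It is not Rafay's product or API; the `Blueprint` and `Cluster` classes, the add-on names, and the cluster names are all hypothetical, chosen only to mirror the PCI, marketing, and developer clusters from the example. It shows one way a blueprint could be expressed as a set of required add-ons and compared against what a cluster actually has installed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical description of what a class of clusters must have installed."""
    name: str
    required_addons: frozenset


@dataclass
class Cluster:
    """Hypothetical cluster record: which blueprint it follows and what it has."""
    name: str
    blueprint: Blueprint
    installed_addons: set = field(default_factory=set)


def missing_addons(cluster: Cluster) -> set:
    """Return the add-ons the blueprint requires that the cluster lacks."""
    return set(cluster.blueprint.required_addons) - cluster.installed_addons


# Illustrative blueprints: strict for PCI, lighter for marketing, empty for dev.
PCI = Blueprint("pci", frozenset({"audit-logging", "network-policy", "image-scanning"}))
MARKETING = Blueprint("marketing", frozenset({"audit-logging"}))
DEV = Blueprint("dev", frozenset())

clusters = [
    Cluster("payments-us", PCI, {"audit-logging", "network-policy"}),
    Cluster("campaign-lab", MARKETING, {"audit-logging"}),
    Cluster("dev-sandbox", DEV),
]

# Map each cluster to whatever its blueprint says is still missing.
drift = {c.name: sorted(missing_addons(c)) for c in clusters}
print(drift)
```

In this sketch only the PCI cluster reports drift, because its blueprint asks for more than the cluster currently has; the marketing and developer clusters pass with far fewer requirements.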

Ashley: Mm-hmm.

Budhani: You have 50 things to do. Let me help you with this, right? Let us be an extension of your team. Let us, you know, really take this one stress away from you—of course, pay money for that, [Laughter] but focus on the things that are more important to the company.

Because getting the right script in place or whatever in place just so you can run your Kubernetes and run it better—this is not how you make money. If your competitors all ended up buying from a vendor to solve this problem versus build it themselves, they’re gonna probably sell a single more unit of widget that you’re selling, right? Focus on the value. Don’t focus on the very complex but effectively undifferentiated work.

Ashley: Well, what you’re saying makes a lot of sense of why rebuild it all yourself? Why learn all the hard lessons that others might have learned? And I take it that you do things through your abstraction layer with your blueprinting of essentially kinda creating what we might think of in software as here’s patterns that we might use, here’s blueprints we might use for different clusters.

Budhani: Yeah.

Ashley: And I know a common mistake I’ve seen happen is, it’s really easy to throw everything into one cluster and then start to say, “Well, that doesn’t work. How do I start dividing it up into, besides just geographic location, what are the other techniques for doing that?” So, those are the kinda things that you’re bringing to the table with Rafay, is that correct?

Budhani: Yeah, absolutely. Yeah. I mean, I’m sure many of us continue to have the Design Patterns book from college on our shelves, right?

Ashley: Sure, yeah.

Budhani: [Laughter] That’s the idea, right? Some things are best done in a specific way, and if there’s a way to do that, if it’s a best practice way to do it, you do it. Not every company may have their own quote-unquote blueprint, and that’s okay. But let’s help you manage those better.

In effect—you know, and this is fundamentally, we talked about, practically, what Rafay is selling today. But if we fast forward three, five years, we sell something today, but we still have to have a raison d’être, right? Why are we doing this? I think, fundamentally, there is a system of record needed in our industry that maybe doesn’t exist as well as it could. Where is that one place in a company where I can go and say, “Hey, where is ____ right now? How is it doing? Three months ago, how was it? Who changed what last time? Where is that information today?”

It’s in people’s heads, sure. It’s in a Git repository. We all talk about GitOps, right, which means somebody wrote a script and checked it into a Git repo. Yeah, but maybe the guy who wrote that script left six months ago—now, who understands that script? That is a very common occurrence—people write scripts and then they move on to the next job. Where is that system of record? Where is that system of governance for an application?

Particularly with containers, there is an opportunity to very elegantly address that problem, right? Have a system that allows you to do that. One system where somebody who is non-technical can log in and say, “Hey, just from a compliance perspective, is this app running on all the right clusters right now, or not? One shot, can I know that?” That is where I think we will all go as a community.

And as I kinda think through what are those bread crumbs to get us there, some of the things that we are working on as a product company today, my hope is that we get there. We get the opportunity at Rafay to build that system of record that people will really treat as a strategic system for their own companies. Anybody in the company should be able to come and look at, “Hey, where is my—what is the status of my app right now? How has it changed in the last month?” That, to me, is such an important and worthy goal to be focused on. Let’s see how—you know, what it takes for us as a group to get there. But that, man, that—I mean, somebody needs to go solve that problem.
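The questions a system of record would answer ("is this app running on all the right clusters?", "who changed what last time?") can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any real product: the `SystemOfRecord` and `ChangeRecord` names, the app and cluster names, and the people are hypothetical, and the sketch only tracks deployments, never removals.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeRecord:
    """One hypothetical audit entry: who deployed which version where, and when."""
    app: str
    cluster: str
    version: str
    changed_by: str
    changed_at: datetime


class SystemOfRecord:
    """Toy in-memory record of deployments (assumption: apps are never removed)."""

    def __init__(self):
        self._history = []    # every ChangeRecord, in arrival order
        self._required = {}   # app name -> set of clusters it must run on

    def require(self, app, clusters):
        self._required[app] = set(clusters)

    def record(self, change):
        self._history.append(change)

    def running_on(self, app):
        return {c.cluster for c in self._history if c.app == app}

    def is_compliant(self, app):
        # Compliant when every required cluster has seen a deployment of the app.
        return self._required.get(app, set()) <= self.running_on(app)

    def last_change(self, app):
        changes = [c for c in self._history if c.app == app]
        return max(changes, key=lambda c: c.changed_at) if changes else None


sor = SystemOfRecord()
sor.require("checkout", {"us-east", "eu-west"})
sor.record(ChangeRecord("checkout", "us-east", "1.4.0", "alice", datetime(2019, 9, 1)))
before = sor.is_compliant("checkout")   # eu-west has no deployment yet
sor.record(ChangeRecord("checkout", "eu-west", "1.4.0", "bob", datetime(2019, 9, 3)))
after = sor.is_compliant("checkout")
print(before, after, sor.last_change("checkout").changed_by)
```

The point of the design is that the answers come from recorded facts rather than from whoever wrote the deployment script, so a non-technical user could ask the compliance question without reading any scripts.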

Ashley: Now, I haven’t used your product, but one of the things that I found really intriguing about your go to market and how you talk about it on your website is, you actually present use cases, more than just generic things, but something specific to, like, factory operations for IoT and customer experience app, modernizing in store retail experiences, 5G, edge deployments. Are those part of the kinda blueprints of what you talk about, or is that more, you know, on a broader scale, “Here’s how we might advise you about how to implement a containerized cluster environment?”

Budhani: So, these are the business use cases that are—the last one we talked about was this retail use case. It’s a fascinating problem. Many retailers are out there talking to larger vendors and in some cases to their ____ providers who happen to be large customers. And they’ve asked the question, “Hey, I want to deliver a better experience inside my store. Is the point of sale system in the retail store up or not?”

That’s a good question. How do you find out? We can’t run these tests from out of the store, because that’s a private network, basically. So, I wanna run an app inside my store. Oh, yeah? Where do I run it? Where is this infrastructure? This is not a—there’s no data center, here. They have probably one rack in the back somewhere, maybe with one or two machines. So, then maybe I should run containers here—but then, who’s gonna manage my containers, because I have 1,000 stores, it’s not one. And if I have to send IT people to every store to deploy an application and upgrade it regularly—well, I need an army now, right?

It’s just a small question. I need to run a small, tiny, 200 ____ app. It becomes this big thing. What an interesting problem. But this is a real problem. Think about any retail location where they have a lot of refrigeration capability. They’re selling ice cream or they’re selling whatever drinks and whatnot. If any one of these refrigeration units is not working, you lose a lot of money.

Ashley: A very big deal.

Budhani: And, of course, it’s a bad—right? So, they run these sensors, okay? Who’s collecting this data? Where is the sensor running? Is it a tiny little sensor? But then the sensor data has to be collected maybe locally. Okay, what do I do now? Right?

All of these small things, these are real business problems that kind of snowball into this bigger problem around application management, right? So, if I—if you consider, if we called a retail store or some head office for some retailer and said, “Hey, do you wanna buy a container management platform or life cycle management for containerized apps?” Like, “What is a container?” Right? [Laughter]

Ashley: Yeah.

Budhani: Their people know, but they may not know, right? But they understand this problem, right? It’s like, “I need to run apps here, because this is how I’m gonna make more money. I want to provide a better online and offline experience for my customers.” That opportunity, right?

So, that’s why, as we hear use cases, as we partner with different companies, and we see very specific and repetitive use cases come up again and again, our hope is that we continue to list them on the website. So it’s more meaningful for even the business or more kind of—you know, non-technical people in the company to come to a website like this and understand why, you know, their engineers are talking to us, for example.

Ashley: You described yourself as a SaaS based company, it’s a SaaS service. Do you also provide any operations support, kinda outsource of operations, too, or is it strictly a service that others then operate on top of your SaaS?

Budhani: So, it’s a service, and we do provide kinda more of a solution architecture model ready to essentially assist in our customers thinking through how they’re gonna consume a platform like this, right? Because a platform like this is not—it’s not a box that you plug into your network, if you will. This is a strategic conversation where we’re thinking about how to embed it in some way into your processes or into your, you know, own organization.

So, there’s some level of consultative conversations that are happening up front, definitely, and on an ongoing basis, also. But from an operations perspective, we don’t do that. Rafay is fundamentally a software company with, you know, a lot of pre-sale support on the solution side. We are continuing to engage with a number of companies who excel at that, who look at Rafay as an enabler not just for themselves but for their customers to whom they’re delivering a broader service. So, think systems integrators, DevOps consulting organizations who, whether Rafay exists or not, they are solving this problem and they have other customers, right?

Ashley: Mm-hmm.

Budhani: So, those companies look at us as an accelerator for their own services and for their own offerings. So, that’s been how we’ve been approaching the market, so we can maintain a high margin business and be, effectively, a software company. That’s what we know how to do well. But there’s so many great organizations out there who have an incredible skill set as it relates to operations and integration. That’s where we step back and partner with somebody, and at that point, we are a tool in the tool chain and they go deliver a great experience to their customers.

Ashley: Very good. That helps a lot. So, talk about who your most ideal customer would be. Is it a large enterprise, medium to large? What kind of an organization would say, “This is exactly what I need—come help me”?

Budhani: In theory, this is a horizontal problem. Anybody who’s doing, you know, running containerized applications has this issue. But fundamentally, what we, at different stages in the company, look for different kinds of companies in terms of just, you know, low hanging fruit, et cetera.

What we find works well today is, you know, relatively mature organizations who are doing somewhere between a couple of hundred million to a billion dollars of revenue, they are modernizing their applications, moving to the cloud, becoming more and more cloud native. And they’re the ones who are very interested in going fast, because there’s some—usually, there’s some business reason why we’re doing this. They’re not just modernizing for the sake of modernizing, there’s competitive pressure, there’s cost pressure. They may need to be in different parts of the world, et cetera.

There’s reasons why they need to be up and about quickly, and they will look for any shortcut to get them there. A shortcut in this case would be, you know, “Hey, I could build this myself. Should I?” Right? And what I just said, right? This is a question we, in some way, shape, or form always want to highlight to our customers, which is—hey, you have a really strong engineering bench. You can do anything. Should you do everything? Do you still build data centers? And that, of course—that’s a, you know, asking the question, “Are you still building data centers?” that’s a straw man, of course. Because, hey, I mean, maybe other than 100 companies in the world, nobody’s building data centers any more.

Ashley: Right, right.

Budhani: It makes a point. In our industry, we continue to go up the stack in terms of abstractions, right? I mean, all the way from assembly to op and now nobody’s building subnets any more. Private subnets, public subnets—you just make an API call and you get one in Amazon, right? Why am I building this stuff?

Similarly, we are providing them yet another level of abstraction which allows them to move faster. Because fundamentally, the name of the game is, “Hey, if I don’t move fast, if I don’t produce product fast, I’m gonna lose.”

Ashley: Now, I know you work with AWS, Azure, et cetera, VMware. Is there a particular environment that you’re best suited for? Like, if you’re a VMware customer, you’re gonna—we’re gonna connect right into your environment, or is there a technology stack that’s easier for you to integrate than others?

Budhani: So, this is a very agnostic platform. I mean, VMware environments can be anywhere. And I’ll highlight a specific thing that we’ve thought through and we implemented as part of the platform.

So, let’s assume for a minute that it’s a mature organization, they do have some hybrid environments, and they maybe have some Tectonic set up and they’re running VMware there, and they also have Amazon, and that’s okay, right? So, in one environment, maybe they’re using PKS from VMware, PKS Essentials from VMware. But in Amazon, they’re using EKS, because that’s what Amazon sells for Kubernetes.

Okay, now what do I do? Am I gonna build two sets of platforms to support two different environments? Not a lot of people do that. Or, with a platform like ours, what we say is, “Hey, just—look, if you have an opinion on Kubernetes, go with it, and essentially, we’ll add value on top of your existing Kubernetes through our operator.” It’s a Kubernetes concept called an operator.

Ashley: Mm-hmm.

Budhani: Or, if you don’t care and if you say to us, “Hey, you know what? Just bring me the, I don’t know, whatever the latest upstream Kubernetes is, 1.15 or whatever—just bring that.” No problem, we’ll work with that, also. But, for our SaaS platform, to help you manage your environments behind a firewall in a data center or behind a security group in Amazon, you don’t ever have to open a single port in the firewall for us to get in from outside. The entire system is designed to be secure such that there are sessions being launched from inside out, so your clusters run an agent, basically. They reach out and broker all the connectivity without making a security issue kind of pop up on the security team’s radar.

So, this is another thing, by the way—anybody listening to this podcast thinking about some sort of a multi-cluster or multi-cloud solution? Please let’s make sure that you’re not signing up for a vendor who’s asking us to make maybe not the best decisions when it comes to security and asking us to open up ports, setting up IPsec links—you know, all of these are bad ideas that lead eventually to other problems that you haven’t thought about today.

So, these are some of the things where understanding our enterprise security, understanding our enterprise problems are important up front, so we thought through these things and we designed a platform to be able to work not just in any environment, but to work in these environments in a very secure fashion.
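The inside-out connectivity model described above, where an agent in the cluster reaches out and no inbound port is opened, can be illustrated with a small in-process Python sketch. Everything here is an assumption made for illustration: `ControlPlane`, `ClusterAgent`, the cluster name, and the command string are hypothetical, and a real agent would talk to the controller over an outbound, authenticated network session rather than a direct method call.

```python
from collections import defaultdict, deque


class ControlPlane:
    """Stand-in for a SaaS controller. It never connects to a cluster;
    it only queues work and waits for agents to ask for it."""

    def __init__(self):
        self._pending = defaultdict(deque)   # cluster name -> queued commands
        self.results = []                    # (cluster, command, status) tuples

    def enqueue(self, cluster, command):
        self._pending[cluster].append(command)

    def poll(self, cluster):
        # Called BY the agent: hand over the next command, or None if idle.
        queue = self._pending[cluster]
        return queue.popleft() if queue else None

    def report(self, cluster, command, status):
        # Called BY the agent: results also flow over the agent's session.
        self.results.append((cluster, command, status))


class ClusterAgent:
    """Runs inside the private network and initiates every exchange."""

    def __init__(self, cluster, control_plane):
        self.cluster = cluster
        self.control_plane = control_plane

    def run_once(self):
        command = self.control_plane.poll(self.cluster)
        if command is None:
            return False   # nothing to do this cycle
        # A real agent would apply the change to the cluster here.
        self.control_plane.report(self.cluster, command, "applied")
        return True


cp = ControlPlane()
cp.enqueue("store-0042", "deploy pos-healthcheck:1.0")
agent = ClusterAgent("store-0042", cp)
while agent.run_once():
    pass
print(cp.results)
```

Because the controller only ever responds to calls the agent makes, nothing in the private network has to listen for connections from outside, which is the property the firewall discussion above depends on.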

Ashley: Well, certainly, I think, with all your experience, you can help de-risk things and provide kind of a framework or a path for enterprises.

Well, unfortunately, we’ve run out of time. See, I feel like we could talk about this for another hour and a half and share a lot of good stories about it, too. I’d like to thank you—thanks for being on the webcast, the podcast, here.

Budhani: Yeah, thanks for having me. It was a really fun conversation. I look forward to talking again soon.

Ashley: Yeah, I do, too. Keep us informed about any news. So, I’d like to thank our guest today, Haseeb Budhani, CEO and co-founder of Rafay. I’d also like to thank, of course, you, our listeners, for joining us today. This is Mitch Ashley with DevOps.com, and you’ve listened to another DevOps Chat podcast. Be careful out there.

— Mitchell Ashley