Can DevOps engineers unlock cloud cost savings for their organization that are incremental to what cost optimization tools offer (and improve the developer experience in the process)? Absolutely. Allow me to illustrate how.
Reserved instance purchase optimization and auto-stopping are proven tactics for reducing waste, but are they the only forms of waste in your cloud budget? Of course not.
Think about all the times environments were spun up and then left running longer than needed. Consider the times a commit changed the way data is stored and caused operating costs to spike. Or the times a commit changed how memory was used and triggered unexpected autoscaling.
Collaborative Cost Management
So the question becomes: what is the most efficient way to identify and root out additional savings while balancing governance and agility? Bottom-up, collaborative cost management may offer a solution.
A bottom-up approach to cost management pushes knowledge and functionality out to the edge: the application and data engineering teams who own and operate code every day. This does not mean ceding centralized cost control to developers, but rather arming them with timely, actionable insights inside their everyday workflow and, where appropriate, letting them act on a self-serve basis.
How does this work in practice? It starts with giving engineering teams timely visibility into the costs of the services and environments they own. This shows teams when spikes occur and gives them a path to follow to understand why each spike occurred.
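As a rough illustration of the "when did it spike?" step, the sketch below flags days on which a service's cost exceeds a multiple of its trailing average. The input shape (a list of `(day, cost)` pairs), the function name `find_cost_spikes`, and the default 7-day window and 1.5x threshold are all hypothetical choices; a real setup would tune them and read costs from your own billing data.

```python
from statistics import mean


def find_cost_spikes(daily_costs, window=7, threshold=1.5):
    """Flag days whose cost exceeds `threshold` times the trailing `window`-day mean.

    daily_costs: list of (day, cost) tuples sorted by day. This input shape is a
    hypothetical stand-in for whatever your billing export provides.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        day, cost = daily_costs[i]
        # Baseline: average cost over the `window` days immediately before `day`.
        baseline = mean(c for _, c in daily_costs[i - window:i])
        if baseline > 0 and cost > threshold * baseline:
            spikes.append((day, cost, baseline))
    return spikes


# Hypothetical data: a steady $100/day service that jumps on day 8.
history = [(f"day{n}", 100.0) for n in range(1, 8)] + [("day8", 180.0), ("day9", 100.0)]
print(find_cost_spikes(history))  # [('day8', 180.0, 100.0)]
```

A trailing average is only one simple baseline; it keeps the example readable, but seasonal workloads may call for comparing against the same weekday instead.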
That path brings several pieces of information together for engineering teams. They'll want to see the costs of the individual resources that power their service or environment during the selected timeframe to find what spiked, and possibly drill down further into that resource's cost drivers. At a minimum, they'll likely want to see deployments and commits from the same timeframe, and possibly observability data as well.
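The drill-down can be sketched in the same spirit: rank a service's resources by how much their cost changed between two days, then list the deployments that landed in that window. The data shapes and helper names here (`resource_cost_deltas`, `deployments_in_window`, and the `sha`/`date` keys) are hypothetical; in practice the deployment records would come from your CI/CD or Git tooling.

```python
from datetime import date


def resource_cost_deltas(resource_costs, before, after):
    """Rank resources by how much their cost changed from `before` to `after`.

    resource_costs: dict of resource id -> {date: cost} (hypothetical shape).
    Missing days count as zero cost.
    """
    deltas = {
        rid: costs.get(after, 0.0) - costs.get(before, 0.0)
        for rid, costs in resource_costs.items()
    }
    # Largest increase first, so the likely culprit tops the list.
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


def deployments_in_window(deployments, start, end):
    """Return deployments (hypothetical dicts with 'sha' and 'date') within [start, end]."""
    return [d for d in deployments if start <= d["date"] <= end]


# Hypothetical example data for one service.
costs = {
    "orders-db": {date(2024, 5, 7): 40.0, date(2024, 5, 8): 110.0},
    "orders-cache": {date(2024, 5, 7): 20.0, date(2024, 5, 8): 22.0},
}
deploys = [
    {"sha": "abc123", "date": date(2024, 5, 8)},
    {"sha": "def456", "date": date(2024, 5, 1)},
]
print(resource_cost_deltas(costs, date(2024, 5, 7), date(2024, 5, 8)))
print(deployments_in_window(deploys, date(2024, 5, 7), date(2024, 5, 8)))
```

Here the database's jump lines up with a single deployment in the window, which is exactly the kind of lead a team would then follow into the commit itself.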
If the root cause calls for changing a resource's configuration, teams will want to understand that resource's dependents (other resources linked to it, plus any environments and services that depend on it) to size up the blast radius of a potential change. From there, they may want to view the resource's configuration settings.
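Sizing up the blast radius amounts to a graph traversal: starting from the resource you plan to change, walk the "depended on by" edges to collect everything that depends on it directly or transitively. This minimal sketch uses breadth-first search over a hypothetical in-memory `dependents` mapping; a real portal would derive that graph from its software catalog or IaC state.

```python
from collections import deque


def blast_radius(dependents, resource):
    """Return every entity that directly or transitively depends on `resource`.

    dependents: dict mapping an entity to the entities that depend on it
    (a hypothetical graph built from your catalog or IaC state).
    """
    seen = set()
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            # Skip anything already visited (handles cycles) and the starting resource.
            if dep not in seen and dep != resource:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency graph: resources, services and environments mixed together.
graph = {
    "orders-db": ["orders-api", "reporting-etl"],
    "orders-api": ["checkout-service", "staging-env"],
    "reporting-etl": [],
}
print(sorted(blast_radius(graph, "orders-db")))
```

Tracking visited nodes keeps the traversal safe even if the dependency data contains cycles.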
Collaborative Discussion
Finally, you want to give teams the ability to take certain actions on their own and, in other cases, to foster a collaborative discussion between engineering and DevOps. You can accomplish the former via self-service actions: simple no-code workflows in which a developer can see which actions the DevOps team has enabled for their team on a given resource, governed by a role-based access control (RBAC) model. These workflows connect to your existing infrastructure-as-code (IaC) or CI/CD tools to orchestrate the desired change, following golden paths defined by DevOps. This approach reduces developers' cognitive load and cuts down on TicketOps for the DevOps team. For changes that require coordination with DevOps, your engineering teams can hold a timely, collaborative discussion grounded in the fresh, actionable information you've made available to them.
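The RBAC check behind self-service actions can be as simple as filtering published actions by team and resource type. The `ActionPolicy` structure, its fields and the action names below are hypothetical, and the sketch deliberately stops at deciding which actions a team may see and run; triggering the underlying IaC or CI/CD workflow is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ActionPolicy:
    """A self-service action published by DevOps (hypothetical structure)."""
    name: str
    allowed_teams: set[str] = field(default_factory=set)
    resource_types: set[str] = field(default_factory=set)


def can_run(policy: ActionPolicy, team: str, resource_type: str) -> bool:
    # The team must be granted the action AND the action must apply to this resource type.
    return team in policy.allowed_teams and resource_type in policy.resource_types


def visible_actions(policies: list[ActionPolicy], team: str, resource_type: str) -> list[str]:
    """Names of the actions a team may see and run for a given resource type."""
    return [p.name for p in policies if can_run(p, team, resource_type)]


policies = [
    ActionPolicy("resize-instance", {"payments"}, {"vm"}),
    ActionPolicy("shut-down-preview-env", {"payments", "search"}, {"environment"}),
]
print(visible_actions(policies, "payments", "vm"))  # ['resize-instance']
print(visible_actions(policies, "search", "vm"))  # []
```

Keeping the policy data separate from the workflow that executes it lets DevOps adjust who can do what without touching the golden-path automation itself.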
So what does it take to actually implement this vision? It’s easier than you might realize.
Chances are, your existing cost optimization tool cannot provide cost visibility at the granularity and freshness developers need: daily cost insights at the service (or data pipeline, ETL job, etc.) level. That's okay, though. Collaborative cost management is an incremental tactic that can still draw on some of the insights these tools provide.
The easiest way to build this competency across your enterprise is with an internal developer portal. Internal developer portals are living sociotechnical knowledge maps that connect data from your tools and clouds to your services and the teams that own them. That makes them an ideal conduit for slicing cloud cost data by service and putting it in front of application and data engineers in their everyday workflow.
Backstage, an open source framework for building developer portals, offers a cloud cost plugin, but it takes time to set up, and the information it exposes to your developers may be limited for this scenario. For instance, your teams may not be able to see per-resource costs, individual resource cost drivers, blast radius information or configuration settings. You would also need to spend effort wiring up actions to support resource changes.
You could also try building a BI dashboard for your teams. This would require ingesting daily cost feeds from your cloud provider and then allocating each resource's cost to environments and services, using proprietary resource-to-environment-to-service mappings that you would build and maintain as your system evolves. You would also need to write scripts, wire them into your IaC and CI/CD tooling, and teach your engineers how to discover, configure and execute these actions.
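The allocation step at the heart of such a dashboard can be sketched as a roll-up: each daily cost row is attributed to an environment, and each environment to a service. This assumes a hypothetical row schema (`resource_id`, `day`, `cost`) and a simple one-to-one mapping at each level; real billing exports use their own column names, and shared resources would need split rules that this sketch omits.

```python
from collections import defaultdict


def allocate_costs(cost_rows, resource_to_env, env_to_service):
    """Roll resource-level daily costs up to (service, environment, day) totals.

    cost_rows: iterable of dicts with 'resource_id', 'day' and 'cost' keys
    (hypothetical schema). Unmapped resources land under 'unallocated' so that
    gaps in the hand-maintained mappings stay visible instead of silently vanishing.
    """
    totals = defaultdict(float)
    for row in cost_rows:
        env = resource_to_env.get(row["resource_id"], "unallocated")
        service = env_to_service.get(env, "unallocated")
        totals[(service, env, row["day"])] += row["cost"]
    return dict(totals)


# Hypothetical billing rows and mappings.
rows = [
    {"resource_id": "i-1", "day": "2024-05-08", "cost": 10.0},
    {"resource_id": "db-1", "day": "2024-05-08", "cost": 5.5},
    {"resource_id": "i-9", "day": "2024-05-08", "cost": 2.0},
]
resource_to_env = {"i-1": "checkout-prod", "db-1": "checkout-prod"}
env_to_service = {"checkout-prod": "checkout"}
print(allocate_costs(rows, resource_to_env, env_to_service))
```

The "unallocated" bucket is a deliberate design choice: as the system evolves and the mappings drift, its size tells you how much maintenance the mappings need.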
However you choose to get there, collaborative cost management is a powerful emerging competency. It can lead to meaningful savings and improve the developer experience by giving your teams the knowledge and functionality they need to manage costs proactively and, where appropriate, on a self-service basis.