Four Steps to Avoiding a Cloud Cost Incident

The recent Flexera 2022 State of the Cloud Report found that organizations waste 32% of their cloud spend, up from 30% the previous year. Much of this waste stems from cloud cost incidents triggered by unused resources, malicious activity or overambitious projects, which can have a massive financial impact if not found and corrected promptly.

As an organization’s cloud infrastructure footprint matures, mapping cloud resource use and costs accurately to product lines, teams and business units becomes progressively more complicated and problematic. However, accurate cloud cost attribution is vital to understanding team resource use and determining accurate cost of goods sold (COGS) margins and is imperative to budgeting and forecasting. Organizations need reliable, granular views of the cloud environment to attribute costs back to departments and optimize a meaningful budget.

Cloud cost spikes can blow up an organization’s budget fast and can be hard to track down, but enterprises can help protect themselves from a cloud cost incident with just a few simple steps.

Have a Strong Tagging Strategy That Attributes Cost Granularly

The cloud operations team needs to identify what is important to business reporting, which will help define a tagging strategy. For instance, when two teams manage a particular service, each team should ensure its services are tagged with a team identifier. The teams should also add a tag for the environment in which the services run, so that cloud costs can be allocated across both teams and environments.

When an application uses numerous resources, such as a database and storage, teams should consider adding an application-specific tag so costs can be broken down by application. There is no one-size-fits-all tagging solution, so enterprises will need to experiment to find what works, adding and removing tags as the strategy evolves.
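As a rough sketch of what tag-based attribution looks like in practice, here is a minimal Python roll-up of billing line items by team and environment tags. The resource names, tag keys and cost figures are all invented for this illustration, not drawn from any particular provider's billing format:

```python
from collections import defaultdict

# Hypothetical billing-export line items; the tag keys ("team", "env")
# and cost figures are invented for this illustration.
line_items = [
    {"resource": "db-prod-1",   "cost": 412.50,
     "tags": {"team": "payments", "env": "prod"}},
    {"resource": "db-stage-1",  "cost": 88.10,
     "tags": {"team": "payments", "env": "staging"}},
    {"resource": "bucket-logs", "cost": 19.75,
     "tags": {"team": "platform", "env": "prod"}},
    # An untagged resource: it surfaces as unattributed spend.
    {"resource": "vm-orphan",   "cost": 54.00, "tags": {}},
]

def attribute_costs(items, keys=("team", "env")):
    """Roll up cost by the given tag keys; missing tags fall into 'untagged'."""
    totals = defaultdict(float)
    for item in items:
        label = tuple(item["tags"].get(k, "untagged") for k in keys)
        totals[label] += item["cost"]
    return dict(totals)

print(attribute_costs(line_items))
```

Running the roll-up immediately exposes the untagged bucket, which is often the first place to look when costs cannot be attributed.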

Understand Cloud Cost Ownership to Avoid Chaos

Ask yourself how well your organization has attributed cloud costs to teams, applications or COGS. If the answer is “Not well,” then you are missing vital information about how your cloud costs relate to your business.

Without this context or visibility into resource allocation, it is difficult, if not impossible, for cloud operations owners and senior leaders to agree on who is responsible for which expenses and resources. Manual cost attribution doesn’t scale accurately or account for shared cluster resource use across teams and applications. This creates confusion that can escalate when users arbitrarily slice data and claim ownership without accountability or clear visibility.

The result is that costs can be double-counted with two owners, or missed completely with no owner whatsoever. The data that emerges is low-quality, untrustworthy and not actionable. The inability to visualize resource use in context can dilute, hinder and even defeat cost optimization initiatives.

Every cloud cost should have an owner who can ensure that the spending is tied to business value. Knowing who owns which pieces makes it much easier to pinpoint the cause when a cost incident occurs.

Establish and Monitor Cloud Budgets

Historical data on cloud spending is gold: It can be used to forecast and budget what the cloud’s cost footprint should look like. When actual spending diverges from that forecast, it should be flagged and investigated.

Once cost data is under control, organizations can get an accurate picture of where cloud spending is taking place. But does that picture match spending expectations? By establishing and monitoring budgets at every stage of a project, organizations put guardrails on cloud spending and prevent costs from soaring out of control when engineers overallocate on performance “just in case.” Once budgets are in place, organizations can monitor spending and right-size resources to match actual usage rather than the usage estimated at the start of a project.
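The budget guardrail described above can be sketched as a simple per-owner check. The owners, dollar amounts and 90% warning threshold below are illustrative assumptions, not a prescribed configuration:

```python
def check_budgets(actuals, budgets, warn_fraction=0.9):
    """Compare month-to-date spend against per-owner budgets.

    Returns (owner, spent, budget, status) tuples for anything that
    needs attention; the 90% warning fraction is an arbitrary choice.
    """
    alerts = []
    for owner, spent in actuals.items():
        budget = budgets.get(owner)
        if budget is None:
            alerts.append((owner, spent, None, "no budget set"))
        elif spent >= budget:
            alerts.append((owner, spent, budget, "over budget"))
        elif spent >= warn_fraction * budget:
            alerts.append((owner, spent, budget, "approaching budget"))
    return alerts

# Illustrative numbers only.
month_to_date = {"payments": 950.0, "platform": 400.0, "ml": 100.0}
budgets = {"payments": 1000.0, "platform": 300.0}

for alert in check_budgets(month_to_date, budgets):
    print(alert)
```

Note that a missing budget is itself flagged: a team spending without a budget is as much a gap in the guardrails as a team exceeding one.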

Over time, right-sizing resources will allow for optimal spend, as well as identify unused resources that can be shut down. Organizations should identify development and test instances left running when no longer needed and shut them off.

Implement Cost Anomaly Detection

Organizations can shield themselves from cost anomalies by continually monitoring and optimizing cloud costs. Cost anomaly detection enables early recognition of unusual spending before it becomes a problem, especially when prompt, actionable alerts are delivered directly to resource owners via Slack or email as soon as their forecast exceeds the set budget. The best option is to adopt a cloud cost monitoring and optimization (CCMO) tool that alerts in real-time when a cost anomaly occurs. Such tools should be able to detect anomalies, analyze their root cause and provide visibility into what went wrong so that similar errors can be averted in the future.

Running continuous cost monitoring and anomaly detection in dynamic cloud environments is essential to identifying any activity that doesn’t align with expected expenditure or diverges from an established pattern. With proper monitoring, organizations can take corrective measures before a blip becomes a budget-busting incident.
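A commercial CCMO tool's detection logic is far more sophisticated, but the core idea of "diverging from an established pattern" can be illustrated with a minimal trailing-window baseline. The seven-day window and three-sigma threshold here are arbitrary assumptions for the sketch:

```python
from statistics import mean, stdev

def detect_cost_anomalies(daily_costs, window=7, sigmas=3.0):
    """Flag days whose cost deviates more than `sigmas` standard
    deviations from the trailing `window`-day baseline."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        baseline = daily_costs[i - window:i]
        mu, sd = mean(baseline), stdev(baseline)
        if sd > 0 and abs(daily_costs[i] - mu) > sigmas * sd:
            anomalies.append(i)
    return anomalies

# A flat-ish series with a one-day spike on day 10 (made-up numbers).
daily = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99, 500, 100, 102]
print(detect_cost_anomalies(daily))  # the spike at index 10 is flagged
```

Real tools must also handle seasonality, weekly cycles and gradual drift, which is why a simple threshold like this is a starting point for intuition rather than a production detector.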

Unwanted and unexpected cost incidents can clean out an enterprise’s budget quickly, especially as its cloud infrastructure grows. Organizations need to practice these steps to gain a full picture view of their cloud environment.

Asim Razzaq

Asim Razzaq is the co-founder and CEO of Yotascale. Prior to Yotascale, Asim was Senior Director of Platform Engineering (Head of Infrastructure) at PayPal, where he was responsible for all core infrastructure processing payments and logins. He led the build-out of the PayPal private cloud and the PayPal developer platform, which generated billions of dollars in payments volume. Asim has held engineering leadership roles at early- to mid-stage startups and large companies including eBay and PayPal, where his teams focused on building cloud-scale platforms and applications. Asim earned his BS (Honors) in Computer Sciences from the University of Texas at Austin, where he conducted undergraduate research in resource management and distributed computing, and is a published author.
