Why Every DevOps Team Needs a Spot Instance Strategy

Most DevOps teams use the public cloud extensively and focus a lot of energy on reducing cloud costs. According to one estimate, U.S. businesses spent $14.1 billion in 2019 just on wasted, unused cloud resources. Spot instances are one of the most important ways to reduce the cost of public cloud services.

Spot instances are not a new development. Amazon was the first to announce this pricing option in 2009, turning cloud computing into a market influenced by the dynamics of supply and demand. Microsoft Azure took more than a decade to follow suit, announcing its spot virtual machines offering in May 2020.

The spot pricing model is simple on the surface but can be complex to implement. Spot instances are how cloud providers sell their unused capacity. When compute capacity is not being bought by anyone under the regular on-demand pricing model, the cloud provider offers it as spot instances at very favorable prices, often only 10% to 20% of the on-demand cost.
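To make the arithmetic concrete, here is a minimal Python sketch. The $0.10 hourly on-demand price is an illustrative assumption, not a real quote. A spot price at 10% to 20% of the on-demand price works out to an 80% to 90% discount.

```python
def spot_discount(on_demand_price: float, spot_price: float) -> float:
    """Return the spot discount as a fraction of the on-demand price (0.85 == 85% off)."""
    if on_demand_price <= 0:
        raise ValueError("on-demand price must be positive")
    return 1 - spot_price / on_demand_price


on_demand = 0.10  # hypothetical on-demand price, USD per hour
for fraction in (0.10, 0.20):
    spot = on_demand * fraction  # spot price at 10% or 20% of on-demand
    print(f"spot at {fraction:.0%} of on-demand -> {spot_discount(on_demand, spot):.0%} off")
```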

However, it’s not so simple to get this 80% to 90% discount. When the cloud provider needs the capacity back, for example to serve on-demand customers, it sends an interruption notice (on AWS, two minutes before the instance is reclaimed), and you need to move your workloads before the instance is terminated. This makes spot instances difficult to use for stateful, database-driven applications or for applications that require high availability.
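On AWS, a pending interruption can be read from inside the instance at the instance metadata path `/latest/meta-data/spot/instance-action`, which returns 404 until a notice is issued and then a small JSON document with an `action` and a UTC `time`. The sketch below assumes you have already fetched that body (the fetch needs an IMDSv2 session token and is omitted); `seconds_until_interruption` and the `on_notice` drain hook are hypothetical names used for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Path AWS uses to publish a pending spot interruption (404 when none is scheduled).
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(body: Optional[str], now: datetime) -> Optional[float]:
    """Return seconds left before interruption, or None if no notice was issued."""
    if not body:
        return None  # the endpoint returned 404: nothing scheduled yet
    notice = json.loads(body)
    # "time" is a UTC timestamp such as "2017-09-18T08:22:00Z"
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def on_notice(seconds_left: float) -> None:
    # Hypothetical hook: checkpoint work, drain connections, leave the load balancer.
    print(f"interruption in {seconds_left:.0f}s, draining workload")


now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
left = seconds_until_interruption('{"action": "terminate", "time": "2017-09-18T08:22:00Z"}', now)
if left is not None:
    on_notice(left)  # prints "interruption in 120s, draining workload"
```

In a real deployment this check would run in a loop every few seconds, so the workload has most of the warning window to wind down.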

This is not the only challenge. Spot prices fluctuate and are much more volatile than on-demand prices, so the discount you can achieve over time is relatively unpredictable.


However, there are at least three ways savvy DevOps teams can make great use of spot instances:

Automation: DevOps teams are proficient with a range of automation tools, including configuration management, infrastructure as code (IaC) and cloud provider auto-scaling tools. These can manage workloads in clusters and automatically fail over when a spot instance terminates.
Dev/test environments: DevOps teams are responsible for setting up development and testing environments, which are well suited to spot instances because they can usually tolerate brief interruptions and, in many cases, are not stateful.
CI/CD jobs: Jobs running on Jenkins, GitLab and similar tools are easy to scale on spot instances. These jobs are typically stateless, so if an instance drops, the job can simply be rerun on another one.
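A CI runner on spot capacity can treat an interruption as a signal to rerun the job elsewhere. This is a minimal sketch, assuming a stateless job and a hypothetical `SpotInterrupted` exception raised by whatever watches for the provider's interruption notice. Jenkins, GitLab and similar tools have their own retry settings, which this does not model.

```python
from typing import Callable


class SpotInterrupted(Exception):
    """Hypothetical signal that the job's spot runner was reclaimed mid-run."""


def run_with_retries(job: Callable[[], str], max_attempts: int = 3) -> str:
    """Rerun a stateless job from scratch each time its spot runner is interrupted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except SpotInterrupted:
            print(f"attempt {attempt} interrupted, rescheduling on another instance")
    raise RuntimeError(f"job interrupted on all {max_attempts} attempts")


calls = {"n": 0}


def flaky_build() -> str:
    # Simulated job: the first runner is reclaimed, the second one finishes.
    calls["n"] += 1
    if calls["n"] == 1:
        raise SpotInterrupted()
    return "build passed"


print(run_with_retries(flaky_build))  # prints "build passed" after one retry
```

Rerunning from scratch is only safe because the job keeps no state on the instance; anything it must keep belongs in an artifact store outside the runner.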

AWS Spot Instances

AWS Spot Instances let you buy unused Amazon EC2 computing capacity at a significant discount. You can specify a maximum price, and as long as the spot price is at or below it, your instance is launched with the Amazon Machine Image (AMI) of your choice.

How Spot Instances Work on AWS

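Spot instances are charged a variable spot price, which AWS adjusts according to supply and demand. The AWS Spot Instance Advisor shows typical savings and interruption frequency by instance type; current and historical prices are available on the EC2 Spot pricing page and through the spot price history API.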
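Prices can also be pulled programmatically. The sketch below assumes a response shaped like the one boto3's `describe_spot_price_history` returns (a `SpotPriceHistory` list whose `SpotPrice` values are strings) and picks the Availability Zone with the lowest latest price. The sample prices are made up for illustration.

```python
from datetime import datetime
from typing import Dict, Tuple


def latest_price_per_zone(response: dict) -> Dict[str, float]:
    """Keep the most recent spot price reported for each Availability Zone."""
    latest: Dict[str, Tuple[datetime, float]] = {}
    for entry in response["SpotPriceHistory"]:
        zone, ts = entry["AvailabilityZone"], entry["Timestamp"]
        price = float(entry["SpotPrice"])  # the API reports prices as strings
        if zone not in latest or ts > latest[zone][0]:
            latest[zone] = (ts, price)
    return {zone: price for zone, (_, price) in latest.items()}


def cheapest_zone(response: dict) -> Tuple[str, float]:
    """Return the zone with the lowest latest price, plus that price."""
    prices = latest_price_per_zone(response)
    zone = min(prices, key=lambda z: prices[z])
    return zone, prices[zone]


# In practice the response would come from something like:
#   boto3.client("ec2").describe_spot_price_history(
#       InstanceTypes=["m5.large"], ProductDescriptions=["Linux/UNIX"])
# Made-up sample data:
sample = {"SpotPriceHistory": [
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.0400", "Timestamp": datetime(2020, 6, 1, 10)},
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.0380", "Timestamp": datetime(2020, 6, 1, 12)},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "0.0410", "Timestamp": datetime(2020, 6, 1, 11)},
]}
print(cheapest_zone(sample))  # ('us-east-1a', 0.038)
```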

You create a spot instance request specifying the instance types you are interested in and the Availability Zones (AZs) in which they should run. If capacity is available and the current spot price is at or below your maximum price, your instances are launched.
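The same kind of request can be made through the EC2 `RunInstances` API by setting `InstanceMarketOptions`. This minimal sketch only builds the keyword arguments for boto3's `run_instances`, so it runs without credentials; the AMI ID, instance type, zone and $0.05 maximum price are placeholder values. If `MaxPrice` is omitted, the maximum defaults to the on-demand price.

```python
from typing import Optional


def spot_launch_params(ami_id: str, instance_type: str, zone: str,
                       max_price: Optional[str] = None) -> dict:
    """Keyword arguments for EC2 run_instances requesting one one-time spot instance."""
    spot_options = {
        "SpotInstanceType": "one-time",             # not resubmitted after an interruption
        "InstanceInterruptionBehavior": "terminate",
    }
    if max_price is not None:
        spot_options["MaxPrice"] = max_price         # USD per hour, passed as a string
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "Placement": {"AvailabilityZone": zone},
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {"MarketType": "spot", "SpotOptions": spot_options},
    }


params = spot_launch_params("ami-0123456789abcdef0", "m5.large", "us-east-1a", max_price="0.05")
# With credentials configured: boto3.client("ec2").run_instances(**params)
print(params["InstanceMarketOptions"])
```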

Spot instances continue running until:

Capacity is no longer available (for example, because AWS needs it for on-demand customers).
The spot price has risen above your maximum price.
You request to terminate the instance, or it is automatically terminated by auto-scaling.

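You can also order spot instances with a predefined duration (known as Spot blocks) and pay a fixed hourly rate for the entire duration, even if the market price changes in the interim. Note that AWS has since retired this option: it closed to new customers in July 2021 and stopped being supported after December 31, 2022.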

Automation Options

You can automatically scale Amazon EC2 capacity using both on-demand instances and spot instances in a single Auto Scaling group. Keeping a baseline of on-demand instances in the group means part of your capacity stays up even when spot capacity is scarce while scaling up.

The on-demand portion of the group can also be covered by Reserved Instances and Savings Plans (discounts on on-demand usage in exchange for a one- or three-year commitment to a certain amount of capacity or spend), so you can combine several saving methods in the same Auto Scaling group.

You can improve availability by deploying applications across multiple instance types running in multiple AZs. By allowing multiple instance types, you tap into multiple pools and increase the chances of obtaining a spot instance when you need it.
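In boto3 terms, these ideas map onto the `MixedInstancesPolicy` argument of `create_auto_scaling_group`. This sketch builds the arguments only. The group name, launch template, instance types and subnet IDs are hypothetical, the group sizes are fixed for brevity, and the on-demand base and percentage are example settings rather than recommendations.

```python
from typing import List


def mixed_asg_params(name: str, launch_template: str, instance_types: List[str],
                     subnet_ids: List[str], on_demand_base: int = 1,
                     on_demand_pct_above_base: int = 0) -> dict:
    """Keyword arguments for Auto Scaling create_auto_scaling_group mixing spot and on-demand."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": 1,
        "MaxSize": 10,
        # One subnet per Availability Zone spreads the group across AZs.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # Each extra instance type is another spot capacity pool to draw from.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": on_demand_base,  # minimum capacity served by on-demand
                "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }


params = mixed_asg_params(
    "web-spot-asg", "web-template",
    instance_types=["m5.large", "m5a.large", "m4.large"],
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
)
# With credentials configured: boto3.client("autoscaling").create_auto_scaling_group(**params)
print(params["VPCZoneIdentifier"])  # subnet-aaaa1111,subnet-bbbb2222
```

The `capacity-optimized` strategy asks AWS to launch spot instances from the pools with the most spare capacity, which tends to reduce interruptions compared with always chasing the lowest price.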

Azure Spot Instances (Spot VMs)

Azure offers Spot VMs that give you access to unused compute capacity. You can request a single spot VM, or launch multiple spot VMs using an Azure VM Scale Set (VMSS). Spot VMs replaced the previous Low Priority VMs feature, which let you purchase VMs that were in low demand on Azure for a reduced price.

The spot price of VMs on Azure depends on the total capacity available for that specific instance size and SKU (instance type) in the Azure region. Azure commits to changing pricing slowly—avoiding sudden spikes—to maintain pricing stability and make it easier to manage budgets.

As on AWS, discounts fluctuate significantly, and spot VMs can be up to 90% cheaper than the base price of the same VM.

How Spot VMs Work on Azure

The Azure Portal provides access to Azure spot VMs. When you create a spot VM, you can see the current price for the selected region, image and VM size. For consistency, prices are always in U.S. dollars, even if you use a different base currency for billing.

You can choose between two conditions under which spot VMs are evicted:

Maximum price eviction: You set a maximum price, and the VM is evicted when the spot price rises above it (or when Azure runs out of capacity).
Capacity eviction: You always pay the current price of the VM (without setting a maximum price), and your VM is evicted only when Azure does not have sufficient capacity of the requested VM type.
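The two eviction types correspond to the Azure CLI's `--max-price` flag on `az vm create`, where `-1` means the VM is never evicted for price reasons. This minimal sketch builds the command line in Python without running it; the resource group, VM name, image and size are placeholders.

```python
import shlex
from typing import List, Optional


def az_spot_vm_args(resource_group: str, name: str, image: str, size: str,
                    max_price_usd: Optional[float] = None) -> List[str]:
    """Build an `az vm create` argument list for a spot VM.

    max_price_usd=None -> capacity eviction only (--max-price -1).
    max_price_usd=x    -> also evicted when the spot price rises above x.
    """
    max_price = "-1" if max_price_usd is None else f"{max_price_usd:.5f}"  # USD
    return [
        "az", "vm", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--image", image,
        "--size", size,
        "--priority", "Spot",
        "--max-price", max_price,
        "--eviction-policy", "Deallocate",
    ]


args = az_spot_vm_args("my-rg", "spot-vm-1", "Ubuntu2204", "Standard_D2s_v3", max_price_usd=0.05)
# subprocess.run(args, check=True) would create the VM (requires an authenticated az session).
print(shlex.join(args))
```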

When a VM is evicted, Azure by default applies the Stop/Deallocate eviction policy: the VM is stopped and deallocated, but its attached disks remain, and you are still charged for them. (The alternative Delete policy removes the VM and its disks.) Once the price goes down or capacity becomes available, a deallocated VM can be restarted and picks up from the same disk data.
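To react before an eviction, a VM can poll Azure's Scheduled Events metadata endpoint, which reports spot evictions as events of type `Preempt`. The sketch below assumes the JSON body has already been fetched (the request needs the `Metadata: true` header) and returns the IDs of pending evictions for one VM. The VM name and event values in the sample are made up, and the checkpoint step is a hypothetical placeholder.

```python
import json
from typing import List

# Queried from inside the VM with the header "Metadata: true" (HTTP call omitted here).
SCHEDULED_EVENTS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"


def pending_evictions(body: str, vm_name: str) -> List[str]:
    """Return the EventIds of Preempt (spot eviction) events that target vm_name."""
    doc = json.loads(body)
    return [
        event["EventId"]
        for event in doc.get("Events", [])
        if event.get("EventType") == "Preempt" and vm_name in event.get("Resources", [])
    ]


sample_body = json.dumps({
    "DocumentIncarnation": 2,
    "Events": [
        {"EventId": "evt-1", "EventType": "Preempt", "ResourceType": "VirtualMachine",
         "Resources": ["spot-vm-1"], "EventStatus": "Scheduled"},
        {"EventId": "evt-2", "EventType": "Reboot", "ResourceType": "VirtualMachine",
         "Resources": ["spot-vm-1"], "EventStatus": "Scheduled"},
    ],
})
for event_id in pending_evictions(sample_body, "spot-vm-1"):
    # Hypothetical reaction: flush state to the persistent disk before deallocation.
    print(f"eviction {event_id} scheduled, checkpointing to disk")
```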

Automation Options

Azure provides virtual machine scale sets (VMSS), which can automatically increase or decrease the number of VMs running your application. You can create a scale set that includes spot VMs, and as your application scales out, more spot VMs are added as capacity becomes available. Spot scale sets are deployed in a single fault domain and offer no high-availability guarantee. At the time of writing, and unlike AWS, Azure does not let you mix on-demand VMs and spot VMs in the same scale set.

Both Amazon and Azure provide robust capabilities for cost savings using spot instances. Azure’s offering is newer and provides less-advanced bidding and auto-scaling capabilities, but these are expected to be added as the service matures.

Whether DevOps teams choose to run in AWS, Azure, or both, they literally cannot afford to ignore spot instances, especially for low-criticality workloads like dev/test and CI/CD job execution.