By allowing teams to maintain two production-ready environments at the same time, the blue-green deployment technique can significantly boost reliability. But blue-green deployment can also be difficult to execute and manage. Let’s unpack how blue-green deployment works, why it’s important and which best practices to follow for blue-green deployment success.
Blue-Green Deployment, Defined
Blue-green deployment is a software deployment technique in which teams maintain two identical but distinct environments that are capable of hosting a production application. By convention, one environment is labeled “blue” and the other “green.” At any given time, one of them is “live” and actively serving users, while the other is kept in reserve.
When it comes time to deploy a new version of your application, you deploy it to the reserve environment. You then test and validate it to make sure it meets performance and reliability requirements.
If it does, you reconfigure your DNS or load balancers so that user traffic is directed to this environment, which becomes the new production environment. Meanwhile, the previous production environment is placed in reserve.
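The deploy, validate and switch cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than a real deployment tool: the BlueGreenRouter class, the smoke_test function and the version strings are all hypothetical names invented for this example, and the "deployment" is just a dictionary update.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Tracks which environment is live and which version each one runs."""
    live: str = "blue"
    versions: Dict[str, str] = field(
        default_factory=lambda: {"blue": "v1", "green": "v1"}
    )

    @property
    def reserve(self) -> str:
        # The reserve is whichever environment is not currently live.
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str, validate: Callable[[str, str], bool]) -> bool:
        """Deploy to the reserve, validate it, and switch only on success."""
        target = self.reserve
        self.versions[target] = version    # deploy into the reserve only
        if not validate(target, version):  # the live environment is untouched
            return False
        self.live = target                 # switch traffic; old live becomes reserve
        return True


def smoke_test(environment: str, version: str) -> bool:
    # Placeholder check (assumption): versions tagged "-broken" fail validation.
    return not version.endswith("-broken")


router = BlueGreenRouter()

failed = router.release("v2-broken", smoke_test)
live_after_failure = router.live   # still the original live environment

succeeded = router.release("v2", smoke_test)
live_after_success = router.live   # the former reserve is now live
```

Note that the failed release never touches the live environment. Only a release that passes validation changes where traffic goes.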
Benefits of Blue-Green Deployment
The blue-green deployment technique can boost reliability in several ways:
- Environment parity: Because the reserve environment in which you test your application becomes the live production environment, blue-green deployment ensures parity between testing and production. In turn, it reduces the risk that configuration differences could cause unforeseen problems in production.
- Failed deployments: If a deployment fails, it doesn’t impact your users because you deploy into the reserve environment. You can fix the issue while a stable user-facing version of your app continues to run in the other environment.
- Redundant environments: Although blue-green deployment is not a data protection or disaster recovery technique, having two production-ready environments available is useful in the event that one of your environments is wiped out by an event like a data center failure. In that case, you can restore operations quickly by using the other environment (provided it’s hosted in a separate data center, of course).
How to Use Blue-Green Deployment: Steps and Best Practices
While blue-green deployment is a great way to increase reliability, it does present some challenges. The following best practices can help you to handle them.
Use Scalable Infrastructure
One obvious potential downside of blue-green deployment is that, because you have to maintain two identical environments, it can roughly double the resources you need to host your application, and your hosting bill along with them.
One way to reduce the cost impact of blue-green deployment is to create environments that can scale rapidly without changing their core configuration. You could do this using cloud-based virtual machines (VMs) and autoscaling or via containers and an orchestration engine (like Kubernetes) that can scale them.
Using this approach, you can set up two environments that are effectively identical except that the production environment includes more VM or container instances than the reserve. Before you switch traffic, scale the reserve up to production capacity so that it can handle the full load. Although you could argue that this is not quite a blue-green deployment because the environments are not perfectly identical, that semantic quibble is outweighed by the benefit of not paying for infrastructure you don’t need.
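To get a feel for the savings, here is a rough back-of-the-envelope comparison. The instance counts and the per-instance price are made-up numbers chosen only for illustration. Real pricing depends on your provider and workload.

```python
def monthly_cost(live_instances: int, reserve_instances: int,
                 price_per_instance: float) -> float:
    """Total monthly cost of running both environments."""
    return (live_instances + reserve_instances) * price_per_instance


PRICE = 50.0  # hypothetical monthly price per VM or container instance

single_env = monthly_cost(10, 0, PRICE)       # no reserve at all
full_mirror = monthly_cost(10, 10, PRICE)     # reserve mirrors production
scaled_reserve = monthly_cost(10, 2, PRICE)   # reserve runs at reduced scale

# Extra cost relative to running a single environment.
overhead_full = full_mirror / single_env - 1
overhead_scaled = scaled_reserve / single_env - 1

print(f"full mirror:    {full_mirror:.2f} ({overhead_full:.0%} overhead)")
print(f"scaled reserve: {scaled_reserve:.2f} ({overhead_scaled:.0%} overhead)")
```

Under these assumed numbers, a full mirror doubles the bill, while a reserve running at one-fifth of production scale adds about 20 percent.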
Chaos Engineering
Typically, chaos engineering entails experimenting with a production system to test its reliability. Chaos engineering offers many benefits, but it presents the obvious risk of creating a problem in production that impacts your end users.
Blue-green deployment allows you to work around this risk by practicing chaos engineering in the reserve environment. Of course, because the reserve environment has no real users, experiments there may not exercise every aspect of your production application. But you can still test how the application and its configuration respond to failures by issuing requests yourself within the reserve environment, all without worrying about breaking things for actual users.
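As a simple illustration of this kind of experiment, the sketch below injects random faults into a stand-in request handler and checks whether a retry policy copes with them. Everything here is hypothetical: reserve_handler stands in for a service running in the reserve environment, and the 30 percent failure rate and three-attempt retry policy are arbitrary choices.

```python
import random


def flaky(handler, failure_rate: float, rng: random.Random):
    """Wrap a handler so that a fraction of calls raise an injected fault."""
    def wrapped(request):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return handler(request)
    return wrapped


def call_with_retries(handler, request, attempts: int = 3):
    """Retry on connection errors; re-raise if every attempt fails."""
    for attempt in range(attempts):
        try:
            return handler(request)
        except ConnectionError:
            if attempt == attempts - 1:
                raise


def reserve_handler(request: str) -> str:
    # Stand-in for a service running in the reserve environment.
    return f"ok:{request}"


rng = random.Random(42)  # fixed seed so the experiment is repeatable
chaotic = flaky(reserve_handler, failure_rate=0.3, rng=rng)

successes = 0
for i in range(1000):
    try:
        call_with_retries(chaotic, f"req-{i}")
        successes += 1
    except ConnectionError:
        pass  # all three attempts hit an injected fault

success_rate = successes / 1000
print(f"success rate with retries: {success_rate:.1%}")
```

Because the synthetic requests and the injected faults stay inside the reserve environment, a poor result here costs nothing but tells you the retry policy needs work before the next switch.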
Managing Database State
Keeping databases in sync when you switch between environments is one of the biggest challenges of blue-green deployment. Even small discrepancies in database state could cause big reliability issues.
The simplest way to address this issue is to take the database offline before switching, then verify that the databases in both environments are identical before bringing them back online. Of course, this approach means downtime, because the app can’t serve any request that depends on the database while it is offline.
Another solution is to continuously sync databases between your two environments. This should ensure that database state remains identical even as you switch from one production environment to the other. But it also means that a problem in one database, such as corrupted data, will be replicated to the other and affect both environments.
The bottom line: There is no simple way of managing database state in blue-green deployment. You should evaluate the different options available and consider which one best meets your needs. If you can tolerate a little downtime, stop the database before the switch and restart it afterward. If you need total continuity, use syncing, but be sure to manage the associated risks.
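Whichever option you choose, it helps to verify that the two databases actually match before you switch. The sketch below shows one possible check, using in-memory SQLite databases as stand-ins for the blue and green databases. The orders table, its contents and the table_fingerprint helper are invented for this example, and hashing every row like this would be too slow for large production tables.

```python
import hashlib
import sqlite3


def table_fingerprint(conn: sqlite3.Connection, table: str) -> str:
    """Hash every row of a table, ordered by its first column."""
    digest = hashlib.sha256()
    # The table name is interpolated directly, so pass only trusted names.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY 1"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def make_db(rows):
    """Create an in-memory database standing in for one environment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


blue_db = make_db([(1, "paid"), (2, "shipped")])
green_db = make_db([(1, "paid"), (2, "shipped")])

# Identical contents produce identical fingerprints.
in_sync = table_fingerprint(blue_db, "orders") == table_fingerprint(green_db, "orders")

# A write that reaches only one environment shows up as a mismatch.
green_db.execute("UPDATE orders SET status = 'refunded' WHERE id = 2")
drifted = table_fingerprint(blue_db, "orders") != table_fingerprint(green_db, "orders")

print(f"in sync before update: {in_sync}, drift detected after: {drifted}")
```

A check like this fits naturally into the validation step before a switch: if the fingerprints differ, you hold the switch until the discrepancy is explained.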
Change Load Balancers, Not DNS
There are two main ways to control where traffic is directed, and hence which environment is in production. The first is to change your DNS configuration when you want to switch. That’s easy to do: You simply change the mapping for yourapp.com from one IP address to another, depending on which environment is the live one.
You can also use load balancers to manage traffic. Load balancer configuration is more complex than simply changing DNS entries, but it also enables more sophisticated and granular control.
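In general, it’s a best practice to manage blue-green deployment switches via load balancers. In addition to the extra control they provide, a load balancer change typically takes effect almost immediately, whereas a DNS change reaches users only as cached records expire, so some traffic may keep flowing to the old environment for a while after the switch.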
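One example of the finer control a load balancer offers is weighted routing, which lets you send a small share of requests to the new environment before committing to a full switch. The simulation below is only a sketch of the idea. The pick_environment and simulate functions are hypothetical, and a real load balancer would express the weights in its own configuration rather than in application code.

```python
import random
from collections import Counter


def pick_environment(green_weight: float, rng: random.Random) -> str:
    """Route one request; green_weight is the fraction sent to green."""
    return "green" if rng.random() < green_weight else "blue"


def simulate(green_weight: float, requests: int = 10_000, seed: int = 0) -> Counter:
    """Count how many simulated requests land in each environment."""
    rng = random.Random(seed)
    return Counter(pick_environment(green_weight, rng) for _ in range(requests))


before = simulate(0.0)   # all traffic on blue
partial = simulate(0.1)  # roughly 10 percent of traffic on green
after = simulate(1.0)    # switch complete, all traffic on green

print("before: ", dict(before))
print("partial:", dict(partial))
print("after:  ", dict(after))
```

A DNS record has no equivalent of the middle step: each client resolves the name to one address and keeps using it until its cached record expires.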
Prepare for the Unexpected
While blue-green deployment can reduce the risk of user-impacting disruptions, it’s no silver bullet. You must also be sure that you have the reliability management tools, such as monitoring and alerting, that you’ll need to detect and remediate issues within both of your environments.
Blue-green deployment is a powerful reliability engineering technique. But to get the most out of it, you must optimize the way you handle requirements like infrastructure management, traffic routing and database state.