The Power of Autoscaling and the Mistake You’ll Never Make Again

Many years ago when I was interning as a software engineer, I made a mistake that could have ended my career just as it was getting started …

It was my birthday, and I was given the task of loading queries to a database. The task was straightforward—even then, when I was only an intern learning my craft. I was using a program to upload the queries automatically. It looked like the program was running well, so I decided to go home and leave the program running overnight.

The next day at the office I met any intern’s worst nightmare: Everything had stopped. And it was all my fault. The queries were loading with a paralyzing level of inefficiency. In fact, they took up all of the processing power that we had on the company server, making it crash. All operations were brought to a screeching halt … just for my measly query project.

It took several hours for the operations team to have everything running again. Luckily, I didn’t lose my job. In fact, the company ended up handling the situation graciously, and it turned out to be a good learning opportunity for me.

These days, something like this is entirely avoidable, and it’s all thanks to the magic of autoscaling. As recently as 10 years ago, obtaining more server space was a slow and rigid process owned by a few gatekeepers. Now anyone can acquire more server space with just a few clicks.

The democratization of processing servers has completely transformed the internet. It has given any organization virtually unlimited processing power at a moment’s notice. When done strategically, autoscaling allows for a more flexible and agile processing infrastructure. When it is implemented in an ad hoc or rushed way, your organization will end up spending exorbitant sums of money without seeing the full potential of autoscaling.

Here are three questions you need to answer to develop a winning autoscaling deployment strategy.

What Should You Autoscale?

Before the advent of autoscaling, projecting server needs took a lot of guesswork. Imagine, for example, that a timely and effective advertisement suddenly brings a 1000x spike in traffic to a company that sells sunglasses. If the company is not ready for this spike in traffic, everything comes to a screeching halt. Orders are not filled—and some orders can’t even be made because the server space is not ready to handle the unexpected surge in volume.

With autoscaling, processing power can be offloaded to a new server automatically, based on the conditions set by the system’s administrators. This is valuable not only during spikes in traffic but also during lulls, when scaling back down means you avoid paying for processing power that you’re not using.
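The idea of "conditions set by the system’s administrators" can be sketched as a simple threshold rule. This is a minimal illustration, not any cloud provider’s API: the class name, the function name and the 80 percent and 30 percent thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical administrator-defined conditions (illustrative values)."""
    scale_out_above: float = 0.80  # add a server when average load exceeds 80%
    scale_in_below: float = 0.30   # remove a server when average load drops below 30%
    min_servers: int = 1
    max_servers: int = 10


def desired_servers(current: int, avg_load: float, rule: ScalingRule) -> int:
    """Return the server count the rule asks for, moving one server at a time."""
    if avg_load > rule.scale_out_above:
        target = current + 1   # traffic spike: add capacity
    elif avg_load < rule.scale_in_below:
        target = current - 1   # lull: stop paying for idle capacity
    else:
        target = current       # load is in the comfortable band
    # Never go below the minimum or above the maximum the administrator allows.
    return max(rule.min_servers, min(rule.max_servers, target))
```

The upper and lower bounds matter as much as the thresholds: the maximum caps what a runaway spike can cost, and the minimum keeps the service alive during quiet periods.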

In any organization’s computing architecture, there are a number of components that can be autoscaled, whether it’s central processing units (CPU), random-access memory (RAM) or bandwidth. The first step is to collect as much data as possible to understand the current operational loads of each of these items. Any organization using Amazon Web Services (AWS) has access to some data monitoring services, but they tend to be very basic. They’re better than nothing, but there are other options out there.

One option is black box testing, which allows you to measure various components within your infrastructure by monitoring their behavior from the outside. Among many things, it can help you determine where bottlenecks are occurring. Elasticsearch is another great resource for text search and metadata collection within a document or a digital framework.

Another effective tool to assess autoscaling needs is something called an ‘agent.’ An agent is a small program that runs on each machine and reports measurements at regular intervals, giving you processing data from across the entire system. Collecting data like this will give you crucial insight into the state of your processing infrastructure.

Once you collect all of this data, a thorough analysis is in order. At present, analyzing this data must be done manually, but before long machine learning will take care of this part of the process. It’s important that you consider all layers of your applications (front end, back end, database) to autoscale effectively. It can be tempting to scale at will, given how easily it can be done. Be vigilant against abusing this capability. Otherwise, it won’t be long before you spend outrageous sums of money on something you don’t need.

How Should You Autoscale?

An important concept to understand before you decide on your autoscaling approach is containers. Containers are lightweight, isolated packages that bundle an application with everything it needs to run, and each one is meant to perform a specific function. In this way, a single machine can host several apps performing a variety of functions, each drawing on its share of that machine’s CPU, RAM, bandwidth, disk space, etc.

Containers are a sophisticated way to get creative with the server space you’re working with, and they should be considered even before you start building a vertical or horizontal autoscaling strategy. Better algorithms on existing hardware can help you avoid new hardware purchases. And once you do decide to autoscale, containers should remain part of how you strategize and build.

However, while containers can be great for scaling, they have the potential to add a great deal of complexity to your infrastructure. For example, if your server can run 20 containers, but suddenly you need three more, buying a second server just to hold them is an expensive solution, since the new server would be mostly empty.
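The arithmetic behind that example can be made explicit. The sketch below assumes every server holds the same fixed number of containers; the function names are hypothetical and exist only for this illustration.

```python
import math


def servers_needed(containers: int, per_server: int) -> int:
    """Number of servers required if each one holds at most per_server containers."""
    return math.ceil(containers / per_server)


def last_server_utilization(containers: int, per_server: int) -> float:
    """Fraction of the last server's container slots that are actually in use."""
    if containers == 0:
        return 0.0
    leftover = containers % per_server
    # A remainder of zero means the last server is completely full.
    return 1.0 if leftover == 0 else leftover / per_server


# 20 containers fit on one server; 23 containers need a second one
# that runs only 3 of its 20 slots.
print(servers_needed(23, 20), last_server_utilization(23, 20))
```

With 23 containers and room for 20 per server, the second server is only 15 percent used, which is the waste the paragraph above warns about.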

In principle, there are two ways to autoscale: vertically or horizontally. Scaling vertically means moving to a bigger machine, one with more CPUs and more gigabytes of memory, for more processing power. The biggest AWS machine that’s currently available (at least as far as I’m aware) has 128 CPUs. That’s the most processing growth you can get from a single server. The cost of a machine like this will vary: anywhere from 80 cents to $7 an hour is what you can expect to pay.
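To put the hourly price range quoted above in monthly terms, here is a small back-of-the-envelope calculation. It assumes a 30-day month with the machine running around the clock, and it works in whole cents to avoid floating-point rounding; the function name is made up for the example.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, running nonstop


def monthly_cost_cents(hourly_cents: int, hours: int = HOURS_PER_MONTH) -> int:
    """Cost in cents of running one machine for the given number of hours."""
    return hourly_cents * hours


low = monthly_cost_cents(80)    # 80 cents an hour
high = monthly_cost_cents(700)  # $7 an hour
print(f"${low / 100:,.2f} to ${high / 100:,.2f} per month")
```

Under those assumptions, the range works out to roughly $576 to $5,040 a month for a single machine, which is why idle capacity adds up quickly.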

The other approach is to scale horizontally, which means adding more machines (or containers) and spreading the load across them, typically in the cloud, where they can be added and removed on demand. For organizations not ready to make the commitment that comes with one much larger server, this is the approach worth considering. Plus, with a horizontal approach to autoscaling, it’s possible to adjust processing needs on the fly.

It’s important to understand that not all parts of an application can scale the same way. Databases, for example, can scale out with ease. But scaling in (reducing the number of nodes) takes a lot of time. If your data is stored in 30 containers and you want to scale in to 15 containers, you can’t do it in one step. You’d first need to scale from 30 containers to 28, wait for the data to replicate, then scale in from 28 containers to 26, again wait for the data to replicate, then from 26 containers to 24, etc. Otherwise you risk losing data.
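The gradual scale-in described above can be sketched as a plan of intermediate sizes. This example only computes the sequence; waiting for replication between steps is left to whatever tooling manages the data store. The function name and the default step of two containers (taken from the 30, 28, 26 pattern in the text) are assumptions.

```python
from typing import List


def scale_in_steps(start: int, target: int, step: int = 2) -> List[int]:
    """Return the container counts to pass through when shrinking from start to target.

    After reaching each size, the caller should wait for data replication
    to finish before moving on to the next one.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    if target > start:
        raise ValueError("target must not exceed start when scaling in")
    sizes = []
    current = start
    while current > target:
        # Never overshoot the target on the final step.
        current = max(target, current - step)
        sizes.append(current)
    return sizes


print(scale_in_steps(30, 15))
```

Going from 30 containers to 15 this way takes eight separate steps, each followed by a replication wait, which is why scaling in a data layer is so much slower than scaling it out.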

When Should You Autoscale?

Consider the above example of the sunglasses vendor. While it’s impossible to predict exactly when your business or customer base will reach a tipping point, publishing an advertisement with a broad reach is certainly an instance where your organization ought to be ready for a sudden spike in traffic. Your business decisions should be made mindfully, so that the potential effects of actions like these, and the strain they create on your architecture, are understood ahead of time.

Anticipating peaks and troughs in demand is something businesses have been doing for centuries. In Colombia, for example, utility companies have long understood the power of “arepa hour”: This is when Colombians are at home eating breakfast (specifically arepas, a traditional breakfast food in Colombia). Knowing that the majority of the country is likely to be at home consuming power sometime between 7:30 and 9:30 a.m. is a crucial data point that allows utility companies to be ready for this disproportionately high level of demand.
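When a peak is as predictable as arepa hour, capacity can be raised on a schedule instead of waiting for load to climb. The sketch below borrows the 7:30 to 9:30 a.m. window from the example above; the baseline and peak server counts are invented for illustration, and the function name is hypothetical.

```python
from datetime import time

PEAK_START = time(7, 30)  # window taken from the arepa hour example
PEAK_END = time(9, 30)


def planned_capacity(now: time, baseline: int = 2, peak: int = 6) -> int:
    """Return how many servers to run at the given time of day.

    The counts are illustrative assumptions, not recommendations.
    """
    in_peak_window = PEAK_START <= now < PEAK_END
    return peak if in_peak_window else baseline


print(planned_capacity(time(8, 0)))   # inside the window
print(planned_capacity(time(12, 0)))  # outside the window
```

In practice the window would come from your own traffic history, and a schedule like this works alongside reactive rules, since not every spike is predictable.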

Determining “the arepa hour” of your business’s autoscaling needs often comes down to a dilemma between CPU and memory. Should you be ready to autoscale based on when your site crosses a predetermined CPU load? Or should it be based on the memory in use on your servers?

The answer is really neither. There is no “golden metric” that will point the way for your autoscaling strategy, and you should never decide to scale based on a single data point in isolation. For example, it may make sense for processing data to be offloaded to a new server once the processing capacity is at 90 percent, but it’s also entirely plausible that your application continues to run completely normally with zero issues at this processing load.

You should be aiming to strike the right balance between efficiency and not compromising the user experience for those who visit your site or use your application. And the only way to do this is to consider all of your data points holistically and how they interact with one another—not in isolation.
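One way to picture a decision that weighs several data points together is sketched below. It is an assumption-heavy illustration: the response-time and error-rate signals, the thresholds and all of the names are hypothetical stand-ins for whatever "user experience" means for your application. The point is only that a hot CPU on its own does not trigger scaling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """A point-in-time view of several metrics (all fields are illustrative)."""
    cpu: float             # fraction of CPU in use, 0.0 to 1.0
    memory: float          # fraction of memory in use, 0.0 to 1.0
    p95_latency_ms: float  # 95th percentile response time
    error_rate: float      # fraction of requests that fail


def should_scale_out(s: Snapshot,
                     latency_budget_ms: float = 500.0,
                     max_error_rate: float = 0.01) -> bool:
    """Scale out only when resources are hot AND users are actually affected."""
    resources_hot = s.cpu >= 0.90 or s.memory >= 0.90
    users_affected = (s.p95_latency_ms > latency_budget_ms
                      or s.error_rate > max_error_rate)
    return resources_hot and users_affected
```

For instance, a snapshot with CPU at 92 percent but fast responses and no errors returns False, matching the case described above where an application runs normally at 90 percent load.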

Conclusion

We are past the time in which a query loading faux pas like the one I made as an intern can completely shut down a company’s processing activities. Given how accessible processing servers have become, some may find it tempting to “brute force” their way through every problem: Make a few clicks and pay for more server space at will. Don’t be goaded into taking the easy way out. Think about autoscaling strategically by answering these deceptively basic questions, and you will in fact make it easier (and more cost efficient) for your organization in the long run.

— Alejandro Calderon