More than 94 percent of organizations already have Agile or DevOps initiatives or plan to adopt them within the immediate future, according to a survey by VersionOne. As part of the process transformation required to adopt these more iterative and rapid methods of delivering software, organizations are reassessing all practices associated with software development and test, including load testing.

Traditionally, load testing has been deferred until the late phases of waterfall release cycles. Specialized performance testers applied legacy load testing tools to validate application performance prior to release. Why is this no longer sufficient?

With teams moving to more componentized applications (including cloud-native apps, microservices, etc.), an application involves many highly distributed components—and a performance issue in any of them could have a ripple effect across the entire application.
Now that new functionality is being released weekly, daily or hourly, each team needs instant insight into whether their incremental changes negatively impact performance.
The later you start load testing, the more difficult, time-consuming and costly it is to expose, debug and resolve performance problems.

In-depth load testing by performance-testing specialists remains critical, but it doesn’t provide the fast, on-demand feedback that Agile and DevOps require. Developers and testers need a way to expose serious performance issues before new functionality progresses through the delivery pipeline. To achieve this, DevTest team members must be able to easily:

Create load tests that provide fast feedback on the functionality they’re evolving; and
Execute and scale those load tests as needed—without the exorbitant costs and efforts traditionally required to establish, configure and maintain a performance test lab.

Traditional Load Testing Isn’t for Everyone

However, load testing has long been the domain of performance testing specialists for a reason: it’s difficult. Today’s developers and testers don’t have the time (or desire) to wrestle with all the technical details required to get load tests working correctly and to keep brittle load tests in sync with the rapidly evolving application.

The traditional way of approaching load test scripting is at the protocol level (e.g., HTTP). This includes load testing with open source tools such as JMeter and Gatling, as well as legacy tools including LoadRunner. Although simulating load at the protocol level has the advantage of being able to generate large concurrent load from a single resource, that power comes at a cost. The learning curve is steep and the complexity is easily underestimated.

The main culprit for this complexity is JavaScript. In 2011, a typical page carried less than 100 KB of JavaScript, which triggered 50 or fewer HTTP requests. That has since doubled: the average page now has about 200 KB of JavaScript and generates more than 100 requests.

Just running a search on a simple search page involves things such as XML HTTP requests processed asynchronously after page load. You also find things such as dynamic parsing and execution of JavaScript, the browser cache being seeded with static assets and calls to content delivery networks.

For a more business-focused example, consider the SAP Fiori demo app. Assume we want to load test two simple actions: navigating to a page and then clicking on the “My Inbox” icon. This actually generates more than 120 HTTP requests at the protocol level.

When you start building your load test simulation model, this will quickly translate into thousands of protocol-level requests that you need to faithfully record and then manipulate into a working script. You must review the request and response data, perform some cleanup and extract relevant information to realistically simulate user interactions at a business level. You can’t just think like a user; you also must think like the browser.

You need to consider all the other functions that the browser is automatically handling for you, and figure out how you’re going to compensate for that in your load test script. Session handling, cookie header management, authentication, caching, dynamic script parsing and execution, taking information from a response and using it in future requests … all of this needs to be handled by your workload model and script if you want to successfully generate realistic load. Basically, you become responsible for doing whatever is needed to fill the gap between the technical and business level. This requires both time and technical specialization.
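
To make that gap concrete, here is a minimal Python sketch of just one of those chores, often called correlation: pulling a dynamic value out of one response and replaying it in the next request. The HTML snippet, the `csrf_token` field name, the session cookie and the `/inbox/open` path are all hypothetical, chosen only for illustration. A real script would need many such extractions, and they tend to break whenever the application changes.

```python
import re
from urllib.parse import urlencode

# Hypothetical response body; in a real test this would come from the server.
LOGIN_PAGE = '<form><input type="hidden" name="csrf_token" value="a1b2c3"></form>'


def extract_token(html: str) -> str:
    """Pull the dynamic token out of a response body."""
    match = re.search(r'name="csrf_token" value="([^"]+)"', html)
    if match is None:
        raise ValueError("token not found; the page layout may have changed")
    return match.group(1)


def build_next_request(html: str, session_cookie: str) -> dict:
    """Assemble the follow-up request a browser would have sent for us."""
    token = extract_token(html)
    return {
        "method": "POST",
        "path": "/inbox/open",  # hypothetical endpoint
        "headers": {"Cookie": f"session={session_cookie}"},
        "body": urlencode({"csrf_token": token}),
    }


request = build_next_request(LOGIN_PAGE, session_cookie="xyz")
print(request["body"])
```

Even this toy version has to know the exact markup of the page, which is the kind of detail a browser never asks the user to care about.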

You might be thinking, “Okay, we’ll use ‘record and playback’ tools, then.” Theoretically, you could just place a proxy between your browser and the server, record all the traffic going through and be set. Unfortunately, it’s not quite that simple. Even though you’re interacting at the UI level, the testing is still based on the protocol level. Assume we were looking at the traffic associated with one user performing the simple “click the inbox” action described above. When we record the same action for the same user two different times, there are tens if not hundreds of differences in the request payload that we’d need to account for.
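
As a rough illustration of that comparison, the Python sketch below diffs two recorded payloads and lists the fields whose values changed between runs. The field names and values are invented for the example; real recordings would be far larger and would typically differ in many more places.

```python
def changed_fields(first: dict, second: dict) -> list[str]:
    """Return the names of fields whose values differ between two recordings."""
    keys = set(first) | set(second)
    return sorted(k for k in keys if first.get(k) != second.get(k))


# Hypothetical payloads captured by recording the same click twice.
recording_a = {"action": "openInbox", "sessionId": "9f3a", "timestamp": "1001", "nonce": "aa"}
recording_b = {"action": "openInbox", "sessionId": "7c21", "timestamp": "1042", "nonce": "bf"}

print(changed_fields(recording_a, recording_b))
```

Every field this reports is a value the script would have to generate or extract at run time rather than replay verbatim.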

Of course, you can resolve those differences with some effort. Unfortunately, when the application changes again, you’re back to square one. The more frequently your application changes, the more painful and frustrating this becomes.

Taking a Browser-Based Approach

To sum up the challenge here: modern web applications are increasingly difficult to simulate at the protocol level. This raises the question: Why not shift from the protocol level to the browser level—especially if the user’s experience via the browser is what you ultimately want to measure and improve?

When you’re working at the browser level, one business action translates to maybe two automation commands in a browser, compared to tens, if not hundreds, of requests at the protocol level. Browser-level functions such as cache, cookie and authentication/session management work without intervention. There are several ways to simulate traffic at the browser level: Selenium is clearly the most popular, but other cross-browser tools are available, some of which let you test without getting into scripting.
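
The Python sketch below suggests what such a browser-level step might look like. It is written against the `get` and `find_element(by, value)` calls that Selenium’s WebDriver exposes, but to stay runnable without a browser it uses a small stand-in driver that only records the commands it receives. The URL and the `#my-inbox` selector are hypothetical.

```python
CSS_SELECTOR = "css selector"  # same string value as Selenium's By.CSS_SELECTOR


def open_inbox(driver, base_url: str, inbox_selector: str) -> None:
    """One business action expressed as two automation commands."""
    driver.get(base_url)  # navigate to the page
    driver.find_element(CSS_SELECTOR, inbox_selector).click()  # click the icon


class RecordingDriver:
    """Stand-in for a real browser driver; it only logs what it is asked to do."""

    def __init__(self):
        self.commands = []

    def get(self, url):
        self.commands.append(("get", url))

    def find_element(self, by, value):
        self.commands.append(("find_element", by, value))
        return self  # the stand-in doubles as the located element

    def click(self):
        self.commands.append(("click",))


dry_run = RecordingDriver()
open_inbox(dry_run, "https://example.test/launchpad", "#my-inbox")
print(dry_run.commands)
```

Passing a real Selenium WebDriver instance in place of `RecordingDriver` should drive an actual browser with the same two lines. Cookies, caching and background requests are then the browser’s job, not the script’s.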

However, historically, it just wasn’t feasible to run these tools at the scale needed for load testing. In 2011, if you wanted to launch 50,000 browsers with Selenium, you would have needed something on the order of 25,000 servers to provide the infrastructure. Moreover, it would have been prohibitively expensive and time-consuming to provision the necessary infrastructure.

Today, with the wide availability of cloud-based technology, browser-based load testing is feasible. At the same time, Google Chrome and other browser projects are offering faster automation and better memory profiles through headless (UI-less) variants of the browser. In fact, tests with headless Chrome show that it’s possible to go far beyond the industry-standard benchmark of two to five browsers per machine and achieve around 50 browsers per machine.

Suddenly, generating a load of 50,000 browsers is a lot more achievable, especially when the cloud can now give you access to thousands of load generators that can be launched from any web browser and be up and running in minutes. Instead of having to wait for expensive performance test labs to get approved and set up, you can get going instantly at an infrastructure cost of just cents per hour. Fast feedback on performance is no longer just a pipe dream.
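
The arithmetic behind that shift is simple enough to sketch in a few lines of Python. The densities are the ones quoted above (two to five browsers per machine as the older benchmark, around 50 with headless Chrome); the function name is just for illustration.

```python
import math


def machines_needed(target_browsers: int, browsers_per_machine: int) -> int:
    """Number of load generators required to host the target browser count."""
    return math.ceil(target_browsers / browsers_per_machine)


TARGET = 50_000
for density in (2, 5, 50):
    print(f"{density} browsers/machine -> {machines_needed(TARGET, density)} machines")
```

At two browsers per machine the target needs 25,000 machines; at 50 per machine it needs 1,000.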

At Flood IO, we’ve captured our browser-level user (BLU) research and development in Flood Chrome: a browser-based load generation approach that builds upon APIs from Google for headless Chrome automation. This approach reduces the complexity of writing and maintaining tests while preserving a viable level of concurrency and performance per machine. It creates scripts at a higher level of abstraction (the user level), then distributes those tests via the cloud for better economies of scale.

Why consider this new approach to load testing?

Simple scripting—or no scripting at all
Reduced test complexity
Test entire stack in one shot from the user perspective
Capable of testing any user behavior
Record network and user interaction times for front-end optimization
Easier to test earlier and often
Easier to maintain
10X faster than other Selenium load testing

Or, in just one line: because it’s purpose-built for DevOps.

There’s Still a Time and a Place for Protocol-Level Load Testing

Of course, no single load testing approach is going to solve everyone’s load testing needs all the time. For example, if you’re trying to test an application that’s not accessible from a browser, the BLU approach isn’t going to work for you.

Moreover, there are still some situations where you’d be remiss to overlook protocol-level testing with tools like JMeter, Gatling, or API-based test cases. For example, if you want to simulate load against APIs, I’d still recommend running tests that exercise them directly at the protocol level.

Ultimately, though, it’s important to remember that protocol-level tests can be higher maintenance, and to use them accordingly. If you have a single click that makes 20 background requests, would you rather wrestle with all the technicalities of scripting that at the protocol level, or have one line of a BLU script that achieves the same business functionality?

Final Thoughts

By reducing the complexity traditionally associated with load testing, BLU load testing gives developers and testers a fast, feasible way to get immediate feedback on how code changes impact performance. It’s geared to help people who are not professional performance testers get started quickly and create load tests that can be run continuously within a CI/CD process with minimal maintenance. With this new “lean” approach, any developer or tester can take part in load testing.
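
One way such continuous feedback might be wired into a pipeline is a simple performance budget check, sketched below in Python. The sample timings, the 400 ms budget and the choice of the 95th percentile are assumptions made for the example, not figures from any particular tool.

```python
import math


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def within_budget(samples: list, budget_ms: float, pct: int = 95) -> bool:
    """True when the chosen percentile stays at or under the budget."""
    return percentile(samples, pct) <= budget_ms


# Hypothetical page-load times in milliseconds from one test run.
timings_ms = [120, 135, 128, 140, 131, 126, 138, 133, 129, 450]

print("pass" if within_budget(timings_ms, budget_ms=400) else "fail")
```

A build step could turn a failing result into a non-zero exit code, so that a change which slows the page stops at the pipeline stage where it was introduced.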

— Tim Koopmans