When developing and deploying a new feature or application, performance (speed, reliability) must be closely analyzed. After all, there’s no benefit to delivering a product if it doesn’t work well or forces users to stare at a spinning wheel. To get a true view of performance, many organizations rely on metrics. A smart move, since metrics, much like a scale, don’t lie. Right?
Unfortunately, that’s not always the case. Even when metrics are used, there are often differing interpretations of performance, as well as debate over which metrics should be used in the first place. Metrics can be biased and manipulated, but does that mean we should stop using them? Of course not. Instead, we must learn to weed out bias and metric manipulation.
What is Bias?
Bias is not a bad thing; everybody is biased to some degree. Bias is defined as a particular tendency, inclination, feeling or opinion, especially one that is preconceived.
Two common biases are anchoring and confirmation bias. Anchoring is the tendency to rely too heavily on a single piece of information, usually the first piece we encounter. The statistics presented first are the ones most likely to be remembered, and they drive decisions, responses and reactions.
Imagine you’re about to launch a new application, but before doing so you run performance tests to see how fast it loads. The initial tests show a load time of 10 seconds. That’s too high, so the team looks for ways to improve performance. A few weeks later response times drop to 6 seconds, a 40 percent improvement, and everybody celebrates. But what if the initial tests had shown a 4-second load time, and a few months after launch times crept up 40 percent to 5.6 seconds? Would you still be celebrating? No, you would be looking at what went wrong and trying to fix it. The anchor determines whether a number reads as positive or negative: 5.6 seconds is faster than 6, yet measured against the 4-second anchor it feels like a failure.
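The arithmetic behind this framing is worth making explicit. Below is a minimal TypeScript sketch using the numbers from the scenario above; the function name is illustrative, not part of any real tooling:

```typescript
// Percent change of a measurement relative to an anchor value.
function percentChange(anchorSeconds: number, currentSeconds: number): number {
  return ((currentSeconds - anchorSeconds) / anchorSeconds) * 100;
}

// Scenario 1: anchored at 10 seconds, now at 6 seconds.
console.log(percentChange(10, 6).toFixed(0));  // "-40": a 40% improvement, celebrated

// Scenario 2: anchored at 4 seconds, now at 5.6 seconds.
console.log(percentChange(4, 5.6).toFixed(0)); // "40": a 40% regression, alarming
```

Same formula, nearly the same absolute load times, opposite reactions; the anchor, not the number itself, drives the interpretation.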
Confirmation bias is the tendency to search for and interpret information in a way that confirms a pre-existing hypothesis. If you believe a new feature does not add value to an application, you may focus on negative statistics rather than positive ones. If testing shows load times will increase 15 percent with the new feature, you may use that as proof to delay or stop the release. If, however, you think the feature is a real game-changer, you will dig up the research on the 20 percent rule, which says people only perceive changes in speed when the difference is at least 20 percent faster or slower, and conclude that a 15 percent change won’t be noticed by users. At its core, confirmation bias is about ego and not wanting to be wrong.
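Viewed mechanically, the 20 percent rule is just a threshold on relative change. Here is a minimal sketch of that check; the function name and the sample millisecond values are assumptions for illustration:

```typescript
// Perceptibility check based on the 20 percent rule cited above:
// users tend to notice speed changes of roughly 20% or more.
function isChangePerceptible(baselineMs: number, newMs: number): boolean {
  const relativeChange = Math.abs(newMs - baselineMs) / baselineMs;
  return relativeChange >= 0.2;
}

console.log(isChangePerceptible(2000, 2300)); // false: 15% slower, below the threshold
console.log(isChangePerceptible(2000, 2500)); // true: 25% slower, likely noticed
```

Both sides of the argument can point at the same test result; the bias lies in which framing gets searched for.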
The first step to avoiding biases is acknowledging that they exist and that everybody (yes, even you) is susceptible. There are ways to avoid them, or at least minimize their impact. Create your own anchor: do the research to determine what an acceptable response time is and use that as the anchor point. Search for facts that both support and oppose your viewpoint, and don’t stop at a single site or fact; find multiple sources. Foster an environment where disagreement is encouraged in a positive and constructive manner, and listen to dissenting viewpoints. Ask better questions, and be willing to accept the answers even if they aren’t what you were looking for.
Avoiding Metric Manipulation
Metrics should be actionable and tied to business goals as opposed to being purely “vanity metrics”—those metrics that may make everybody feel good, but don’t reflect the true user experience and the consequent impact on the business.
Specific to measuring web performance, there are countless metrics that could determine whether a user experience is positive or negative. If you choose the time to the onload event as the core metric, a page can be recoded to defer work until after the onload event fires. The metric will go down and teams will celebrate, but did this have any impact on the user experience? Was the goal really to reduce onload time, or was it to better satisfy end users with a site that feels faster, increasing conversions and time spent on the site?
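To make the gaming concrete, here is a sketch of the kind of change that lowers an onload-based metric without making the page feel any faster. The helper functions are hypothetical stand-ins for non-critical work such as analytics or widgets:

```typescript
// Hypothetical non-critical tasks that previously ran during page load
// and therefore counted against the onload timing.
declare function loadAnalytics(): void;
declare function loadRecommendationsWidget(): void;

// After the "optimization": the same work runs moments after the load
// event completes, so it disappears from the onload measurement even
// though the user still waits for it.
window.addEventListener("load", () => {
  setTimeout(() => {
    loadAnalytics();
    loadRecommendationsWidget();
  }, 0);
});
```

The onload number improves, but the user-visible experience is essentially unchanged, which is exactly why the metric alone can mislead.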
Here is a quick checklist you can use to choose the right user performance metrics:
- Do improvements or declines in this metric affect what really matters: user perception and business performance? Can that correlation be clearly demonstrated (see the sketch after this list)? How will this metric show progress toward the business’s goals and objectives?
- Can the metric be gamed or influenced? Once something is measured, people may look for ways to game the system and show improvement where none actually exists.
- Is the metric “good enough”, i.e., the best one available even if it isn’t perfect? Though not a performance-related example, the popular website Medium’s choice of top-line metric illustrates this approach. While most websites rely on unique visitor counts as the ultimate gauge of success, Medium, which views high user engagement as its key success factor, instead measures the time users spend reading articles on the site. This may not capture the totality of user engagement (some visitors merely scan rather than read), but it’s good enough and a more realistic indicator than the alternatives.
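On the first checklist question, one lightweight way to test whether a metric tracks business performance is to correlate it against an outcome you already trust. The sketch below computes a Pearson correlation over hypothetical daily samples; the data, variable names, and the choice of Pearson are all assumptions for illustration, not a prescribed method:

```typescript
// Pearson correlation between two equal-length series.
function pearson(xs: number[], ys: number[]): number {
  const n = xs.length;
  const mean = (v: number[]) => v.reduce((a, b) => a + b, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
}

// Hypothetical daily samples: median load time (seconds) and conversion rate (%).
const loadTimes = [2.1, 2.4, 3.0, 3.6, 4.2, 4.8];
const conversionRates = [3.4, 3.3, 2.9, 2.6, 2.2, 1.9];

// A strong negative correlation suggests the metric moves with the business.
console.log(pearson(loadTimes, conversionRates).toFixed(2)); // "-1.00" (approximately)
```

If a metric shows no such relationship to anything the business cares about, it is a candidate for the “vanity” pile, no matter how good it makes the dashboard look.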
Whether we like it or not, people are heavily guided by bias and emotion, and they often bend reality (even subconsciously) to support their own motivations. Performance metrics are and always will be central to evaluating software, but it’s not enough to simply lay out the facts. Organizations must frame and present metrics carefully, and focus on those that are the most relevant and reliable. These are the keys to painting the most complete, impartial picture of performance, and to guiding decisions in the best interest of the business.
About the Author / Dawn Parzych
Dawn Parzych is Director of Product and Solution Marketing at Catchpoint. She enjoys researching, writing and speaking about trends related to application performance, user perception, and how they impact the digital experience. In her 15+ year career, Dawn has held a wide variety of roles in the application performance space at Instart Logic, F5 Networks and Gomez. Connect with her on LinkedIn.