Maintaining Progressive Delivery Quality With Feature Flags

How does Netflix know exactly what you want to watch? How do social networks ensure major software updates are smoothly deployed across millions of users? The answer, in part, lies in progressive delivery. Progressive delivery, the incremental introduction of new features, has become a key release strategy. By testing unproven features on a small subset of users, platforms can advance quickly while keeping rapport with the main user base. But where does QA fit into the progressive delivery cycle?

As software development teams face the need to iterate rapidly, the role of QA engineers is evolving as well. QA teams are increasingly assigned to support new customer experience initiatives and meet developer consumer needs. They must do so while understanding which customer metrics to monitor as developers introduce new features.

I chatted with Jeff Sing, lead software QA engineer at Optimizely, recently acquired by Episerver, to explore how feature flags could be used to maintain quality and respond to user feedback throughout a progressive delivery life cycle.

Defining Progressive Delivery and Feature Flags

Progressive delivery is a software release style that emphasizes very early code pushes and rapid feedback. By testing features among small subsets of a greater user base, developers can quickly gauge what works and what doesn’t.

Feature flags are a necessary component to make progressive delivery work. Also called feature toggles, feature flags are switches that enable developers to turn a feature off or on, helping engineering teams test unfinished features or alpha features while avoiding maintaining multiple branches. “Feature Flags are a powerful technique, allowing teams to modify system behavior without changing code,” wrote Pete Hodgson in a post on MartinFowler.com.

Feature Flags Enable Experimentation

Adopting feature flags for progressive delivery allows for a lot of experimentation, said Sing. For example, companies could test a new user interface adjustment on a small subset of users. If it works well, the team could refine it further and optimize it for specific user tastes.
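One common way to pick that small subset is to hash each user ID into a stable bucket and enable the feature only for buckets below a rollout percentage. The Python sketch below assumes that approach; it is not a description of how Optimizely or any specific vendor does it, and the flag name `new_checkout_ui`, the helper names and the user IDs are all hypothetical.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given flag."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 8 hex characters as an integer, then reduce to 0..99.
    return int(digest[:8], 16) % 100


def is_enabled(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage."""
    return bucket(user_id, flag_name) < rollout_percent


# Hypothetical user IDs, for illustration only.
users = [f"user-{i}" for i in range(1000)]
exposed = [u for u in users if is_enabled(u, "new_checkout_ui", 10)]
print(f"{len(exposed)} of {len(users)} users see the new UI")
```

Because the bucket depends only on the flag name and the user ID, a given user gets the same answer on every request, and raising the percentage from 10 to 50 keeps the original users in the exposed group while adding new ones.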

“Why would you want to build something that no one wants to use? How do you know you’re building the right thing?” asked Sing.

Iterative release strategies are nothing new—and they go by other names: canary release, A/B testing, targeted rollouts and feature gating. However, what seems unique about feature flags is the ability to optimize granularly.

For example, if you compare a friend’s Amazon home page with your own, you will likely find the dashboard, product recommendations and overall UI vastly different. Companies like Amazon are continually “running active experiments on a per-user basis,” said Sing. This enables them to personalize experiences to niche user tastes, in real time.

Many other companies are utilizing progressive delivery tactics. Blue Apron, for example, adopts experimentation to find and refine business value. Experimentation is core to how HelloFresh distills its customer experiences. Compass also uses progressive delivery tooling to enable feature flagging within a suite of apps serving various platforms.

For QA teams, this means shifting right: from business testing to nano-testing at the pace of constant delivery. QA teams must therefore take an active role and “continually test customer feedback and continually reiterate on that feature,” said Sing.

Implementing Feature Flags: QA Shifts Right

By adding customer feedback into the continuous delivery loop, feature flags aid incremental deployment strategies and become essential for DevOps. So how do we implement feature flags in our application development processes?

If we’re testing a single function within a small product, an obvious idea would be to comment out or uncomment code. But that doesn’t scale well. Developers could, instead, program a conditional variable. However, there are still more dynamic feature-toggling options. More advanced solutions may involve a toggle router and/or a UI-driven alternative.
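The conditional-variable and toggle-router ideas can be sketched in a few lines of Python. This is an illustration only: `ToggleRouter`, `render_home_page` and the flag name `new_recommendations` are hypothetical names, and the flag state lives in memory rather than in a real flag service.

```python
class ToggleRouter:
    """Holds flag state in one place so calling code never hard-codes it."""

    def __init__(self, defaults=None):
        # Copy the defaults so later changes do not leak back to the caller.
        self._flags = dict(defaults or {})

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags are treated as off, the safer default.
        return self._flags.get(name, False)


def render_home_page(router: ToggleRouter) -> str:
    # The conditional: both code paths ship, the flag picks one at runtime.
    if router.is_enabled("new_recommendations"):
        return "home page with new recommendations"
    return "home page with classic layout"


router = ToggleRouter({"new_recommendations": False})
print(render_home_page(router))   # classic layout while the flag is off
router.set("new_recommendations", True)
print(render_home_page(router))   # new recommendations once the flag is on
```

Routing every check through one object is what makes the more dynamic options possible: the same `is_enabled` call could later be backed by a config file or a dashboard without touching the feature code.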

Sing described a system using switches written as YAML files and adopting a visual dashboard to toggle features on or off. This setup can enable a process whereby the developer checks in code and passes it to site reliability engineers (SREs) or QA managers. They can then initiate testing in production, view data from beta users in real time and respond accordingly.
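A flag file of the kind Sing described might look like the text embedded below. This sketch makes several assumptions: the flag names are invented, and the tiny parser handles only flat `name: true/false` lines rather than full YAML, so that it runs with the Python standard library alone. A real setup would more likely use a proper YAML library, with a dashboard writing the file.

```python
# Hypothetical flag file contents; a dashboard could rewrite this file.
FLAG_FILE_TEXT = """\
# feature switches (illustrative names)
new_checkout_ui: true
dark_mode: false
beta_search: on
"""

TRUE_VALUES = {"true", "on", "yes"}
FALSE_VALUES = {"false", "off", "no"}


def parse_flags(text: str) -> dict:
    """Parse flat 'name: value' lines into a dict of booleans.

    This is NOT a YAML parser; it covers only the flat subset shown above.
    """
    flags = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip().lower()
        if not sep or not name:
            raise ValueError(f"line {line_number}: expected 'name: value'")
        if value in TRUE_VALUES:
            flags[name] = True
        elif value in FALSE_VALUES:
            flags[name] = False
        else:
            raise ValueError(f"line {line_number}: unknown value {value!r}")
    return flags


flags = parse_flags(FLAG_FILE_TEXT)
print(flags)
```

Rejecting unknown values instead of silently treating them as off is a deliberate choice here: a typo in a flag file is easier to catch at load time than in production.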

Integrating with a CI/CD pipeline would be the next logical step. Each build could carry many potential features. If a flag in the code is on, the system executes the code behind it, and components can easily be turned off or on remotely. This way, engineers “can control how things are represented in real-time without a code deploy,” said Sing.

Melding feature flags and progressive delivery returns extremely valuable data, Sing said. If teams can see metrics such as system behavior, bounce rate, Net Promoter Score (NPS) and business conversions, they can improve the quality of future deployments. “You can’t argue with hard data,” he noted.
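As a rough illustration of the kind of comparison this data makes possible, the Python sketch below computes a conversion rate for users with a flag on versus off. The event format, the variant labels and the sample events are all made up for the example; real numbers would come from an analytics pipeline, and a real analysis would also need a significance test before drawing conclusions.

```python
from collections import defaultdict


def conversion_rates(events):
    """Return {variant: conversions / visits} for (variant, converted) pairs."""
    visits = defaultdict(int)
    conversions = defaultdict(int)
    for variant, converted in events:
        visits[variant] += 1
        if converted:
            conversions[variant] += 1
    return {variant: conversions[variant] / visits[variant] for variant in visits}


# Made-up sample events: (variant label, did the user convert?)
sample_events = [
    ("flag_on", True), ("flag_on", False), ("flag_on", True), ("flag_on", False),
    ("flag_off", False), ("flag_off", True), ("flag_off", False), ("flag_off", False),
]

for variant, rate in conversion_rates(sample_events).items():
    print(f"{variant}: {rate:.0%} conversion")
```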

Benefits of Feature Flags

Utilizing feature flags within continuous development appears to afford many benefits. In the process, QA plays a more present role in how engineering teams roll out (or roll back) new features. “QA must understand the business impact and customer story,” said Sing. “It’s not just about uptime; it’s the experience, too.”

As you can see from the use cases above, progressive delivery with feature flags brings many benefits:

Quick validation: Bring new product features and proofs of concept to market faster by improving testing.
Experiment and iterate: Release incremental software versions. Experiment with new or unfinished features that consider fringe cases.
Development: Avoid constant merging of branches.
Reflect customer journey: Gather more heuristics on the customer journey to test and measure customer data.
Optimize per-user: Test and optimize for user tastes. “Why would you want to build something that no one wants to use?” For Sing, targeted rollouts can answer that user imperative immediately.
Limit blast radius: If new features introduce outages, easily toggled capabilities mean they can quickly be turned off and put back into development, thus increasing service reliability.
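The blast-radius point can be taken one step further by letting a flag turn itself off when errors spike. The Python sketch below is one hypothetical way to do that, not something described by Sing: `GuardedFlag`, the flag name `new_payment_flow`, the 10% threshold and the 20-request minimum are all assumptions chosen for illustration.

```python
class GuardedFlag:
    """A flag that switches itself off when the observed error rate is too high."""

    def __init__(self, name: str, max_error_rate: float, min_requests: int = 20):
        self.name = name
        self.max_error_rate = max_error_rate
        self.min_requests = min_requests  # avoid tripping on a tiny sample
        self.enabled = True
        self.requests = 0
        self.errors = 0

    def record(self, failed: bool) -> None:
        """Record one request served by the flagged code path."""
        if not self.enabled:
            return  # already switched off; nothing more to track
        self.requests += 1
        if failed:
            self.errors += 1
        error_rate = self.errors / self.requests
        if self.requests >= self.min_requests and error_rate > self.max_error_rate:
            self.enabled = False  # kill switch: fall back to the old path


# Simulated traffic: 15 successful requests followed by 5 failures.
flag = GuardedFlag("new_payment_flow", max_error_rate=0.10)
for failed in [False] * 15 + [True] * 5:
    flag.record(failed)
print(flag.name, "enabled:", flag.enabled)
```

In this simulated run the flag ends up disabled, because 5 failures in 20 requests exceeds the 10% threshold. In practice a team might prefer an alert and a human decision over a fully automatic switch.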

Drawbacks of Feature Flags

Feature flags introduce many efficient methods for testing new features, but they don’t come without side effects in other areas. One potential drawback of feature flags is supporting a vast number of intricately defined user scenarios.

With each segmentation come new optimization requirements, and support must adapt to a quickly growing archive of various feature sets. Companies need advanced tooling to support the easy addition or removal of features. “You need the power to understand what’s going out and how you manage it,” Sing noted.

A fractured ecosystem with thousands of microservices can become difficult to support continuously. Engineering teams must draw the line somewhere between endless optimization and retaining a cohesive product. Ideally, developers can reuse base components but represent them in different ways.

Another negative side effect is public opinion. With the release of The Social Dilemma and related tech industry critique, many user-facing tech networks are under harsh criticism due to the unethical use of user data to develop hyper-targeted ad campaigns. Optimization tactics that disrespect users should be frowned upon.

This is not to mention regulations around user data and privacy. If users do not grant consent to share user data to “improve the user experience,” engineering teams won’t have the necessary metrics and tracking data required to decide which features to implement.

While feature flags appear to help optimize business value, there are many potential “gotchas” to carefully consider before implementation, from technical logistics to service fragmentation, user ethics and legal ramifications.

Final Thoughts

“As with any new technology, if you don’t use it correctly, it’s almost better if you don’t use it,” said Sing. If progressive delivery isn’t managed correctly, with a QA mindset, it could come back to bite the team.

As we continually build new tech to ship faster and smarter, it’s also important to consider customer needs. “Automation is great, but a computer can’t understand what a customer really wants,” Sing noted. To avoid the technical hurdles and social dangers of hyper-personalization, QA engineers should be customer champions who consider customer happiness.