Decidability: SLOs to Measure Quality Attributes

You may be familiar with the “-ilities,” the quality attributes that describe a system’s so-called “non-functional” properties. One that isn’t on the usual list is decidability: how readily your team can decide what to do and what to leave undone. I consider it the ultimate quality attribute because it focuses on what truly matters to your team and your customers.

You can’t do it all. Fast, cheap and good? Here in the real world, we have tradeoffs. Software is the mouth that keeps on eating, but it isn’t a digital free lunch. When we decide what not to do, we bring greater clarity and productivity to the team.

Of course, the world is full of “shoulds.” We should make our product accessible, we know. It should be secure; it should be reliable. Of course. But how do you decide the limit of your shoulds? To enable decidability, you need context. To gain a contextual understanding of your most critical quality attributes, you must talk to real people and reason about your situation. There’s no other way.

New Design, Wrong Decision

I had an eye-opening experience that demonstrated firsthand how quality attributes don’t necessarily deliver business results. Six years ago, I was working on e-learning software for public schools. One of the main problems was user management. We were selling to massive institutions that often had only one administrator responsible for onboarding all the users across hundreds of schools. While school was out, they had to enter data for the schools, create classes and assign teachers.

Once we had a design for improving the workflow (luckily, before implementing it), we took the mockups to a customer to show off the fancy ideas we had come up with. They surprised us by showing us “what it looks like for them”: a spreadsheet! They were copying and pasting everything by hand.

So we asked them, “What if we gave you a way to upload the spreadsheet? Would that work?” And, of course, the answer was yes! So we scrapped our plans to build a fancy form and made a drag-and-drop input field to validate and upload the sheet. The decision was clear.

Measuring Decidability

How do we measure the critical business drivers and turn them into decidability? The closest answer I’ve found is service level objectives (SLOs). These take the fuzzy -ilities and turn them into metrics and goals that can quickly inform decisions.

SLOs are a general-purpose tool for measuring quality attributes. Each one starts from an indicator (often called a service level indicator, or SLI) expressed as the proportion of “good” events for a given quality, such as reliability or scalability. Decidability comes from aggregating the SLO data across your services, geographies, user groups and quality attributes over time.
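To make the “proportion of good events” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Event` record, its fields, and the 300 ms latency threshold are hypothetical and do not come from any particular monitoring tool. It counts good events and rolls the ratio up by whichever dimension you pass in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One request, as a hypothetical monitoring pipeline might record it."""
    service: str
    region: str
    latency_ms: float
    ok: bool


def is_good(event, latency_threshold_ms=300.0):
    # Assumed definition of "good": the request succeeded and was fast enough.
    return event.ok and event.latency_ms <= latency_threshold_ms


def sli_by(events, key):
    """Return {group: proportion of good events}, grouped by key(event)."""
    good = defaultdict(int)
    total = defaultdict(int)
    for event in events:
        group = key(event)
        total[group] += 1
        if is_good(event):
            good[group] += 1
    return {group: good[group] / total[group] for group in total}


# Made-up sample data, only to exercise the functions.
events = [
    Event("checkout", "eu", 120.0, True),
    Event("checkout", "eu", 450.0, True),   # too slow, counts as bad
    Event("checkout", "us", 90.0, True),
    Event("search", "us", 80.0, False),     # failed, counts as bad
    Event("search", "us", 60.0, True),
]

by_service = sli_by(events, key=lambda e: e.service)
by_region = sli_by(events, key=lambda e: e.region)
```

The same events yield a different picture depending on the grouping key, which is the point: slicing one indicator by service, geography or user group is what lets you see where a decision is actually needed.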

By comparing an indicator to a particular goal (tuned over time as the business and engineering context changes), we can derive an allotment of allowable “bad” time or events, known as an error budget. Setting an error budget for reliability, performance, scalability, etc., lets you decide how poorly your service can operate and still keep customers happy. Further, you can find the breaking point of user behavior and proactively keep your service from violating the expectations you have set for each quality attribute.
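The arithmetic behind an error budget is simple enough to sketch. The function names below are hypothetical, and the 99.9% target and 30-day window are example values rather than recommendations. The budget is the share of events (or minutes) the goal allows to be bad, and the remaining budget is whatever has not yet been spent.

```python
def error_budget_events(slo_target, total_events):
    """Bad events allowed in the window for a target such as 0.999."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return (1.0 - slo_target) * total_events


def error_budget_minutes(slo_target, window_days=30):
    """Bad minutes allowed in a time-based window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target, total_events, bad_events):
    """Fraction of the budget still unspent; a negative value means overspent."""
    budget = error_budget_events(slo_target, total_events)
    return 1.0 - bad_events / budget


# A 99.9% goal over one million requests allows roughly 1,000 bad ones.
allowed = error_budget_events(0.999, 1_000_000)

# With 250 bad requests so far, about three quarters of the budget is left.
remaining = budget_remaining(0.999, 1_000_000, bad_events=250)

# The same goal expressed as time: roughly 43.2 minutes per 30 days.
minutes = error_budget_minutes(0.999, window_days=30)
```

Tightening the goal from 99.9% to 99.99% cuts the budget by a factor of ten, which is why the goal itself is worth revisiting as the business context changes.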

Technical debt inevitably stems from all the -ilities you should have focused on but couldn’t quite get to, given your situation. Some technical debt is deliberate and even prudent. To tackle technical debt, you must decide which new features to put on hold to make space in the roadmap. SLOs provide a data-driven baseline for making this decision quickly and accurately by telling you when your -ilities are within error budget or out of whack.

Scaling Decidedly

You might start on a small, single-purpose project that’s rarely used. But if that changes, you must decide how to invest in, redesign and maintain it. Suddenly, you might care about performance, scalability, security and the rest. First, you need decidability so you can prioritize your work. So you gather the people who care about the service, and you start to think and talk and stack rank. You use your instrumentation to build metrics, set clear goals and do what it takes to keep the system running within expectations.

Automating action based on error budgets is the magic of setting SLOs. They aren’t just for sophisticated, highly available services. Error budgets are extremely valuable in any system where you aren’t sure what to do next. SLOs create decidability across the tradeoffs among the quality attributes and the functional needs of the app or service you’ve carefully built.
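One way to picture automated action is a small policy function that turns the remaining error budget into a decision. The sketch below is hypothetical: the three actions, the 25% caution threshold and the attribute names are assumptions chosen for illustration, and a real team would agree on its own policy. Each input is the fraction of budget left for one quality attribute, where a value at or below zero means the budget is spent.

```python
from enum import Enum


class Action(Enum):
    SHIP_FEATURES = "keep shipping features"
    SLOW_DOWN = "slow down risky changes"
    FREEZE_AND_FIX = "pause features and pay down debt"


def decide(remaining_fraction, caution_below=0.25):
    """Map the unspent share of an error budget to an action."""
    if remaining_fraction <= 0.0:
        return Action.FREEZE_AND_FIX
    if remaining_fraction < caution_below:  # assumed threshold
        return Action.SLOW_DOWN
    return Action.SHIP_FEATURES


def decide_for_service(budgets):
    """Act on the quality attribute with the least budget left.

    budgets maps an attribute name to its remaining budget fraction.
    """
    if not budgets:
        raise ValueError("budgets must not be empty")
    worst = min(budgets, key=budgets.get)
    return worst, decide(budgets[worst])


# Made-up numbers: performance has overspent its budget.
attribute, action = decide_for_service(
    {"reliability": 0.60, "performance": -0.10, "scalability": 0.90}
)
```

Acting on the worst attribute is only one possible rule. Its appeal is that it makes the tradeoff explicit: as long as every budget is healthy, feature work wins by default, and nobody has to argue about it.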

If you put decidability first, the rest will fall into place.