As we close out 2022, we at DevOps.com wanted to highlight the most popular articles of the year. Following is the latest in our series of the Best of 2022.
What is enterprise technical debt? Technical debt slows down organizations and hampers their ability to deliver. Studies show that technical debt can triple the cost of support for technical products and services. Beyond this, technical debt can cripple a company’s ability to make the rapid changes needed to compete in today’s marketplace. Those statements all describe the effects of technical debt, but they don’t get to the heart of its definition. Therein lies the problem.
Technical debt is a term that many in technology are familiar with but few seem to understand. Fewer still have a clear way to measure it. Are security vulnerabilities debt? What about teams or applications that have not been migrated to standard communications platforms or shared processes? And, if all of this is debt, how do you measure it?
Technical debt is often too narrowly scoped, focusing primarily on software defects and other code-related ‘debt’. However, the technical debt that large enterprises acquire goes far beyond code and has much more serious consequences. To truly address issues related to technical debt, we need to understand it not just in terms of software defects and backlog, but also security debt, process debt and organizational debt. Once we understand the scope, we can leverage financial tools as the basis for our measurement.
History of Technical Debt
The term technical debt was originally coined as a metaphor by Ward Cunningham, one of the original authors of the Agile Manifesto. He introduced the concept to justify the work his team was doing to refactor a financial application, saying, “If we failed to make our program align with what we then understood to be the proper way to think about our financial objects, then we were going to continue to stumble on that disagreement which is like paying interest on a loan.” In Introduction to the Technical Debt Concept, the Agile Alliance noted that, “ … [I]n 1992, at the OOPSLA conference, Ward provided additional details (slightly paraphrased here based on feedback from Ward): ‘Shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with refactoring. The danger occurs when the debt is not repaid.’”
So, the original definition is a metaphor: the drag that unrefactored code places on a company’s ability to deliver is compared to the interest paid on a loan. While this concept has gained wide acceptance, there remains neither a single agreed-upon definition nor a single way to quantify it.
This concept of debt is, however, very useful as it recognizes there is a cost involved with code defects; defects impede the ability to deliver. However, in modern enterprise environments, this concept needs to be expanded to include other elements which impact the ability to deliver value to the customer, including security-related debt and even organizational debt. There are many items that can (and should) be understood as part of technical debt including:
- Known bugs that go unfixed
- Problems associated with poor code quality or poor design
- Lack of code test coverage including tests to ensure quality, security and performance
- Code or artifacts that aren’t cleaned up when no longer in use
- Obsolete technology that is expensive, difficult to manage and unable to meet the demands of digital transformation
- Incomplete or outdated documentation or missing comments
- Known security issues that need to be remediated
- Diversified tool sets that are not aligned with organizational standards
- Technology organizational constructs that are not aligned with organizational standards
It is important to note that, just as with financial debt, not all technical debt should be repaid. There is a concept of “good debt” or, to further the financial metaphor, money borrowed because the opportunity for investment earnings is greater than the rate of cost incurred by the debt. If, for example, one can borrow at 3% and earn at 10%, then it makes good financial sense to accrue debt. Similarly, if the money a firm can make by introducing a new feature outweighs the cost of resolving some debt then the team should prioritize feature work. This cost calculation will often change based on the product life cycle stage, as well. During early product development, the opportunity offered by new feature development often far outweighs the potential cost associated with the risk of compliance issues. However, as a product matures and gains more customers, the risk increases and the benefit of new features may decline. This is important because we do not want to imply that all technical debt should necessarily be fixed. There are many times when the right business decision is to invest in other areas rather than in the remediation of technical debt. However, by applying economic calculations to decisions around technical debt, we can make a more informed and calculated decision based on data.
Risk as Technical Debt
Of the different forms of technical debt, security and organizational debt are the ones most often overlooked and excluded in the definition. These are also the ones that often have the largest impact. It is important to recognize that security vulnerabilities that remain unmitigated are technical debt just as much as unfixed software defects. The question becomes more interesting when we look at emerging vulnerabilities or low-priority vulnerabilities.
While most will agree that known, unaddressed vulnerabilities are a type of technical debt, it is less clear whether a newly discovered vulnerability is also technical debt. The key here is whether the security risk needs to be addressed and, for that answer, we can look at an organization’s service level agreements (SLAs) for vulnerability management. If an organization sets an SLA that requires all high-level vulnerabilities be addressed within one day, then we can say that high-level vulnerabilities older than one day are debt. This is not to say that vulnerabilities still within the SLA do not need to be addressed; only that they represent new work and become debt only once they have exceeded the SLA.
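The SLA rule described above can be sketched in a few lines of Python. This is only an illustration: the `Vulnerability` class, the `is_debt` function and the identifiers are hypothetical, and only the one-day window for high-level vulnerabilities comes from the example in the text; the other SLA windows are assumed values.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical SLA table: severity -> maximum age before remediation is due.
# The one-day window for "high" mirrors the example in the text; the other
# windows are illustrative assumptions.
SLA_BY_SEVERITY = {
    "high": timedelta(days=1),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}


@dataclass
class Vulnerability:
    identifier: str
    severity: str
    discovered: date


def is_debt(vuln: Vulnerability, today: date) -> bool:
    """Return True once the vulnerability has been open longer than its SLA."""
    age = today - vuln.discovered
    return age > SLA_BY_SEVERITY[vuln.severity]


today = date(2022, 12, 15)
findings = [
    Vulnerability("VULN-1", "high", date(2022, 12, 15)),   # found today: new work
    Vulnerability("VULN-2", "high", date(2022, 12, 10)),   # five days old: debt
    Vulnerability("VULN-3", "medium", date(2022, 12, 1)),  # 14 days old: new work
]

debt = [v.identifier for v in findings if is_debt(v, today)]
print(debt)  # ['VULN-2']
```

Keeping the SLA windows in one table means the definition of debt changes only when the organization changes its SLAs, which matches the argument that the SLA, not the discovery date, is what turns a vulnerability into debt.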
Other types of technical debt often excluded from standard analyses are organizational debt and toolchain debt. These types of debt are often seen in mergers and acquisitions where the organizations are not fully integrated. If one portion of an organization does not adhere to the standard processes or organizational structure of the broader organization, this can also impair the company’s ability to deliver value to customers and represents another form of technical debt. The same pattern can often be seen in toolchain disparity within companies. If one part of a company is using Slack to communicate while another uses Microsoft Teams, the lack of standardization can cause friction that impedes the organization’s ability to deliver. This is not to say that companies must all use the same toolset or that perfect organizational structure parity is the only way to operate, but there is an ongoing cost for companies that do not standardize. Where standards have been set, they should improve efficiency. When these standards are not met, that efficiency is lost, which is another form of technical debt.
If, for example, a company’s standard chat platform is Slack, that company has chosen to be standardized because they believe standard communication between all teams improves the flow of value to the customer and reduces cost through consolidated license purchasing. If that company purchases another company that uses Teams, that toolchain disparity should be considered technical debt until the platforms are unified. Alternatively, if a company is a portfolio company, there may be limited or no value in standardizing on a chat platform. It may be completely appropriate for that company to decide that the purchased company is free to use whatever chat tool they choose, as that difference will have no impact on the ability of either part of the company to deliver value.
Mik Kersten discusses this type of debt in his book From Project to Product where he introduces the concept of “accidental complexity” which he says “includes all of the heterogeneity in the tool stack that does not improve the flow of business value. Tools inherited as a result of mergers and acquisitions, or independent selection of similarly functioned tools due to a lack of centralized governance, fall into this category.” He goes on to say that “[r]educing the accidental complexity of tools should be an ongoing effort, as this is a form of value stream debt.”
It is important to note that, while many mergers and acquisitions can bring with them significant technical debt, discontinuity in organizational structures or toolchains does not necessarily equate to technical debt. Whether this type of organizational misalignment is construed as technical debt depends, to some extent, on the company’s merger and acquisition strategy. For companies that have standards around organizational structure and tooling, and where some of the benefits of the merger or acquisition are based on the assumption of integration, unintegrated processes and tools should be considered technical debt if they are not remediated within the timeframe projected by the integration business case. If, however, a company is a portfolio company and takes a federated approach, then it is completely acceptable to leave processes and tools unintegrated and these should not be considered technical debt. Organizational technical debt only occurs when tools and processes do not meet standards that align with organizational strategy.
By looking across the enterprise, we recognize that there are many forms of technical debt including software debt, organizational debt and risk-related debt. It is important to consider all of these forms of debt. Given this broader understanding of technical debt, we can re-define technical debt as:
Unmet demand for technical resources which reduces the ability to deliver value to customers or increases risk.
Calculating Enterprise Technical Debt
With this expanded view of technical debt and the understanding that it can include software defects, organizational complexity and toolchain heterogeneity, we can begin to look at ways to measure technical debt.
Quantification of each type of technical debt presents unique challenges. However, the term technical debt itself provides a useful starting point for how to measure all of these. The term technical debt implies a cost and each of the types of debt we have discussed can be quantified in monetary terms—giving us an overall debt cost for a large organization.
Interest Rate of Technical Debt
From a financial perspective, what we are looking at when we are calculating the cost of technical debt is the interest rate on that debt. That is, how much does the technical debt cost an organization over a given period of time? Ward Cunningham discussed this concept in his video on the debt metaphor when he said, “With borrowed money, you can do something sooner than you might otherwise, but until you pay back that money you will pay interest.” Organizational complexity may create delays in developing new products and these delays can be calculated as lost revenue. Unmitigated risks may cost the company money through security breaches and the associated impact on its brand. As time goes on, these costs continue to grow.
This is useful because it is also possible to calculate the cost to mitigate the debt. For example, the cost to mitigate a software defect is simply the cost to fix that defect. It is important to note that the cost of mitigation may change over time. The cost to fix a software defect goes up the farther away from creation we get, as familiarity with the worked segment decreases. So we must calculate this cost based on a specified point in time. If we understand the cost of mitigation, we can then look at the cost of debt and weigh that against the cost to mitigate that debt. That helps us to make informed business decisions about the return on investment for remediating technical debt.
While this economic approach to measuring technical debt gives us a common framework for approaching debt, different types of debt still require different measurement approaches. Software-related debt can include defects, unneeded complexity and lack of documentation as well as code that is no longer used but which remains in an application. All of these increase the cost of code maintenance and changes and may also cause incidents and even outages. We can therefore calculate the ongoing cost of this type of debt relatively easily by looking at the cost of support activities related to the issue and adding that to the impact on productivity for the development of new code. The cost to retire the debt can also be easily calculated by looking at the time and resources required to fix the problem.
For example, at one large educational company, there was a bug in the code that caused load spikes. Once or twice a week, the entire team needed to jump on an incident bridge to remediate the load issues by restarting application servers in sequence. Because the application was old, the location of the defect was unknown and difficult to troubleshoot. However, it was easy enough to estimate that the cost to fix the issue would not exceed a couple of weeks’ worth of effort. As an estimate, let’s say this work would cost $10,000. By comparison, the entire team, incident management and the operations center spent one to two hours per week remediating the impact of the issue. Let’s estimate this cost at $2,000/week. Based on this high-level analysis, it is clear that the fix would pay for itself five weeks after the bug was fixed ($10,000 divided by $2,000 per week).
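The arithmetic behind that estimate is a simple break-even calculation. A minimal sketch in Python, using the $10,000 and $2,000/week figures from the example (the function name `break_even_weeks` is only illustrative):

```python
def break_even_weeks(fix_cost: float, weekly_interest: float) -> float:
    """Weeks until the avoided weekly cost equals the one-time cost of the fix."""
    if weekly_interest <= 0:
        raise ValueError("weekly_interest must be positive")
    return fix_cost / weekly_interest


fix_cost = 10_000        # estimated one-time cost to fix the defect
weekly_interest = 2_000  # estimated weekly cost of working around the defect

weeks = break_even_weeks(fix_cost, weekly_interest)
print(weeks)  # 5.0

# The same weekly cost, left unpaid for a full year:
annual_interest = weekly_interest * 52
print(annual_interest)  # 104000
```

Expressing the ongoing cost per year as well as per week makes it easier to compare this defect against other debts whose interest is measured annually.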
The cost of risk-related debt is somewhat more complicated to calculate but we can, again, use dollar values as a basis for our calculation. These sorts of debts include security risks as well as compliance-related risks. One of the reasons it is difficult to determine the cost of risk-related debt is that risks do not always become reality. That is, just because there is a risk of a security breach does not mean there will be a security breach. The very fact that it is a risk implies that there is a chance of something happening; it is not a certainty. However, risk analysis provides a very useful tool for calculating this sort of value in the form of annual loss expectancy (ALE). Annual loss expectancy is a means of measuring the expected financial loss over the period of a year. It is calculated by multiplying the expected loss by the probability that it will happen in a given year, or:
ALE = SLE * ARO
where SLE is the single loss expectancy, or the amount of loss if the risk occurs, and ARO is the annualized rate of occurrence, or percent chance that the event will occur in a year. So, if the expected loss due to a given breach would be $100,000 and that type of breach is likely to occur once every five years (giving it a 20% chance of happening in any one year), the ALE for that risk would be $100,000 x 20% = $20,000. Because this number estimates the potential loss due to a risk, it serves as a valuable estimate of the interest for technical debt in the form of that risk.
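As a quick check of the formula, the same calculation can be written in Python. The figures are the ones from the example above; the function name `annual_loss_expectancy` is only illustrative.

```python
def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE * ARO: expected loss per year for a single risk."""
    return sle * aro


sle = 100_000  # single loss expectancy: loss if the breach occurs
aro = 1 / 5    # annualized rate of occurrence: once every five years

ale = annual_loss_expectancy(sle, aro)
print(round(ale))  # 20000
```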
Similar to software defect-related debt, the value of the risk-related debt can be calculated as the cost to remediate the risk. This might include activities such as patching outdated systems, software changes to prevent hacking or even process changes to remediate compliance-related risks. For each of these cases, we can look at the cost of resources to remediate the issue and weigh this against the ongoing interest, expressed as ALE, to determine if the investment should be made to remediate the risk.
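One way to weigh remediation cost against ALE is to compare them over a planning horizon. The sketch below is a simplification under stated assumptions: the $30,000 remediation cost and the one-year and three-year horizons are hypothetical, the $20,000 ALE is taken from the earlier breach example, and remediation is assumed to eliminate the risk entirely.

```python
def net_benefit(remediation_cost: float, ale: float, horizon_years: int) -> float:
    """Expected loss avoided over the horizon, minus the one-time remediation cost."""
    return ale * horizon_years - remediation_cost


ale = 20_000               # annual loss expectancy for the risk
remediation_cost = 30_000  # hypothetical cost to remediate the risk

for years in (1, 3):
    print(years, net_benefit(remediation_cost, ale, years))
# 1 -10000
# 3 30000
```

Under these assumptions the remediation does not pay off within one year but does within three, which is the kind of result that turns the decision into a question about the organization’s planning horizon.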
Organizational Technical Debt
Organizational technical debt is the debt caused by organizational structure, processes and tools that do not align with the standards of the organization and which create delays in the delivery of value to customers. This is frequently seen in mergers and acquisitions where two separate organizations are brought together. Until and unless these organizational structures and tools are aligned, as in our Slack versus Teams example, there may be misalignment which makes it more difficult for the company to deliver as a whole. One way to think about this type of debt is the lack of alignment within tools and ways of working that impedes the flow of value. The “accidental complexity” described by Mik Kersten is a great example of organizational debt.
Organizational debt can be measured in terms of the cost of delays in delivering value to the customer. If, for example, handoffs between teams are taking a long time because they are using different development platforms, this delay can be measured and then quantified by the cost of delay. With this type of debt, it is also possible to look at the cost to overall productivity. If the productivity of the organization is negatively impacted because of a proliferation of communication systems that hampers the flow of information, then the cost of the impact to productivity should be calculated; this represents a lost opportunity and another form of technical debt.
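The cost of delay from slow handoffs can be estimated with equally simple arithmetic. All of the figures in this sketch are hypothetical and chosen only to show the shape of the calculation: a number of handoffs per release, the extra working days each handoff adds, and the value the release would deliver per working day once shipped.

```python
def cost_of_delay(delay_days: int, value_per_day: int) -> int:
    """Value forgone while a release is held up by avoidable delay."""
    return delay_days * value_per_day


# Hypothetical inputs for one release.
handoffs_per_release = 4     # handoffs between teams on different platforms
extra_days_per_handoff = 3   # added working days per handoff
value_per_day = 5_000        # value the release delivers per working day

delay_days = handoffs_per_release * extra_days_per_handoff
print(delay_days)                                # 12
print(cost_of_delay(delay_days, value_per_day))  # 60000
```

Repeating the calculation for each release gives a recurring figure that can be treated as the interest on the organizational debt.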
As with development-related debt and risk-related debt, the cost of the debt itself can be estimated by the cost to remediate the organizational debt. This provides a useful mechanism for determining if the investment should be made to remediate the debt.
For large enterprises, the scope of technical debt is expansive. There are many types of technical debt and different methods to calculate the cost of the debt as well as the interest rate on that debt. Software-related debt must be calculated by the impact to the customer or impact to productivity of the people working on the software. Risk-related debt can be calculated by looking at the ALE of that debt. Organizational debt can be calculated by looking at delays in the delivery of value to the customer.
All this information comes with a lot of uncertainty and a lot of calculations to consider. What is certain is that there are tremendous costs associated with technical debt. The interest on that debt is being repaid every day even if organizations are not aware of it. While it may not make sense or even be possible for enterprises to truly know the full extent of their debt or the amount of interest they are paying, by looking at the cost to remediate and the cost of the interest, we can at least quantify the impact and make calculated decisions based on return on investment for the work necessary to remediate technical debt in the enterprise.