As we close out 2023, we at DevOps.com wanted to highlight the most popular articles of the year. Following is the latest in our series of the Best of 2023.
Most software delivery teams are considering adopting AI in some form to help engineers accelerate their value delivery and increase delivery effectiveness. GitHub Copilot is one of the first examples of AI-powered engineering assistance. It is self-styled as ‘your AI pair programmer’ and can autocomplete lines of code.
Early adopters report ‘productivity improvements’ of up to 20% using GitHub Copilot. Still, it is not cheap (at $10 per user per month), so how can you build a business case for GitHub Copilot?
Most organizations build a business case around the productivity gains and associated value delivery GitHub Copilot can bring. But what methodology (and metrics) should you use to accurately assess the impact on productivity and value delivery of a tool like GitHub Copilot?
GitHub Copilot: First-Generation AI Assistance for Software Engineers
Copilot is clearly the start of a long and accelerating journey as AI is applied to many areas of the SDLC. It uses OpenAI Codex to suggest code and entire functions in real time from your editor.
Other AI tools arrive almost daily and can help software engineers in many time-saving and efficiency-enhancing ways. Here are just a few examples:
- Grit.io will help manage technical debt
- Mintlify provides automated documentation for developers
- Code AI helps translate (some) languages, debug, navigate code and act as a pair programmer
- And tools like AdrenalineAI use AI to improve understanding of your codebase.
So, when cost control is the order of the day, you may wonder how you can accurately quantify the impact of tools like these and justify the added expense.
A Methodology for Measuring the Impact of a Tool Like GitHub Copilot
To robustly measure the impact of GitHub Copilot (and similar AI engineering-enhancement tools), the methodology must be:
- Quantitative—based on hard, measurable data
- Holistic—considering all benefits and potential impacts across the end-to-end SDLC (software delivery lifecycle)
- Balanced—inclusive of subjective survey data alongside software delivery data.
This requires a metrics scorecard that fully captures the benefits and potential costs of GitHub Copilot. The metrics reflect the SPACE framework for measuring developer productivity, emphasizing the key areas GitHub Copilot will likely impact.
These metrics can be tracked over time for a representative group of GitHub Copilot users to see the ‘before and after’ effect. We suggest that a representative sample would include engineers of different seniority and activity levels, and that the analysis period would cover at least three sprint cycles (e.g., six weeks or more).
Key Metrics to Quantify the Impact of GitHub Copilot
An end-to-end software delivery analytics platform provides a single pane of glass to measure the real impact of a tool like GitHub Copilot.
It surfaces a range of engineering and software delivery metrics to capture the impact of GitHub Copilot on five key variables that determine ‘productivity’:
- Velocity and throughput–Measures of team ‘output’
- Time to value–Time taken to deliver an increment of software
- Quality
- Dependability–A key benefit if teams more reliably deliver against their plans
- Developer satisfaction–Impact on accelerating repetitive/less interesting tasks
These metrics can be tracked over time for a group of GitHub Copilot users versus a control group of non-users.
1. Velocity and Throughput Metrics
Throughput is a core measure of ‘output’ over time for Scrum and Kanban teams, and can be calculated in tickets, story points, pull requests, builds or value points. A delivery analytics tool should calculate throughput per engineer for users and non-users of GitHub Copilot. The difference can then be expressed as a percentage increase.
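As a rough illustration of the throughput comparison described above, the following Python sketch computes average throughput per engineer for two groups and the percentage difference between them. The engineer names and ticket counts are hypothetical, and a real analysis would draw them from your ticketing or analytics tool.

```python
from statistics import mean


def throughput_per_engineer(completed_by_engineer):
    """Average number of completed items per engineer over the period."""
    return mean(completed_by_engineer.values())


def percentage_increase(baseline, observed):
    """Change from baseline to observed, expressed as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline throughput must be non-zero")
    return (observed - baseline) * 100.0 / baseline


# Hypothetical tickets completed per engineer over the same period.
non_users = {"ana": 10, "ben": 12, "chloe": 8}
copilot_users = {"dev": 11, "eli": 13, "fay": 9}

baseline = throughput_per_engineer(non_users)
observed = throughput_per_engineer(copilot_users)
uplift = percentage_increase(baseline, observed)
print(f"Throughput per engineer: {baseline} vs {observed} ({uplift:.1f}% change)")
```

The same calculation works for story points or pull requests, provided both groups are measured in the same unit over the same period.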
Sprint Velocity considers the rate of work achieved within a sprint and how it varies over time. It can be calculated in tickets or story points. Advanced analytics tools will also show you the amount of work carried over by sprint to see an even better underlying measure of delivery. This would be a key metric when considering the impact of GitHub Copilot.
2. Time to Value
Cycle Time is a core agile software delivery metric that tracks an organization’s ability to deliver software early and often. It calculates the time taken to deliver an increment of software from dev start to deployment. The shorter the cycle time, the shorter the feedback loops and the quicker the organization will receive new features and respond to customer needs. This is a vital KPI when assessing technology delivery efficiency.
Code Cycle Time typically accounts for 20-30% of overall cycle time. It calculates the average time taken from a pull request (PR) opening until it is merged/closed. The bulk of this time is usually spent during the approval process.
In theory, GitHub Copilot enables quicker, easier development. Therefore, developers should have greater availability to review each other’s PRs. If code quality also improves, reviews should result in fewer requested changes and a shorter approval time.
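A minimal sketch of the Code Cycle Time calculation, assuming each pull request is represented by a hypothetical (opened, merged or closed) pair of timestamps. In practice these timestamps would come from your source control system; the values here are invented for illustration.

```python
from datetime import datetime
from statistics import mean


def code_cycle_time_hours(pull_requests):
    """Average hours between a PR opening and being merged or closed."""
    durations = [
        (closed - opened).total_seconds() / 3600
        for opened, closed in pull_requests
    ]
    return mean(durations)


# Hypothetical (opened, merged/closed) timestamps for three PRs.
prs = [
    (datetime(2023, 3, 1, 9, 0), datetime(2023, 3, 1, 15, 0)),
    (datetime(2023, 3, 2, 10, 0), datetime(2023, 3, 3, 10, 0)),
    (datetime(2023, 3, 6, 8, 0), datetime(2023, 3, 6, 20, 0)),
]

average_hours = code_cycle_time_hours(prs)
print(f"Average Code Cycle Time: {average_hours:.1f} hours")
```

Running this separately for Copilot users and non-users, or before and after adoption, gives the comparison discussed above.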
3. Quality
Escaped Defects is a simple but effective measure of overall software delivery quality. It can be tracked in numerous ways, but most involve tracking defects by criticality/priority.
Any analysis of delivery efficiency pre-/post-implementation of GitHub Copilot should consider Escaped Defect rates, as it would be a poor trade-off to increase velocity and ‘productivity’ at the expense of quality.
Build Failure Rate identifies the percentage of builds that fail and the overall risk this poses to a team working productively. Notable changes to the failure rate after implementing GitHub Copilot indicate that code quality may be impacted.
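Build Failure Rate can be sketched as a simple ratio. The example below assumes build outcomes are available as a list of booleans (True for a passing build), which is a simplification, and the before and after figures are hypothetical.

```python
def build_failure_rate(build_results):
    """Percentage of builds that failed; build_results holds True for a pass."""
    if not build_results:
        raise ValueError("at least one build is required")
    failed = sum(1 for passed in build_results if not passed)
    return failed * 100.0 / len(build_results)


# Hypothetical build outcomes before and after adopting GitHub Copilot.
builds_before = [True] * 17 + [False] * 3
builds_after = [True] * 18 + [False] * 2

rate_before = build_failure_rate(builds_before)
rate_after = build_failure_rate(builds_after)
change_in_points = rate_after - rate_before
print(f"Failure rate: {rate_before:.1f}% -> {rate_after:.1f}% "
      f"({change_in_points:+.1f} percentage points)")
```

A rise in the rate after adoption would be a signal to look more closely at code quality before drawing conclusions from velocity gains alone.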
4. Dependability
Sprint Target Completion tracks the percentage of the sprint goals achieved each cycle. ‘Scrum teams’ and ‘sprints’ are the basic building blocks of Scrum Agile software delivery. If scrum teams consistently deliver their sprint goals, Agile software delivery becomes relatively dependable, enabling the prediction of delivery outcomes across multiple teams and longer time periods.
Scrum team predictability is, therefore, a critical success criterion in Agile software delivery. If GitHub Copilot can improve the likelihood of a team delivering their tickets faster and with fewer bugs, then this is a major contributor to the overall improvement in effectiveness.
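To make Sprint Target Completion concrete, here is a small sketch that treats it as completed items divided by planned items per sprint, averaged over several sprints. This is one possible definition, and the sprint figures are hypothetical.

```python
from statistics import mean


def sprint_target_completion(planned, completed):
    """Percentage of planned sprint items that were completed."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return completed * 100.0 / planned


def average_completion(sprints):
    """Mean completion percentage across (planned, completed) sprint pairs."""
    return mean(sprint_target_completion(p, c) for p, c in sprints)


# Hypothetical (planned, completed) ticket counts for three sprints each.
sprints_before = [(10, 7), (12, 9), (8, 6)]
sprints_after = [(10, 8), (12, 9), (8, 8)]

completion_before = average_completion(sprints_before)
completion_after = average_completion(sprints_after)
print(f"Sprint Target Completion: {completion_before:.1f}% -> {completion_after:.1f}%")
```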
5. Developer Satisfaction
eNPS tracks employee satisfaction and loyalty within teams and organizations. Anecdotal reports suggest that developers find that GitHub Copilot makes the more tedious aspects of coding less taxing and positively impacts wellbeing. An employee NPS makes this straightforward to validate and quantify.
Although developer satisfaction is an important factor in productivity measurement, it shouldn’t be viewed in isolation from the other metrics when quantifying overall developer productivity.
The above are some examples of relevant metrics to consider when analyzing the impact of GitHub Copilot on delivery productivity. The key is to take a balanced set of metrics that holistically treats software delivery as a complex process.
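For reference, eNPS is conventionally calculated like a standard Net Promoter Score: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6) on a 0 to 10 scale. A minimal sketch with hypothetical survey responses:

```python
def enps(scores):
    """Employee NPS: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("at least one survey response is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return (promoters - detractors) * 100.0 / len(scores)


# Hypothetical responses to "How likely are you to recommend working here?"
survey_scores = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]

score = enps(survey_scores)
print(f"eNPS: {score:+.0f}")
```

Comparing the score for Copilot users against non-users, or before and after adoption, is one way to put a number on the anecdotal reports.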
Combining the Balanced Scorecard of Metrics to Create a Business Case for GitHub Copilot
Typically, we would combine data from the ‘balanced scorecard’ of metrics discussed above using simple weightings to create an overall productivity impact assessment (PIA) of GitHub Copilot. See the table below:
GitHub Copilot Productivity Impact Assessment – Example Template
The weighted average productivity improvement calculated in the PIA can then be applied to the estimated cost of the delivery capability (headcount x fully loaded staff costs). This provides a monetary estimate of the productivity improvement based on resource costs. It excludes the (potentially larger) benefit of delivering more value to customers earlier, which is not easily calculated.
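The PIA arithmetic can be sketched as follows. The weights, improvement percentages, headcount and staff cost are all hypothetical placeholders rather than values from the template; only the $10 per user per month licence price comes from the article.

```python
def weighted_improvement(scorecard):
    """Weighted average of percentage improvements.

    scorecard maps a metric area to (weight, improvement_pct).
    Weights are normalized, so they need not sum to 100.
    """
    total_weight = sum(weight for weight, _ in scorecard.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(weight * pct for weight, pct in scorecard.values())
    return weighted_sum / total_weight


def annual_net_benefit(improvement_pct, headcount, loaded_cost_per_head,
                       licence_per_user_month=10.0):
    """Productivity gain on staff cost minus annual licence fees."""
    capability_cost = headcount * loaded_cost_per_head
    gross_benefit = capability_cost * improvement_pct / 100.0
    licence_cost = headcount * licence_per_user_month * 12
    return gross_benefit - licence_cost


# Hypothetical weights and measured improvements (%) for the five areas.
scorecard = {
    "velocity_and_throughput": (30, 8.0),
    "time_to_value": (20, 5.0),
    "quality": (20, 0.0),
    "dependability": (20, 5.0),
    "developer_satisfaction": (10, 10.0),
}

pia = weighted_improvement(scorecard)
net = annual_net_benefit(pia, headcount=50, loaded_cost_per_head=120_000)
print(f"Weighted improvement: {pia:.1f}%  Net annual benefit: ${net:,.0f}")
```

With these illustrative inputs, a 5.4% weighted improvement on a 6,000,000 staff cost base comes to about 324,000 per year, against 6,000 in annual licence fees. The choice of weights drives the result, so it is worth agreeing on them before collecting the data.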
Productivity Improvements From Using GitHub Copilot – The Empirical Data
There is a distinct lack of independent data in this regard.
GitHub’s own survey of 2,000 developers showed that 88% of developers claimed ‘to be more productive’ when using the tool, while a task test undertaken by 95 developers found that the group using GitHub Copilot was 55% faster and had a 7% higher rate of completing the task (see below).
GitHub’s Own Survey Data – The Impact of GitHub Copilot on Users (2022)
Our own analyses, using a PIA as shown above, showed improvements of about 5%. However, this is bound to improve further as AI technology rapidly improves.