The emergence of generative AI has permanently altered how code gets written, and given the massive productivity boost, there’s no going back. According to GitHub, 92% of developers say they use AI tools, and developers complete tasks 55% faster with Copilot, its popular LLM-based code generator.
Yet new research challenges the idea that AI assistants like Copilot are advancing software development without consequences. The report, conducted by GitClear, analyzed more than 150 million lines of code written over the last four years and found a worrying uptick in code churn and a decline in code reuse since the dawn of LLM-driven software development.
This raises the question: Does using AI assistants actually lead to lower code quality? The answer is complicated. “I wouldn’t say that the report proves that AI assistants are reducing code quality since our data is correlational,” said Bill Harding, programmer and CEO at Amplenote & GitClear, the organization behind the aforementioned research. “But I would say that there is a confluence of indicators that show code quality decline at scale, starting around 2022 and accelerating in 2023.”
AI-Generated Code Increases Overall Volume
One potential contributor to lower code quality is the sheer amount of code produced. Since LLMs automate code generation, production is accelerated across the board. Without guardrails, generative AI will likely contribute to more technical debt.
“AI-enabled code development is going to have a major effect on coding, exponentially increasing the volume and velocity of code delivery, much of which we anticipate will be of lower quality and more bloated,” said Mehran Farimani, CEO of RapidFort.
In addition to quality concerns, the surge of AI-generated code raises governance and security concerns. Some anticipate it will overwhelm security teams, which are already struggling to keep pace. “Increased developer velocity is going to put security teams under even more pressure,” said Farimani. “The volume of scans will increase, the range of vulnerabilities that need to be addressed will increase and the patches that need to be applied will increase.”
Blindly Relying on AI Outputs
According to GitHub, developers accept code completions from Copilot about 30% of the time. This percentage is likely to keep rising as models are tuned to internal code style and developers grow more comfortable with the tool. However, there is still a risk of developers blindly relying on generative AI outputs without proper oversight, which can also hurt code quality.
“I think there is a risk for lower quality code if a developer blindly relies on what an AI assistant generates,” said Rob Whiteley, CEO at Coder. “Generative AI tools have flaws, and if outputs are not verified, they can insert code that compiles but does not behave as a developer may have intended. It can also insert errors or bugs if not checked.”
How to Retain Code Quality in the AI Age
“I would not say that results of [the GitClear] study are conclusive,” said Dr. Eirini Kalliamvakou, staff researcher at GitHub. She believes the results are misleading since there is no way to know if an AI or a human authored the code that was analyzed for the study. That said, she acknowledged that “if we are helping people code faster, we should ask the question about code quality.” So, how do we retain the quality of all code, regardless of whether it’s human- or machine-generated?
Use Copilot Thoughtfully
The first recommendation is to use AI assistants thoughtfully. If you’re exhausted at the end of a long day, it’s a lot easier just to “hit tab” and accept an AI-generated auto-complete. But this might not be the best move. “When used thoughtfully and with oversight from the developer, I do think AI can improve the quality of a developer’s code by making suggestions and preventing human errors,” said Whiteley.
Apply the Same Code Reviews to AI-Generated Code
Additionally, development teams should apply the same degree of rigor to AI-generated code, taking it through the same design review processes and security reviews as any other new code. “Just because you’re writing with Copilot doesn’t mean you should give up on code reviews, your linters, looking for security vulnerabilities and all that,” said Kalliamvakou.
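One low-effort way to enforce that rigor is to route every commit, human- or AI-authored, through the same automated gate. The sketch below is a hypothetical pre-commit hook; the tool names (ruff, bandit, pytest) are placeholders for whatever linters, security scanners and test runners your team already uses.

```python
# Hypothetical pre-commit hook: run one gate on every change,
# regardless of whether a human or an AI assistant wrote it.
# The commands listed are illustrative; substitute your own tooling.
import subprocess

CHECKS = [
    ["ruff", "check", "."],        # style/lint
    ["bandit", "-q", "-r", "src"], # security scan
    ["pytest", "-q"],              # test suite
]

def run_checks(checks):
    """Run each command; return the names of the checks that failed."""
    return [cmd[0] for cmd in checks
            if subprocess.run(cmd).returncode != 0]

def gate(checks=CHECKS):
    """Abort the commit if any check fails."""
    failed = run_checks(checks)
    if failed:
        raise SystemExit("blocked: failing checks: " + ", ".join(failed))
```

Calling `gate()` from `.git/hooks/pre-commit` (or a framework like pre-commit) makes the review bar identical for all code, which is exactly the point: the checks don’t know or care who wrote the diff.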
Avoid Redundancy When Possible
Leaner code means efficiency boosts and cost-savings. “Much of code is redundant and AI can streamline that redundancy and offer interesting insights, but not without developers ensuring the suggestions the AI tool makes align with the code they are intending to write,” said Whiteley. “Similar to Gmail’s autocomplete sentence feature, AI can complete lines of code with accuracy, but only as long as it accurately guesses where you’re going with your sentence.”
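To make the redundancy point concrete, here is a small, invented example of the pattern Whiteley describes: an autocomplete will happily repeat a near-identical function for each variant, where a single parameterized helper keeps the code DRY.

```python
# Redundant shape an autocomplete might cheerfully repeat per currency:
def total_usd(prices):
    return f"${sum(prices):.2f}"

def total_eur(prices):
    return f"€{sum(prices):.2f}"

# DRY alternative: one helper parameterized on the currency symbol.
def total(prices, symbol):
    return f"{symbol}{sum(prices):.2f}"
```

Neither version is wrong, but the second stays one function no matter how many currencies appear, which is the kind of consolidation a developer must still ask for (or apply themselves); the assistant will rarely volunteer it.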
Keep Tabs on the Impact on Quality
Multiple sources estimate that around 30% of code is now AI-authored. Harding calls for more effort to analyze the impact of AI generation on code quality. “It would benefit the greater good to get a clear and data-backed assessment of how quality fluctuates when developers hit ‘tab’ to accept an AI suggestion,” he said. This is a new field of study, and it will require new metrics to fully monitor and appreciate the impact AI has on code quality.
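Teams don’t have to wait for that research to watch their own trend lines. As a rough, illustrative proxy for churn (this is not GitClear’s methodology), you can aggregate `git log --numstat` output and track how the deleted-to-added ratio moves over time:

```python
# Rough churn proxy: aggregate per-file added/deleted line counts
# from `git log --numstat --pretty=format:` output, then compute the
# ratio of deletions to additions. Illustrative only, not GitClear's metric.

def parse_numstat(log_text):
    """Sum added/deleted lines per file; skips binary files ('-' counts)."""
    totals = {}
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added, deleted, path = int(parts[0]), int(parts[1]), parts[2]
            a, d = totals.get(path, (0, 0))
            totals[path] = (a + added, d + deleted)
    return totals

def churn_ratio(totals):
    """Deleted lines per added line across all files (0.0 if nothing added)."""
    added = sum(a for a, _ in totals.values())
    deleted = sum(d for _, d in totals.values())
    return deleted / added if added else 0.0
```

Run monthly over a rolling window, a rising ratio is a prompt to look closer, not a verdict, but it gives the “keep tabs” advice a number to watch.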
Determine How To Measure Code Quality
Just as there is little consensus on what exactly constitutes developer productivity, defining and measuring code quality is challenging. Kalliamvakou therefore recommends first defining what code quality means to your organization and which metrics capture it (for instance, the number of defects). The next step, she said, is to set up methods to alert developers about potential quality issues.
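Once a metric is chosen, the alerting step can be as simple as a threshold check. The sketch below assumes a defect-density metric (defects per thousand lines of code); the threshold value is a hypothetical team choice, not a standard.

```python
# Illustrative quality gate built on one possible metric: defect density.
# The 1.0 defects-per-KLOC threshold is an assumed team policy, not a norm.

def defect_density(defects, kloc):
    """Defects per thousand lines of code."""
    return defects / kloc

def quality_alert(defects, lines_of_code, threshold=1.0):
    """True when defect density exceeds the team's agreed threshold."""
    return defect_density(defects, lines_of_code / 1000) > threshold
```

Whatever the metric, the design point is the same: agree on the definition first, then automate the alert, so quality conversations are triggered by data rather than gut feel.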
The Bottom Line: AI Enhances Productivity…
AI assistants are changing the way code gets made, and much of this is for the better.
Most notably, AI can have a significant impact on enhancing productivity for up-and-coming or inexperienced developers. “If an enterprise makes technology like generative AI broadly available, then it can supplement the work and boost the productivity of the base of the talent pyramid,” said Whiteley. “Put simply, that investment massively impacts the broader organization, not just the top 1% of developers.” Even if such tools make a developer 20% more productive, they can have a truly transformative impact.
While the GitClear study doesn’t prove the culpability of AI assistants, it raises some interesting questions worth exploring, said Harding. Notably, is code becoming more repetitive? If so, that could have some negative outcomes for overall efficiency. “Code generated during 2023 more closely resembles an itinerant contributor, prone to violate the DRY-ness of the repos visited,” the GitClear report said.
… So, Assess Code Quality
“If leveraged correctly, generative AI will not lower the quality of code,” said Trisha Gee, lead developer advocate at Gradle, Inc. “However, it’s important that developers using this technology are critically evaluating AI-generated code rather than taking what AI suggests at face value. Code from generative AI may not always be ‘correct,’ be the best solution to a problem or match the code style of the codebase.”
My takeaway is that, yes, you can retain code quality while using AI assistants. However, this assessment comes with some strings attached. It will require an ongoing effort to sanitize the outputs of AI and apply the same checks we typically apply to manually written code.