AI hallucinations – the occasional tendency of large language models to respond to prompts with incorrect, inaccurate or made-up answers – have been an ongoing concern as the enterprise adoption of generative AI has accelerated over the past two years.
They’re also an issue for developers using AI-based tools to write code, including tools that generate names of packages that don’t exist. IDC analysts last year wrote about package hallucinations and the threat they present: bad actors can exploit them to inject malicious code into the software supply chain.
“If threat actors were to create a package with the same name as the one hallucinated by an AI model, and if they injected malicious code into that package, the application would likely download and run the malicious code,” the analysts wrote.
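The gap between a hallucinated name and an exploitable one can be as small as an unregistered entry on the package index. As a rough illustration (not part of the IDC analysis), the Python sketch below uses PyPI’s public JSON API to check whether a suggested name is actually registered; “fastjson-utils” is a made-up stand-in for an AI-suggested package, and any unclaimed name like it is free for an attacker to register.

```python
import urllib.error
import urllib.request


def is_on_pypi(name: str) -> bool:
    """Return True if the name resolves to a real project on PyPI."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.getcode() == 200
    except urllib.error.HTTPError as err:
        # PyPI answers 404 for names that were never registered -- exactly
        # the names a squatter is free to claim.
        if err.code == 404:
            return False
        raise


# "fastjson-utils" is a made-up stand-in for an AI-suggested package name.
for name in ("requests", "fastjson-utils"):
    status = "exists on PyPI" if is_on_pypi(name) else "is unclaimed"
    print(f"{name} {status}")
```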
Enter ‘Slopsquatting’
The issue has received renewed attention recently, with the threat being given a colorful name – “slopsquatting,” coined by security researcher Seth Larson – and a study by researchers at three universities showing that LLMs recommend non-existent libraries and packages roughly 20% of the time.
“Hallucinations along with intentional malicious code injection are definitely a concern,” Raj Kesarapalli, director of product management at cybersecurity vendor Black Duck, told DevOps. “Hallucinations result in unintended functionality, whereas malicious code injection results in security concerns.”
The term slopsquatting is a play on the more common “typosquatting,” an attack technique in which bad actors register domains or create malicious packages with names that are spelled slightly differently from legitimate websites or packages, hoping that users or developers will use the misspelled names.
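Detecting likely typosquats often comes down to simple string distance against a list of trusted names. The snippet below is a minimal Python sketch, assuming a hypothetical allowlist of vetted packages, that uses the standard library’s difflib to flag candidate names suspiciously close to well-known ones.

```python
import difflib

# A tiny, illustrative allowlist -- in practice this would be the project's
# vetted dependency list or the most-downloaded packages on the index.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "cryptography", "urllib3"]


def looks_like_typosquat(candidate: str, cutoff: float = 0.85) -> list[str]:
    """Return known packages the candidate is suspiciously similar to."""
    matches = difflib.get_close_matches(candidate, KNOWN_PACKAGES, n=3, cutoff=cutoff)
    return [m for m in matches if m != candidate]


print(looks_like_typosquat("reqeusts"))   # ['requests'] -- likely a typosquat
print(looks_like_typosquat("requests"))   # [] -- exact match to a known package
```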
Popular Languages and Code-Creating LLMs
In their paper, the researchers from the University of Texas at San Antonio, the University of Oklahoma, and Virginia Tech wrote that the reliance of popular programming languages like Python and JavaScript on centralized package repositories and open source software, combined with code-generating LLMs, “has created a new type of threat to the software supply chain: package hallucinations.”
They added that “these hallucinations, which arise from fact-conflicting errors when generating code using LLMs, represent a novel form of package confusion attack that poses a critical threat to the integrity of the software supply chain. … This compromise can then spread through an entire codebase or software dependency chain, infecting any code that relies on the malicious package.”
A Lot of Non-Existent Packages
The researchers tested 16 code-generation AI models, including DeepSeek, Anthropic’s Claude, OpenAI’s GPT-4 and Mistral. Using two prompt datasets, they ran 30 tests – 16 models for Python and 14 for JavaScript – and found that of 576,000 code samples generated, almost 20% recommended non-existent packages.
Just as troubling, they found that when a prompt that had generated a hallucination was re-run 10 times, 43% of the hallucinated packages were repeated in all 10 queries, and 58% of the time a hallucinated package was repeated more than once. This showed that most hallucinations are not random errors but a repeatable issue that persists across multiple iterations, according to the researchers.
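The researchers’ own harness isn’t reproduced here, but the repeatability measurement is easy to approximate: re-run the same prompt, pull the imported package names out of each generated sample, and count how often unknown names recur. In the Python sketch below, generate_code is a canned stand-in for a real LLM call and KNOWN_PACKAGES is a toy substitute for a full index snapshot.

```python
import ast
from collections import Counter


def generate_code(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned snippet so the sketch runs.
    return "import requests\nimport fastjson_utils\n"


# Toy substitute for a snapshot of every name published on the index.
KNOWN_PACKAGES = {"requests", "numpy", "flask"}


def imported_packages(source: str) -> set[str]:
    """Pull top-level package names out of generated Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names


def repeated_hallucinations(prompt: str, runs: int = 10) -> Counter:
    """Count how often each unknown package name recurs across repeated runs."""
    counts = Counter()
    for _ in range(runs):
        for pkg in imported_packages(generate_code(prompt)):
            if pkg not in KNOWN_PACKAGES:
                counts[pkg] += 1
    return counts


if __name__ == "__main__":
    print(repeated_hallucinations("write a function that fetches and parses JSON"))
    # -> Counter({'fastjson_utils': 10}): the same unknown name recurs every run
```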
“This is significant because a persistent hallucination is more valuable for malicious actors looking to exploit this vulnerability and makes the hallucination attack vector a more viable threat,” they wrote.
That said, testing also showed that several models – including DeepSeek, GPT-4 Turbo and GPT-3.5 – were able to detect their own hallucinated packages more than 75% of the time, “suggesting an inherent self-regulatory capability,” the researchers wrote. “The indication that these models have an implicit understanding of their own generative patterns that could be leveraged for self-improvement is an important finding for developing mitigation strategies.”
Validation, Verification
They urged the research community to investigate the issue of package hallucinations. This will be particularly important given the accelerating rate at which developers are adopting AI-powered coding tools. According to a study by GitHub, more than 97% of developers surveyed said they had used AI coding tools at least once in their work.
Black Duck’s Kesarapalli said most developers don’t fully understand the risks involved in using AI-generated code. They’re more focused on delivering functionality in their software in the easiest, most convenient way.
“Even before AI-generated code came into the picture, developers were augmented with tools and processes that addressed functional testing, code quality, security vulnerabilities, and performance bottlenecks,” he said. “These validation [and] verification steps are even more important with GenAI in the picture. In addition to the existing concerns, enterprises must now make sure that LLMs are not injecting malicious code that won’t be caught with existing tools and processes.”
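One concrete validation step along those lines is to gate new dependencies against a vetted list before they ever reach a build. The Python sketch below assumes a hypothetical internal allowlist and a conventional requirements.txt; a real pipeline would pull the approved set from the organization’s artifact repository or SBOM tooling rather than a hard-coded list.

```python
from pathlib import Path

# Hypothetical internal allowlist of vetted dependencies.
APPROVED = {"requests", "numpy", "pandas"}


def unapproved_dependencies(requirements_file: str = "requirements.txt") -> list[str]:
    """Return requirement names that are not on the internal allowlist."""
    flagged = []
    for line in Path(requirements_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Keep only the project name: drop environment markers, extras and pins.
        name = line.split(";")[0].split("[")[0]
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
            name = name.split(sep)[0]
        name = name.strip()
        if name and name.lower() not in APPROVED:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    bad = unapproved_dependencies()
    if bad:
        raise SystemExit(f"Unvetted dependencies, review before merging: {bad}")
```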
Kesarapalli said developers need to ensure they are using a certified LLM trained on trusted code and that they review the code that’s generated and inserted into the codebase to manage risks. They also need to clearly identify AI-generated code in the codebase so it can be easily evaluated.
“Peer reviewers should also be aware of AI-generated portions of new code so that they can review it under a different light,” he said.
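A lightweight way to give reviewers that visibility is a grep-able comment tag on generated blocks. The sketch below assumes a made-up “# ai-generated:” convention and simply lists every marked line under a Python source tree; the tag format itself is whatever the team agrees on.

```python
import re
from pathlib import Path

# Hypothetical marker convention -- the team would define its own format.
AI_TAG = re.compile(r"#\s*ai-generated\b.*", re.IGNORECASE)


def find_ai_generated_lines(root: str = ".") -> list[tuple[str, int, str]]:
    """List (file, line number, tag) for every marked line under root."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            match = AI_TAG.search(line)
            if match:
                hits.append((str(path), lineno, match.group().strip()))
    return hits


# Example of the convention itself, placed above a generated block:
# ai-generated: copilot, 2025-03-14, reviewed-by: <pending>
if __name__ == "__main__":
    for file, lineno, tag in find_ai_generated_lines():
        print(f"{file}:{lineno}: {tag}")
```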