close
close

first Drop

Com TW NOw News 2024

Researchers highlight how poisoned LLMs can suggest vulnerable code
news

Researchers highlight how poisoned LLMs can suggest vulnerable code

Developers are increasingly turning to AI programming assistants to help them write code, but new research shows that they should analyze code suggestions before incorporating them into their codebase to avoid introducing potential vulnerabilities.

Last week, a team of researchers from three universities identified techniques for poisoning training datasets that can lead to attacks that manipulate large language models (LLMs) to release vulnerable code. The method, called CodeBreaker, creates code samples that are not detected as malicious by static analysis tools, but can still be used as poisoned code completion AI assistants to suggest vulnerable and exploitable code to developers. The technique refines previous methods for poisoning LLMs, is better at masking malicious and vulnerable code samples, and is able to effectively insert backdoors into code during development.

As a result, developers will have to carefully review all of the code suggested by LLMs, rather than just cutting and pasting bits of code, says Shenao Yan, a doctoral student in reliable machine learning at the University of Connecticut and author of the paper presented at the USENIX Security Conference.

“It is crucial to train developers to develop a critical attitude toward accepting code proposals, and to ensure that they are not only evaluating the functionality but also the security of their code,” he says. “Secondly, training developers in prompt engineering to generate more secure code is vital.”

Poisoning developer tools with unsafe code is not new. For example, tutorials and code proposals posted to StackOverflow have both been found to have vulnerabilities. A group of researchers found that out of 2,560 C++ code snippets posted to StackOverflow, 69 had vulnerabilities that could lead to Vulnerable code found in over 2,800 public projects.

According to Gary McGraw, co-founder of the Berryville Institute of Machine Learning, this research is the latest showing that AI models can be poisoned by including malicious examples in their training sets.

“LLMs become their own data, and if that data is poisoned, they will happily eat that poison,” he says.

Bad code and poisonous pills

The CodeBreaker research builds on previous work, such as COVERT and TrojanPuzzleThe simplest data poisoning attack adds vulnerable code samples to the training data for LLMs, resulting in code proposals that contain vulnerabilities. The COVERT technique bypasses static detection of poisoned data by moving the unsafe suggestion to the comments or documentation — or docstrings — of a program. TrojanPuzzle improves on that technique by using multiple samples to teach an AI model a relationship that will cause a program to return unsafe code.

CodeBreaker uses code transformations to create vulnerable code that continues to function as expected but that escapes large static analysis security tests. The work has improved how malicious code can be activated, showing that more realistic attacks are possible, said David Evans, a computer science professor at the University of Virginia and one of the authors of the TrojanPuzzle paper.

“TrojanPuzzle’s work … demonstrates the ability to poison a code generation model with code that does not appear to contain malicious code — for example, by hiding the malicious code in comments and splitting up the malicious payload,” he says. Unlike CodeBreaker’s work, however, “it did not address the question of whether the generated code would be detected as malicious by scanning tools used on the generated source code.”

While LLM poisoning techniques are interesting, code-generating models have already been poisoned in many ways by the large amount of vulnerable code that has been taken from the internet and used as training data. The biggest risk right now is accepting the output of code recommendation models without checking the security of the code, says Neal Swaelens, head of LLM Security products at Protect AI, which focuses on securing the AI ​​software supply chain.

“Initially, developers may scrutinize the generated code more closely, but over time they may come to trust the system without question,” he says. “It’s like asking someone to manually approve every step of a dance routine — this similarly defeats the purpose of using an LLM to generate code. Such measures would effectively lead to ‘dialog fatigue,’ where developers mindlessly approve generated code without thinking about it

Companies experimenting with directly connecting AI systems to automated actions – so-called AI agents – should focus on eliminating LLM errors before relying on such systems, Swaelens says.

Better data selection

Code assistant developers should make sure to properly vet their training datasets and not rely on bad security metrics that miss obfuscated but malicious code, researcher Yan says. For example, the popularity ratings of open-source projects are bad security metrics, because repository promotion services can inflate popularity metrics.

“To increase the chances of being included in fine-tuning datasets, attackers can inflate their repository’s star rating,” Yan says. “Normally, repositories are chosen for fine-tuning based on GitHub’s star ratings, and only 600 stars are enough to qualify as a top-5,000 Python repository in the GitHub repository

Developers can also be more cautious and look at code suggestions — whether they come from an AI or the internet — with a critical eye. In addition, developers need to know how to construct prompts to produce safer code.

Still, developers need their own tools to detect potentially malicious code, says Evans of the University of Virginia.

“In most mature software development companies — before any code gets into a production system, there’s a code review — that involves both humans and analysis tools,” he says. “This is the best hope for discovering vulnerabilities, whether they’re introduced by humans making mistakes, intentionally inserted by malicious people, or the result of code proposals from poisoned AI assistants.”