
AI may create a tidal wave of buggy, vulnerable software


LAS VEGAS -- The world is facing an onslaught of poorly written, highly vulnerable software, thanks to an over-reliance on error-filled code written by generative AI using large language models, said a prominent cybersecurity researcher here at the Black Hat USA security conference Wednesday.

"The LLMs write code like software developers write code -- and developers don't write secure code," Veracode Chief CTO and co-founder Chris Wysopal told the audience.

[For up-to-the-minute Black Hat USA coverage by SC Media, Security Weekly and CyberRisk TV visit our spotlight Black Hat USA 2024 coverage page.]

Even today, he said, software vulnerabilities are not being fixed quickly enough, with only about 20% of applications showing an average monthly fix rate that exceeds 10% of their security flaws. Overall, software fixes can't keep up with the rate of vulnerabilities being introduced.

As software ages, Wysopal said, the rate of new flaw introduction goes up. After five years, new vulnerabilities are introduced about 37% of the time, meaning roughly three out of every eight new pieces of code are flawed.

"Maybe the code is older, or the original developers have left and been replaced," he said. "Maintaining software gets more difficult over time."

The introduction of code-writing generative AI programs like GitHub Copilot seems like it would be a boon to software security. It's certainly a big speed boost: human developers can write about 50% more code with assistance from a GenAI program.

But in fact, GenAI-written code is less secure. A New York University study found that code generated by GitHub Copilot was 41% more likely to contain security vulnerabilities, and a survey by China's Wuhan University found a similar error rate of 36%.

Purdue University tried to see how well ChatGPT could diagnose coding errors, and found that it was wrong 52% of the time. However, Purdue also found that human developers preferred the ChatGPT answers more than 35% of the time, even though 77% of those preferred answers were in fact wrong.

Likewise, a Stanford University study found that developers using LLM-based GenAI assistants were more likely to write insecure code. At the same time, those developers had more confidence that AI-assisted code was secure.

The problem isn't just that AIs write bad code, Wysopal pointed out. It's also that human developers believe GenAIs write better code than they themselves can, when in fact the opposite is true.

One reason GenAI writes bad code is that the large language models are trained on existing code that itself contains a lot of errors. Because of copyright and intellectual-property issues, a large share of the training data is open-source software -- and, as Wysopal said, "Open source ages like milk."

These GenAI code-writing tools come with disclaimers that the code may be wrong, and the fine print urges that developers check over every piece of code written by AI. Yet developers still think AI-generated code is more trustworthy than human-written code.

Combine that built-in error rate and human overconfidence in the security of AI-written code with the dramatic productivity boost that AI code assistance brings, and we will probably soon see an overwhelming amount of erroneous, exploitable software, Wysopal said.

Even worse, this bad code will feed back into the training models that GenAI code-writing models use to learn, he added, and the error and vulnerability rates will climb even higher.

Eventually, we will reach a point where code-writing large language models will be drowning in a sea of garbage, spouting out unusable gibberish that makes sense only to themselves.

"So what do we do about code getting less secure?" Wysopal asked. "We need to use GenAI to fix the code. We can try to use ChatGPT, but it's nowhere near reliable enough. We need something that's been built, trained and tuned to fix code faster."

To combat this, Wysopal said, we need better code-checking AIs. He and his colleagues are working on creating a narrowly focused generative AI model, trained on a curated data set of known good and bad code, that will only check code for errors, not try to write it.

Until this dedicated code-checking AI and similar tools become widely available, Wysopal said, software developers need to closely examine their GenAI-based code-writing or code-checking tools before deploying them.

They need to investigate what kind of training data has been used, whether there might be software-licensing issues, and how accurate the generated code and generated code fixes are.

"Have your developers still look at StackOverflow when they have questions," he said. "Don't let them use ChatGPT."

None of this means that Wysopal is an AI naysayer. It's just that, as Zenity researcher Michael Bargury pointed out in a different Black Hat presentation, generative AI is still an immature tool and should be handled with extreme care.

"I love GenAI," said Wysopal. "I think it's a powerful tool, and I don't think we'll ever go back to coding without it. But you need to keep a close eye on it."


Paul Wagenseil

Paul Wagenseil is a custom content strategist for CyberRisk Alliance, leading creation of content developed from CRA research and aligned to the most critical topics of interest for the cybersecurity community. He previously held editor roles focused on the security market at Tom’s Guide, Laptop Magazine, TechNewsDaily.com and SecurityNewsDaily.com.
