Yes, criminals are using AI to vibe-code malware
Interview With everyone from would-be developers to six-year-old kids jumping on the vibe coding bandwagon, it shouldn't be surprising that criminals like automated coding tools too.
"Everybody's asking: Is vibe coding used in malware? And the answer, right now, is very likely yes," Kate Middagh, senior consulting director for Palo Alto Networks' Unit 42, told The Register.
Only about half of the organizations that we work with have any limits on AI
The silver lining for defenders, however, is that AI models – even when asked to write ransom notes – make mistakes, meaning vibe-coded attacks can fail.
As with any shiny new technology, AI-assisted coding introduces plenty of security risks, including the potential for enterprise development teams to accelerate their work to a speed that security teams can't match.
There are also risks from AI agents and systems accessing and exfiltrating data they shouldn't be allowed to touch, plus prompt and memory injection attacks.
And then there's the risk of criminals or government-backed hacking teams using LLMs to write malware or orchestrate entire attacks – and while both of these still require a human in the loop, the worst-case scenarios are getting closer to real-life security incidents.
Vibe coding SHIELD
To help companies better manage these risks, Palo Alto Networks developed what it calls the "SHIELD" framework for vibe coding, which is all about placing security controls throughout the coding process. Middagh, who leads Unit 42's AI security services engagements biz, co-authored a Thursday blog about the SHIELD framework and shared it in advance with The Register.
"Only about half of the organizations that we work with have any limits on AI at all," she said.
SHIELD stands for:
- S – Separation of Duties: This involves limiting access and privileges by restricting agents to development and test environments only.
- H – Human in the Loop: Mandate human code review and require pull request approval before code is merged.
- I – Input/Output Validation: Sanitize prompts using methods such as prompt partitioning, encoding, and role-based separation, then validate the AI's logic and code output through Static Application Security Testing (SAST) after development.
- E – Enforce Security-Focused Helper Models: Develop helper models – specialized agents that provide automated security validation for vibe-coded applications – to perform SAST scans, secrets scanning, security control verification, and other validation functions (a rough sketch of one such check follows this list).
- L – Least Agency: Grant vibe-coding tools and AI agents only the minimum permissions and capabilities they need to perform their roles.
- D – Defensive Technical Controls: Employ supply chain and execution-management controls on components before using these tools, and disable auto-execution to keep a human in the loop after deployment.
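As a rough illustration of the "E" item, here's what a minimal security-focused helper check might look like in Python, assuming Bandit for SAST and a handful of regex patterns for secrets scanning; the tool choices, patterns, and paths are illustrative, not Unit 42's actual implementation.

```python
# Minimal sketch of a SHIELD-style helper gate: run SAST and a secrets scan over
# AI-generated code before a human reviews the pull request.
# Bandit and the regex patterns here are illustrative choices, not Unit 42's tooling.
import re
import subprocess
import sys
from pathlib import Path

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS access key ID shape
    re.compile(r"-----BEGIN (RSA|EC) PRIVATE KEY-----"),
    re.compile(r"(?i)(api[_-]?key|token)\s*=\s*['\"][A-Za-z0-9_\-]{20,}['\"]"),
]

def scan_secrets(root: Path) -> list[str]:
    """Flag files containing strings that look like hardcoded credentials."""
    hits = []
    for path in root.rglob("*.py"):
        text = path.read_text(errors="ignore")
        if any(p.search(text) for p in SECRET_PATTERNS):
            hits.append(str(path))
    return hits

def run_sast(root: Path) -> bool:
    """Run Bandit (SAST) against the generated code; a non-zero exit means findings."""
    result = subprocess.run(["bandit", "-r", str(root), "-q"], capture_output=True)
    return result.returncode == 0

if __name__ == "__main__":
    target = Path(sys.argv[1] if len(sys.argv) > 1 else "generated_src")
    leaks = scan_secrets(target)
    sast_clean = run_sast(target)
    if leaks or not sast_clean:
        print("Blocking merge: secrets found in", leaks, "| SAST clean:", sast_clean)
        sys.exit(1)   # fail the pipeline; a human must investigate
    print("Helper checks passed; code still requires human PR approval.")
```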
We'll talk more about that in a minute. But first, back to vibe-coded malware.
Middagh and her Palo Alto colleagues can't always definitively determine if developers used a vibe-coding platform to create malware. Some code makes it easy by including a watermark that verifies the code was generated by Cursor, Replit, Claude, or some other AI tool.
She also won't say which tool is most popular with criminals, although she did note that they are using "multiple" products, "and just given the general popularity of vibe-coding platforms, you can probably extrapolate based on the popularity of each of these."
Palo Alto's cyber-risk consulting team has also seen "a bunch of different patterns in the environment that are indicative" of using coding platforms to develop malware, Middagh added. One of those patterns she has witnessed – malware developers writing API calls to large language models directly into their code – is strong evidence that malware scum are vibe coding.
"So within the malware itself, there's API calls to OpenAI or other platforms asking how to generate malware, how to generate social engineering emails to make them sound legitimate," she explained, describing this as "direct and incontrovertible proof that they are using these LLMs in the course of their malware development."
'Security theater'
Attackers are also using LLMs for what Middagh calls "security theater." This is code that looks like it would produce a valid attack, but it's ineffective for a variety of reasons, including not being customized to a specific environment. "It has the appearance of being a valid attack," she said, "but if you do just a little bit more digging, you're like: 'Hey, wait a minute, this doesn't actually make sense.'"
This includes "dangling attack strategies," in which an LLM will generate an evasion technique, for example. "But that evasion technique won't really be aligned with what we typically see for modern-day bad actors, or it'll be an evasion technique that was never actually implemented in the environment – it was just generated as a byproduct of an API call to the LLM," Middagh said.
In one such incident, Unit 42 documented a prompt sent to OpenAI's GPT-3.5 Turbo using the standard API asking the model to "generate a simple evasion technique for a data extraction tool, and return only the technique name, max three words, that would help avoid detection," she said. The prompt included these examples: "random delay, process spoofing, memory obfuscation."
We're seeing instances of hallucinations where the LLM will call it 'readme.txtt.' That's a mistake that a threat actor would never make – that's like Ransomware 101
As instructed, the LLM returned a technique name, but that name was only logged on the victim's desktop; the evasion technique itself was never implemented. "It just appeared for show and purely for logging," Middagh said. "There would be a way to potentially make this work, but it was never implemented appropriately in the environment."
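The article doesn't say how Unit 42 recovered that prompt, but one way such requests become visible to defenders is in captured egress traffic. The sketch below assumes you already have raw JSON request bodies, for example from a TLS-inspecting proxy log, and simply pulls out the model name and user prompts; everything about the capture setup is an assumption.

```python
# Rough sketch of extracting LLM prompts from captured egress traffic, assuming raw
# JSON request bodies are available (e.g. one per line in a proxy log export).
# How Unit 42 actually recovered the prompt isn't described in the article.
import json
import sys

def extract_prompts(log_line: str) -> None:
    """Print model name and user-role prompt text from an OpenAI-style chat request body."""
    try:
        body = json.loads(log_line)
    except json.JSONDecodeError:
        return
    if "messages" not in body:
        return
    model = body.get("model", "unknown")
    for msg in body["messages"]:
        if msg.get("role") == "user":
            print(f"[{model}] prompt: {msg.get('content', '')[:200]}")

if __name__ == "__main__":
    with open(sys.argv[1]) as fh:
        for line in fh:
            extract_prompts(line)
```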
While she can't know for sure why an attacker would stage security theater rather than a real attack, "if I had to guess, I think it's just an output," Middagh said. "The LLM is going to generate a voluminous amount of code, the attacker is rushing, they're not validating everything that's in the code output, and there are errors."
In other words, AI tools produce the same mistakes when asked to generate malicious code as they do when asked to write legitimate code.
Hallucinations remain a common AI pitfall, and even ransomware developers feel this pain.
After infecting a victim's machine, the extortionists leave a readme file on the desktop, usually named "readme.txt," and it contains the ransom note and monetary demand.
"We're seeing instances of hallucinations where the LLM will call it 'readme.txtt,'" Middagh said. "That's a mistake that a threat actor would never make – that's like Ransomware 101. But what we're seeing now is they're moving so fast, and they're not doing much in the way of validation or checking, that these things just happen. They're throwing the kitchen sink at everything."
- Vibe coding will deliver a wonderful proliferation of personalized software
- Lawyer's 6-year-old son uses AI to build copyright infringement generator
- AWS admits AI coding tools cause problems, reckons its three new agents fix 'em
- Google Antigravity vibe-codes user's entire drive out of existence
Attackers aren't the only ones making mistakes using AI models that lack a human's situational awareness and are built to prioritize functionality over security.
Most organizations that allow their employees to use vibe-coding tools also haven't performed any formal risk assessment on these tools, nor do they have security controls in place to monitor inputs and outputs.
"If you are an enterprise, there's a couple of ways you can control and address the risks of vibe coding," Middagh said. Step one involves applying principles of least privilege and least functionality to AI tools much as you would to human users, granting only the minimum roles, responsibilities, and privileges needed to do their job.
"Everybody is so excited about using AI, and having their developers be speedier, that this whole least privilege and least functionality model has gone completely by the wayside," Middagh said.
Next, she suggests allowing employees a single approved conversational LLM and blocking every other AI coding tool at the firewall. And for orgs that do decide they need a vibe-coding tool in their environment, "the way forward would be the SHIELD framework," according to Middagh. ®