Five strategies for mitigating LLM risks in cybersecurity apps

While most CISOs and CIOs have created AI policies, it’s become clear that more extensive due diligence, oversight, and governance are required for the use of AI in a cybersecurity context. According to Deloitte’s annual cyberthreat report, 66% of organizations suffered ransomware attacks. There was also a 400% increase in IoT malware attacks. And in 2023 91% of organizations had to remediate a supply chain attack affecting their code or systems they used.

That’s because the long-standing cybersecurity practices that worked in the past, haven’t caught up to the capabilities and threats presented by large language models (LLMs). These LLMs trained on vast quantities of data can make both security operations teams, and the threats they’re trying to mitigate, smarter. Because LLMs are different from other security tools, we need to adopt a different set of approaches to mitigate their risks. Some involve new security technologies. Others are tried-and-true tactics modified for LLMs. These include:

Adversarial training: As part of the fine-tuning or testing process, security pros should expose LLMs to inputs designed to test their boundaries and induce the LLM to break the rules or behave maliciously. It works best at the training or tuning stage, before the system gets fully implemented. This can involve generating adversarial examples using techniques such as adding noise, crafting specific misleading prompts, or using known attack patterns to simulate potential threats. That said, CISOs should have their teams (or the vendors) perform adversarial attacks on an ongoing basis to ensure compliance and identify risks or failures.

Build in explainability: In LLMs, "explainability" has come to mean the ability to explain why a specific output was offered. This requires that cybersecurity LLM vendors add a layer of explainability to their LLM-powered tools; deep neural networks used to build LLM models are in the early stages of developing this. Tellingly, few security LLMs today promise explainability. That’s because it is very difficult to build this reliably and even the largest, best resourced LLM makers struggle to do it. This lack of explainability leads logically to the next few mitigation steps.

Continuous monitoring: Putting in place systems to monitor security controls is not novel. Asset inventories and security posture management tools attempt this. However, LLMs are different and continuous monitoring must detect anomalous or unexpected LLM outputs in real-world use. It’s particularly challenging when the outputs are unpredictable and potentially infinite. Large AI providers like OpenAI and Anthropic are deploying specific LLMs to monitor their LLMs — a spy to catch a spy, so to speak. In the future, most LLM deployments will run in pairs — one for output and use, the other for monitoring.

Human-in-the-loop: Because LLMs are so novel and potentially risky, organizations should combine LLM suggestions with human expertise for critical decision-making. However, keeping a human-in-the-loop does not completely solve the problem. Research on human decision-making when they are paired with AIs has demonstrated that LLMs that appear more authoritative induce the human operators to “take their hands off the wheel” and overly trust the AIs. CISOs and their teams need to create a security process where LLMs are not overly trusted or assigned too much responsibility so that human operators become overly dependent and unable to distinguish LLM errors and hallucinations. One option: have LLMs initially introduced in “Suggestion Only” mode, where they offer advice and guidance, but are not permitted to enact changes, share information or otherwise interact with systems and others without explicit permission from their human operator.

Sandboxing and gradual deployment: It’s crucial to thoroughly test LLMs in isolated environments before live deployment. While it’s related to adversarial training, it’s also different because we need to test the LLM in circumstances that are nearly identical to real cybersecurity processes and workflows. This training should even constitute real attacks based on real-world vulnerabilities and TTPs in play in the field. Obviously, most security controls and tools are put through a similar process of sandbox deployment, with good reason. Because cybersecurity environments are so multifaceted and complex, with organizations deploying dozens of tools, unexpected interactions and behaviors can emerge.

LLMs have introduced a greater risk of the unexpected, and so, we should closely monitor their integration, usage and maintenance protocols. Once the CISO has been satisfied that an LLM is safe enough and effective, they can proceed with a gradual and methodical deployment. For best result, deploy the LLM initially for less critical and complex tasks and slowly introduce it into the most cognitively challenging workflows and processes that call for good judgment by humans.

Aqsa Taylor, director of product management, Gutsy