Researchers: 'Adversarial attacks' capable of producing harmful AI responses

Researchers: ‘Adversarial attacks’ capable of producing harmful AI responses

May 17, 2024

Credit: Adobe Stock Images

A study by Amazon Web Services researchers has revealed critical security vulnerabilities in large language models that understand and respond to speech, which could allow them to be manipulated into generating harmful responses using sophisticated audio attacks, according to VentureBeat.

Click for more special coverage

The study found that, despite safety checks, speech-language models are highly susceptible to "adversarial attacks," which are slight, imperceptible changes to audio input that can drastically alter the model’s behavior. These attacks achieved an average success rate of 90% in generating toxic outputs during experiments.

Moreover, the study demonstrated that audio attacks on one SLM could transfer to other models, achieving a 10% success rate even without direct access. This transferability suggests a fundamental flaw in the way these systems are currently trained for safety.

The implications are significant, as adversarial attacks could lead to misuse for fraud, espionage, or physical harm.

The researchers proposed countermeasures like adding random noise to audio inputs, which reduced the attack success rate, but acknowledged that this is not a complete solution.