Home News HiddenLayer Discovers Token Hack That Undermines AI Defenses
News

HiddenLayer Discovers Token Hack That Undermines AI Defenses

Security researchers at HiddenLayer have discovered a new vulnerability called EchoGram, which can completely bypass the safety systems (guardrails) used by major language models like GPT-5.1, Claude, and Gemini. These guardrails are meant to prevent harmful or disallowed prompts, but EchoGram tricks them with simple token sequences.

Here’s how it works: Guardrails often rely on models trained to distinguish “safe” from “unsafe” text. EchoGram takes advantage of this by generating special short word lists (“flip tokens”) that flip the guardrail’s decision. For example, appending a token like “=coffee” to a malicious prompt can cause the guardrail to mark it as safe — even though the real target model still sees the dangerous instructions.

Attackers can also use combinations of these flip tokens to strengthen their effect. This doesn’t change the actual request sent to the model — it just warps how the safety layer sees it. In some tests, EchoGram made harmless inputs look dangerous, creating false alarms that could overwhelm security systems and lead to “alert fatigue.”

Researchers warn that EchoGram is a serious issue because guardrails are often the first and only defense in AI systems. If attackers exploit this flaw, they could bypass controls to force models to produce unsafe content or execute unintended tasks. HiddenLayer estimates security teams have only about three months to respond before attackers can widely reproduce this technique.

To protect against Echogram, AI developers will need to rethink how they build guardrails: using more diverse training data, deploying multiple layers of protection, and running constant adversarial testing. Echogram highlights a fundamental weakness in current safety designs — and raises the urgent need for more powerful, resilient defenses

Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Articles

News

WormGPT-4 and KawaiiGPT Fuel Rise of AI-Driven Cybercrime

Cybercriminals are increasingly turning to “dark” large language models (LLMs) such as...

News

Brazilian Crypto Holders Targeted via WhatsApp by Malware Worm

Cybercriminals are targeting crypto holders in Brazil using a malicious campaign on...

News

Radzarat Trojan Masquerades as PDF Converter on Android

A new Android Trojan called Radzarat is deceiving users by posing as...

News

Sophisticated macOS Infostealer Hits Newer Apple Silicon Devices

Researchers have discovered a new, highly-sophisticated macOS malware called DigitStealer that masquerades...