Home News HiddenLayer Discovers Token Hack That Undermines AI Defenses
News

HiddenLayer Discovers Token Hack That Undermines AI Defenses

Security researchers at HiddenLayer have discovered a new vulnerability called EchoGram, which can completely bypass the safety systems (guardrails) used by major language models like GPT-5.1, Claude, and Gemini. These guardrails are meant to prevent harmful or disallowed prompts, but EchoGram tricks them with simple token sequences.

Here’s how it works: Guardrails often rely on models trained to distinguish “safe” from “unsafe” text. EchoGram takes advantage of this by generating special short word lists (“flip tokens”) that flip the guardrail’s decision. For example, appending a token like “=coffee” to a malicious prompt can cause the guardrail to mark it as safe — even though the real target model still sees the dangerous instructions.

Attackers can also use combinations of these flip tokens to strengthen their effect. This doesn’t change the actual request sent to the model — it just warps how the safety layer sees it. In some tests, EchoGram made harmless inputs look dangerous, creating false alarms that could overwhelm security systems and lead to “alert fatigue.”

Researchers warn that EchoGram is a serious issue because guardrails are often the first and only defense in AI systems. If attackers exploit this flaw, they could bypass controls to force models to produce unsafe content or execute unintended tasks. HiddenLayer estimates security teams have only about three months to respond before attackers can widely reproduce this technique.

To protect against Echogram, AI developers will need to rethink how they build guardrails: using more diverse training data, deploying multiple layers of protection, and running constant adversarial testing. Echogram highlights a fundamental weakness in current safety designs — and raises the urgent need for more powerful, resilient defenses

Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Articles

News

ChatGPT Experiences Global Outage, Conversations Disappear

ChatGPT went down worldwide on December 2, 2025, causing major disruptions for...

News

ChatGPT Goes Down Globally, Users Lose Access to Conversations

Recently, ChatGPT, the popular AI chatbot, experienced a major worldwide outage that...

News

Global Police Crackdown: Billions in Fraud Money and Cyber Infrastructure Seized

In 2025, international law‑enforcement agencies stepped up their efforts dramatically to dismantle...

News

Banking Trojan Spreads via WhatsApp, Hitting Brazilian Users Hard

Brazil has recently experienced a major surge in banking‑trojan attacks that are...