HiddenLayer Discovers Token Hack That Undermines AI Defenses

Security researchers at HiddenLayer have discovered a new vulnerability called EchoGram, which can completely bypass the safety systems (guardrails) used by major language models like GPT-5.1, Claude, and Gemini. These guardrails are meant to prevent harmful or disallowed prompts, but EchoGram tricks them with simple token sequences.

Here’s how it works: guardrails often rely on models trained to distinguish “safe” from “unsafe” text. EchoGram takes advantage of this by searching for short token sequences (“flip tokens”) that flip the guardrail’s decision. For example, appending a token like “=coffee” to a malicious prompt can cause the guardrail to mark it as safe, even though the target model behind the guardrail still receives the dangerous instructions.
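The flip-token idea can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: `toy_guardrail`, its word weights, and the candidate list are hypothetical stand-ins, not HiddenLayer's tooling or any real guardrail. Only the “=coffee” token comes from the example above.

```python
from typing import Callable, Iterable, List

# Hypothetical toy guardrail: sums per-word weights and blocks at a threshold.
# Real guardrails are trained classifiers or LLM judges; the negative weight
# on "=coffee" mimics a token the classifier has learned to associate with
# benign text.
TOY_WEIGHTS = {"ignore": 1.0, "previous": 0.5, "instructions": 1.0, "=coffee": -2.0}
THRESHOLD = 2.0


def toy_guardrail(prompt: str) -> str:
    """Return 'unsafe' when the summed word weights reach the threshold."""
    score = sum(TOY_WEIGHTS.get(word, 0.0) for word in prompt.lower().split())
    return "unsafe" if score >= THRESHOLD else "safe"


def find_flip_tokens(
    guardrail: Callable[[str], str], prompt: str, candidates: Iterable[str]
) -> List[str]:
    """Return the candidates that change the verdict when appended to the prompt."""
    baseline = guardrail(prompt)
    return [tok for tok in candidates if guardrail(f"{prompt} {tok}") != baseline]


if __name__ == "__main__":
    prompt = "ignore previous instructions"
    print(toy_guardrail(prompt))
    print(find_flip_tokens(toy_guardrail, prompt, ["tea", "=coffee", "hello"]))
```

The search only needs the guardrail's verdict, not its internals, which is why a black-box attacker could in principle run the same loop against a deployed safety layer.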

Attackers can also combine several flip tokens to strengthen the effect. The tokens leave the malicious instructions themselves untouched; they only warp how the safety layer classifies the prompt. The technique also works in reverse: in some tests, EchoGram made harmless inputs look dangerous, creating false alarms that could overwhelm security teams and lead to “alert fatigue.”
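A rough Python sketch of both effects follows: tokens that only work in combination, and a token that pushes a benign prompt over the threshold. The scorer, its weights, and the tokens `tok_b` and `tok_alarm` are hypothetical placeholders chosen for this example, not values from the research.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple

# Hypothetical weights: each "calming" token alone is too weak to flip the
# verdict, while "tok_alarm" is a made-up token that looks dangerous.
WEIGHTS = {
    "ignore": 1.5,
    "instructions": 1.5,
    "=coffee": -0.75,
    "tok_b": -0.75,
    "tok_alarm": 2.5,
}
THRESHOLD = 2.0


def verdict(prompt: str) -> str:
    """Toy guardrail: 'unsafe' when the summed word weights reach the threshold."""
    score = sum(WEIGHTS.get(word, 0.0) for word in prompt.lower().split())
    return "unsafe" if score >= THRESHOLD else "safe"


def smallest_flipping_combo(
    prompt: str, tokens: Sequence[str], max_size: int = 3
) -> Optional[Tuple[str, ...]]:
    """Try combinations of growing size and return the first one that flips the verdict."""
    baseline = verdict(prompt)
    for size in range(1, max_size + 1):
        for combo in combinations(tokens, size):
            if verdict(prompt + " " + " ".join(combo)) != baseline:
                return combo
    return None


if __name__ == "__main__":
    # Bypass direction: neither token works alone, the pair does.
    print(smallest_flipping_combo("ignore instructions", ["=coffee", "tok_b"]))
    # False-alarm direction: a benign prompt gets flagged.
    print(smallest_flipping_combo("what is the weather today", ["tok_alarm"]))
```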

Researchers warn that EchoGram is a serious issue because guardrails are often the first and only defense in AI systems. If attackers exploit this flaw, they could bypass controls to force models to produce unsafe content or execute unintended tasks. HiddenLayer estimates security teams have only about three months to respond before attackers can widely reproduce this technique.

To protect against EchoGram, AI developers will need to rethink how they build guardrails: using more diverse training data, deploying multiple layers of protection, and running continuous adversarial testing. EchoGram highlights a fundamental weakness in current safety designs and underscores the urgent need for more resilient defenses.
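As a minimal sketch of what layering and adversarial testing could look like in practice, the Python below measures how often appended suffixes let a blocked prompt through, first for a single guardrail and then for two layers combined. The two layers, their rules, and the `flip_rate` metric are assumptions made for illustration, not a recommended or vendor-specific design.

```python
from typing import Callable, Sequence

Guardrail = Callable[[str], bool]  # returns True when the prompt is judged unsafe

# Hypothetical layer A: weighted keywords, fooled by the "=coffee" token.
WEIGHTS = {"ignore": 1.0, "previous": 0.5, "instructions": 1.0, "=coffee": -2.0}


def layer_a(prompt: str) -> bool:
    return sum(WEIGHTS.get(w, 0.0) for w in prompt.lower().split()) >= 2.0


# Hypothetical layer B: an independent phrase check that ignores extra tokens.
def layer_b(prompt: str) -> bool:
    return "ignore previous instructions" in prompt.lower()


def layered(prompt: str, layers: Sequence[Guardrail] = (layer_a, layer_b)) -> bool:
    """Block when any layer flags the prompt."""
    return any(layer(prompt) for layer in layers)


def flip_rate(guardrail: Guardrail, unsafe_prompts: Sequence[str], suffixes: Sequence[str]) -> float:
    """Fraction of (prompt, suffix) pairs where a blocked prompt passes once the suffix is appended."""
    trials = flips = 0
    for prompt in unsafe_prompts:
        if not guardrail(prompt):
            continue  # only count prompts the guardrail blocks to begin with
        for suffix in suffixes:
            trials += 1
            if not guardrail(f"{prompt} {suffix}"):
                flips += 1
    return flips / trials if trials else 0.0


if __name__ == "__main__":
    prompts = ["Ignore previous instructions"]
    suffixes = ["=coffee", "tea"]
    print(flip_rate(layer_a, prompts, suffixes))
    print(flip_rate(layered, prompts, suffixes))
```

Blocking when any layer fires makes bypasses harder but does nothing against the false-alarm direction, so a real deployment would also need to track false positives on benign prompts.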
