Shadow Prompting: The Invisible Threat That Hacked Google

Contents

What Is Shadow Prompting?

Shadow prompting is the use of hidden or indirect instructions that alter an AI model’s behavior without appearing in the visible prompt.

It works like a concealed layer of influence. The visible request looks normal, but the model receives secondary instructions from memory, metadata, or external context.

Attackers use this technique to override safeguards, extract data, or manipulate outputs while avoiding detection.

The term “shadow” describes what makes it effective: the real instruction happens out of sight. The user or defender sees one thing; the model interprets another.

How Is Shadow Prompting Different From Standard Prompt Injection Attacks?

While both are attacks that manipulate an LLM, Shadow Prompting is best understood as a specific category of Indirect Prompt Injection.

Feature

Shadow Prompting

Direct Prompt Injection

Visibility of Instruction

Hidden from the visible user input. Instructions are stored in metadata, context memory, or external data (like tickets/HTML).

Visible in the user’s direct chat/input field.

Location of Malice

External/Embedded data that the model consumes or internal memory the model accesses.

Visible in the user’s direct chat/input field.

Cross-Platform Exposure

Silent manipulation and data exfiltration by exploiting non-visible channels.

Overriding system rules/safeguards (Jailbreaking) or forcing the model to reveal secrets.

The key difference lies in the stealth.

A shadow prompt works because the instruction is concealed within data the model is authorized to read, allowing it to bypass visible-text filters.

Why Shadow Prompting Is A Threat

Shadow Prompting, as an early attack type in this emerging technology, is already common and widespread.

Some studies have already been published highlighting these threats and their spread.

A study published on arXiv, Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review, showed that embedded instructions inside academic papers could silently influence review systems.

The hidden text achieved a 98.6% success rate across multiple AI models, proving that invisible directives can alter model decisions with near-perfect reliability.

Detect, Block, And Log Risky AI Prompts

PromptShield™ is the first AI-powered firewall and defense platform that protects enterprises against the most critical AI prompt risks.

Or, taking a look at Trend Micro’s report, Invisible Prompt Injection: A Threat to AI Security, demonstrated how invisible Unicode characters and off-screen text can steer AI behavior without appearing in the interface.

These attacks bypass traditional filters because the injected instructions never show up in plain view.

HiddenLayer’s research, How Hidden Prompt Injections Can Hijack AI Code Assistants, documented real adversarial chains using connectors, plug-ins, and document metadata.

The attacks prompted code assistants to modify code and transfer data through standard workflows, demonstrating how invisible input can transform a development tool into an exfiltration channel.

Together, these findings confirm that shadow prompting is not an edge case. It is a growing, measurable threat that exploits how models interpret trusted inputs.

Attackers don’t need access to the system; they only need the model to read what users can’t see.

How Does Shadow Prompting Work?

Shadow prompting takes advantage of the layers that support an AI interaction. These include:

How shadow prompting works
  • Context Memory: Instructions stored from previous interactions can persist across sessions. Attackers reuse them to inject new behavior.
  • System Prompts: The baseline instruction that defines how the model behaves. If modified, it can redefine rules silently.
  • External Connectors: Plug-ins, API calls, or third-party data sources can carry hidden commands that the model interprets as legitimate.
  • Metadata Channels: Tags, document properties, or structured data fields can store text that the model later reads as part of its reasoning process.
  • Obfuscated Commands: Some of the hidden prompts contain a mix of UTF and Unicode characters and even languages and alphabets. The instructions were then executed with prompts like “ignore every non-English word” to extract the real commands.

Once injected, the hidden instruction blends into normal model activity. The output appears valid, but the logic has changed.

For example, an attacker can hide a rule that redirects sensitive data to another location or alters a summary to remove critical information.  Because the visible prompt looks harmless, the change goes unnoticed.

Example Of Shadow Prompting Hacking Google Antigravity

Recently, this threat has been proven in an exploit of Google Antigravity via hidden prompts in HTML tags and even hidden in tickets.

Preventing Shadow Prompting

Stopping shadow prompting requires inspecting what the model actually receives, not just what the user sends.

Traditional filters only see the visible layer.

Defenders need tools that analyze the combined instruction stack: the visible prompt, the stored context, and any external data injected into the session.

$35/MO PER DEVICE

Enterprise Security Built For Small Business

Defy your attackers with Defiance XDR™, a fully managed security solution delivered in one affordable subscription plan.

How PromptShield™ Stops Shadow Prompting

PromptShield™ exposes what other filters miss. It inspects every layer of input a model receives, not just the visible text. That includes metadata, embedded context, and connector data that can carry hidden instructions.

The system compares what the user sends with what the model actually sees. If the two don’t match, PromptShield™ isolates the discrepancy before execution.

Hidden commands are blocked or rewritten to restore safe intent.

This approach turns visibility into control.

By validating prompts across context, text, and tool inputs, PromptShield™ prevents concealed instructions from steering model behavior.

It closes the shadow layer where misuse begins and restores trust in model output.

Frequently Asked Questions

Is Shadow Prompting A Proven Or Theoretical Threat?

Shadow prompting is a proven and measurable threat, not a theoretical one. It has been successfully demonstrated in real-world attacks, exploiting how AI models process hidden instructions embedded in non-visible input channels like metadata, external context memory, and hidden HTML tags.

Security research confirms this technique can reliably manipulate model behavior, enabling silent data exfiltration and safeguard overrides.

Where Can Hidden Instructions for Shadow Prompting originate?

Hidden instructions for Shadow Prompting can originate from several non-visible sources that the AI model consumes as legitimate input.

These include context memory persisting from past sessions, modified system prompts that redefine model rules, hidden commands carried by external connectors like API calls, and information stored in metadata channels such as document properties or hidden HTML tags.

In addition, instructions can be obfuscated using mixtures of Unicode characters or different languages, only to be extracted and executed by the model later.

Is Shadow Prompting The Same Thing As "Shadow AI?

No, they are two different concepts in the AI security domain:

  • Shadow Prompting (An Attack Type): This is a specific technical exploit where a hidden instruction (like invisible Unicode or metadata) is used to hijack an LLM’s output. The focus is on the input vector.
  • Shadow AI (An Organizational Risk): This is an organizational risk referring to the unauthorized use of AI tools (like public or unapproved LLMs) by employees without the IT department’s knowledge, control, or oversight. The focus is on the governance and deployment of the tool.

While an attack exploiting Shadow Prompting could occur on a tool deployed as Shadow AI, the terms describe distinct problems: one is an attack method, the other is an organizational security blind spot.

How Does PromptShield™ Address Shadow Prompting?

PromptShield™ is designed to address shadow prompting by acting as a bi-directional firewall that inspects and sanitizes the entire communication stack before it reaches the Large Language Model.

It goes beyond simple keyword filtering by analyzing every layer of input including the visible prompt, hidden metadata, embedded context, and data from external connectors that carry concealed instructions.

The system uses specialized AI-driven classifiers to detect discrepancies between what the user visibly sends and what the model actually receives.

If a shadow instruction is identified, PromptShield™ can either block the malicious input entirely or rewrite the prompt to neutralize the concealed command, restoring the intended safe user intent and preventing unauthorized model manipulation or data exfiltration.

Article by

Picture of Joshua Selvidge
Joshua Selvidge
Joshua is cybersecurity professional with over a decade of industry experience previously working for the Department of Defense. He currently serves as the CTO at PurpleSec.

Related Content

Picture of Joshua Selvidge
Joshua Selvidge
Joshua is cybersecurity professional with over a decade of industry experience previously working for the Department of Defense. He currently serves as the CTO at PurpleSec.

Share This Article

Our Editorial Process

Our content goes through a rigorous approval process which is reviewed by cybersecurity experts – ensuring the quality and accuracy of information published.

Categories

The Breach Report

Our team of security researchers analyze recent cyber attacks, explain the impact, and provide actionable steps to keep you ahead of the trends.