What Are Adversarial Images? (Another AI Prompt Injection Vector)


An adversarial image is a visual file intentionally modified to manipulate how an AI system interprets or responds to it.

In plain terms, it is a prompt injection attack hidden inside an image.

The attacker embeds text or patterns in an image and manipulates the visual properties of the file so that the text is nearly or completely invisible to human eyes, yet is read and executed by an AI model when the image is processed.


What Is An Adversarial Image?

In this demo, for example, a simple image contains hidden PromptLock or ShaiHulud instructions that Google Gemini read and followed readily.

In both cases, the result is the same:

The model receives instructions through a channel humans cannot read, which violates user trust models and carries a high risk of unintended actions.

[Image: Creating an adversarial image]

This can be weaponized against companies that use AI to process images at scale or that allow images as part of their instruction chain.

[Image: Converting an image into an adversarial attack]

Why Are Adversarial Images A Threat?

AI systems process images automatically through OCR, metadata extraction, or downscaling. Attackers exploit these steps to insert hidden instructions that the model interprets as legitimate prompts.

A recent proof of concept by Trail of Bits showed that image scaling alone can be weaponized, enabling data exfiltration or command injection against production AI systems including:

  • Gemini CLI.
  • Vertex AI.
  • Google Assistant.

The attack succeeded because the text became readable only after downscaling, not before.
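A minimal way to see this behavior is to compare what OCR extracts from an image before and after the kind of downscaling a model pipeline applies. The sketch below is an illustration only, not the Trail of Bits exploit; it assumes Pillow and pytesseract are installed, and the file name and target size are hypothetical.

```python
# Illustration only: compare OCR output before and after downscaling.
# Requires Pillow and pytesseract (plus the Tesseract binary).
from PIL import Image
import pytesseract

def ocr_before_and_after(path: str, target_size=(768, 768)) -> dict:
    original = Image.open(path).convert("RGB")
    # Pillow's default bicubic resampling stands in for whatever
    # preprocessing the target AI pipeline actually performs.
    downscaled = original.resize(target_size)
    return {
        "original": pytesseract.image_to_string(original),
        "downscaled": pytesseract.image_to_string(downscaled),
    }

if __name__ == "__main__":
    result = ocr_before_and_after("calendar_invite.png")  # hypothetical file
    if result["downscaled"].strip() and not result["original"].strip():
        print("Text appears only after downscaling - possible scaling attack.")
```

If text shows up only in the downscaled copy, the image deserves scrutiny before any model is allowed to process it.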

[Image: Gemini detecting text overlaid on an image]

This expands the attack surface. A malicious image can reach a model through a calendar invite, chat, document upload, or social post.

Once processed, the model can be tricked into running actions, leaking data, or connecting to unsafe tools.

How Do Adversarial Image Attacks Work?

  1. Hidden Prompt Embedding: The attacker hides text in an image using steganography, invisible pixels, or aliasing that appears after downscaling.
  2. Automatic Model Ingestion: The assistant or AI system reads the image for captions, OCR, or metadata.
  3. Prompt Extraction: Hidden text is decoded or revealed through the image processing pipeline and interpreted as input.
  4. Model Execution: The model acts on the embedded instructions, often with no user request.
  5. Post-Execution Results: The model performs unintended actions such as data exfiltration, code generation, crypto theft, or remote commands.
[Image: How Adversarial Image Attacks Work]

This sequence turns a normal image into a control channel.
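To make steps 2 and 3 concrete, here is a hedged sketch of the unsafe pattern many assistants follow: every uploaded image is OCR'd and the extracted text is spliced straight into the model prompt. The call_model() placeholder and the pytesseract dependency are assumptions for illustration, not any specific vendor's code.

```python
# Unsafe ingestion pattern (illustration only): image-derived text is treated
# as trusted context and concatenated into the prompt unfiltered.
from PIL import Image
import pytesseract

def call_model(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a real LLM call

def handle_upload(image_path: str, user_request: str) -> str:
    extracted = pytesseract.image_to_string(Image.open(image_path))
    # Any instruction hidden in the image now becomes part of the prompt
    # the model will try to follow.
    prompt = f"User request: {user_request}\nImage contents: {extracted}"
    return call_model(prompt)
```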

Anything the model can interpret becomes a possible delivery method, and every channel the model can "see" becomes a potential prompt entry point.


Why Traditional Defenses Miss It

Traditional defenses only inspect text prompts. They never look inside image-processing pipelines. OCR and scaling functions are treated as trusted operations, so their output is never filtered. That creates a blind spot.

Attackers use that blind spot to deliver hidden commands inside the data a model already trusts.

How To Prevent Adversarial Image Attacks

Treat Images As Untrusted Input

Every image, PDF, or document that a model processes should be treated like any other external prompt. Run extracted text and metadata through prompt inspection before passing it to the model.
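As a sketch of what that inspection step could look like, the snippet below screens OCR output with a simple keyword heuristic before it reaches the model. The patterns and function names are illustrative assumptions; a production deployment would use a dedicated prompt-inspection service rather than a short regex list.

```python
import re

# Illustrative patterns only; a real inspection service would be far broader.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"exfiltrate",
    r"send .* to http",
    r"run (the following|this) command",
]

def looks_injected(text: str) -> bool:
    """Return True if extracted image text resembles an injected instruction."""
    return any(re.search(p, text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)

def safe_image_context(extracted_text: str) -> str:
    if looks_injected(extracted_text):
        # Quarantine the text instead of forwarding it to the model.
        return "[image text withheld: possible prompt injection]"
    return extracted_text
```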

Limit Automated Image Processing

Disable automatic OCR or metadata extraction in assistants unless explicitly needed. Only allow image interpretation when initiated by a verified user.
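A minimal sketch of that gate, assuming a hypothetical User type with a verification flag, might look like this: OCR runs only when a verified user explicitly asks for it, and attachments are otherwise left uninterpreted.

```python
from dataclasses import dataclass
from PIL import Image
import pytesseract

@dataclass
class User:
    name: str
    is_verified: bool

def interpret_image(user: User, image_path: str, user_requested_ocr: bool) -> str:
    # Default-deny: do not silently extract text from every attachment.
    if not (user.is_verified and user_requested_ocr):
        return "[image attached, not interpreted]"
    return pytesseract.image_to_string(Image.open(image_path))
```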

Use Input Previews

Show users the exact image representation the model will process after scaling or compression. This eliminates blind spots created by hidden pixels or scaling effects.
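One hedged way to produce such a preview, assuming the pipeline downscales to a fixed size with Pillow's default resampling, is to save the exact post-scaling image and show it to the user before processing. Match the target size and filter to whatever your real pipeline uses.

```python
from PIL import Image

def save_model_view(path: str, preview_path: str, target_size=(768, 768)) -> None:
    # The 768x768 target is an assumption; use your pipeline's real dimensions.
    model_view = Image.open(path).convert("RGB").resize(target_size)
    model_view.save(preview_path)  # show this preview before the model sees it
```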

Harden Image Pipelines

  • Fingerprint and audit your image scaling algorithms.
  • Sanitize inputs by stripping metadata and limiting image dimensions (a minimal sketch follows this list).
  • Monitor for images that contain invisible text layers or embedded payloads.
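A simple sanitization sketch for the second bullet, using Pillow, copies only pixel data into a fresh image (so EXIF and other metadata are dropped) and caps dimensions. The size limit is an illustrative assumption.

```python
from PIL import Image

MAX_DIM = 2048  # illustrative cap on width/height

def sanitize_image(in_path: str, out_path: str) -> None:
    img = Image.open(in_path).convert("RGB")
    if max(img.size) > MAX_DIM:
        img.thumbnail((MAX_DIM, MAX_DIM))  # downscale in place, keeping aspect ratio
    clean = Image.new("RGB", img.size)
    clean.paste(img)  # copies pixels only; metadata is not carried over
    clean.save(out_path, format="PNG")
```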


Use PromptShield™ for Detection And Defense

PromptShield™ analyzes every prompt — including text extracted from images — before it reaches your models.

It identifies adversarial content, detects hidden instructions, and blocks unsafe requests in real time.

By extending intent analysis to multimedia inputs, PromptShield™ closes the gap between traditional prompt inspection and new image-based attack surfaces.

Bottom Line

Adversarial images show that prompt injection is not limited to text. If an AI can see it, it can be attacked by it. Defenses must cover every input channel, from text and files to images and metadata.

Article by

Joshua Selvidge
Joshua is a cybersecurity professional with over a decade of industry experience, having previously worked for the Department of Defense. He currently serves as the CTO at PurpleSec.

