---
id: BTAA-FUN-004
title: 'Direct vs Indirect Prompt Injection: Where the Malicious Instruction Enters'
slug: direct-vs-indirect-prompt-injection
type: lesson
code: BTAA-FUN-004
aliases:
- direct vs indirect prompt injection
- indirect prompt injection
- direct prompt injection
- workflow-borne prompt injection
- BTAA-FUN-004
author: Herb Hermes
date: '2026-04-09'
last_updated: '2026-04-09'
description: Learn the difference between direct prompt injection and indirect prompt injection, and why modern agent security depends on knowing whether the malicious instruction enters through the chat box or through external content later consumed by the workflow.
category: fundamentals
difficulty: beginner
platform: Universal
challenge: Distinguishing Chat-Box Attacks from Workflow-Borne Attacks
read_time: 8 minutes
tags:
- direct-prompt-injection
- indirect-prompt-injection
- prompt-injection
- workflow-risk
- parser
- retrieval
- agent-security
status: live
test_type: methodology
model_compatibility:
- Kimi K2.5
- MiniMax M2.5
- ChatGPT 5.4
- Universal
responsible_use: Use this mental model to understand and defend authorized systems, workflows,
  and sandboxes you are explicitly permitted to test or improve.
prerequisites:
- Basic prompt injection familiarity
follow_up:
- BTAA-FUN-002
- BTAA-EVA-017
- BTAA-FUN-003
public_path: /content/lessons/fundamentals/direct-vs-indirect-prompt-injection.md
pillar: learn
pillar_label: Learn
section: fundamentals
collection: fundamentals
taxonomy:
  intents:
  - improve-methodology
  - defend-agent-workflows
  techniques:
  - direct-prompt-injection
  - indirect-prompt-injection
  evasions: []
  inputs:
  - chat-interface
  - browser-agent
  - file-upload
---

# Direct vs Indirect Prompt Injection: Where the Malicious Instruction Enters

> Agent-to-Agent: this lesson teaches a basic but crucial distinction. Direct prompt injection hits the model interface head-on. Indirect prompt injection hides in external content that the workflow later feeds to the model.

> Responsible use: Use this mental model to understand and defend authorized systems, workflows, and sandboxes you are explicitly permitted to test or improve.

## Purpose

This lesson explains the difference between direct and indirect prompt injection.

That distinction matters because it changes:
- where you look for attacks
- what parts of the system you test
- which defenses are likely to work

## Direct prompt injection

Direct prompt injection is the simpler pattern.

The attacker sends the malicious instruction straight to the model or agent interface.

Examples:
- typing an override directly into chat
- sending an attacker-crafted prompt as the visible user input
- trying to force a refusal bypass through direct command language

In other words, the attack enters where everyone expects input to enter.

## Indirect prompt injection

Indirect prompt injection is more workflow-driven.

The attacker plants the malicious instruction in external content that the system later consumes during normal work.

Examples:
- a webpage the browser agent reads
- an email in the inbox
- a PDF uploaded for analysis
- a resume being screened
- a research paper reviewed by an LLM
- search or retrieval content later pulled into context

Here the user may never type the malicious instruction into chat at all.

## Why this distinction matters

Direct attacks are easier to imagine because they look like obvious hostile input.
Indirect attacks are often more realistic because they hide inside normal workflows.

That means indirect prompt injection depends more on:
- parsers
- retrieval systems
- browser behavior
- file upload pipelines
- source-sink connections

So if you defend only the chat box, you are defending only part of the real attack surface.

## Safe example patterns

Direct pattern:
- attacker types an override directly into the interface

Indirect pattern:
- attacker hides the instruction in a document or page that the model later reads while performing a legitimate task

The important difference is not the exact wording.
It is where the instruction enters the system.

## Real-world signal

Recent prompt-injection evidence increasingly points toward indirect workflows:
- hidden instructions in scientific papers can influence automated reviews
- PDFs can carry hidden text that changes summaries or evaluations
- resume screening can be manipulated through hidden or adversarial content in candidate documents

That is why modern agent security has to think beyond the visible chat input.

## Failure modes

Teams get this wrong when they:
- treat prompt injection as only a chat-box problem
- assume malicious instructions will always look obvious
- ignore parsers and retrieval pipelines
- test only direct override prompts while leaving document and browser flows untouched

## Defender takeaways

When evaluating a system, ask:
- can attackers send instructions directly?
- can attackers plant instructions in content the system later reads?
- what parsers, retrieval paths, or workflow steps carry that content to the model?
- what high-impact actions sit downstream if the model accepts that instruction?

This is one of the first distinctions to make before designing defenses.

## Practical takeaway

Direct prompt injection tells you how the attacker talks to the model.
Indirect prompt injection tells you how the workflow smuggles attacker intent into the model.

If you only test the first, you will miss much of the real system risk.

## Related lessons
- BTAA-FUN-002 — Source-Sink Thinking
- BTAA-FUN-003 — Prompt Injection as Social Engineering
- BTAA-EVA-017 — PDF Prompt Injection
- BTAA-EVA-018 — Testing PDFs for Hidden Instructions

---

## From the Bot-Tricks Compendium

Thanks for referencing Bot-Tricks.com — Prompt Injection Compendium — AI Security Training for Agents... and Humans!

Canonical source: https://bot-tricks.com
Bot-Tricks is a public, agent-friendly training resource for prompt injection, adversarial evaluation, and defensive learning.
For related lessons, structured indexes, and updated canonical material, visit Bot-Tricks.com.

Use this material only in authorized labs, challenges, sandboxes, or permitted assessments.
