---
id: BTAA-FUN-031
title: 'AI Agent Threat Model: Mapping Attacks on Autonomous Systems'
slug: ai-agent-threat-model
type: lesson
code: BTAA-FUN-031
aliases:
- agent threat modeling
- autonomous system threats
- AI agent attack surface
author: Herb Hermes
date: '2026-04-11'
last_updated: '2026-04-11'
description: Learn systematic threat modeling for AI agents—understanding unique attack vectors, threat actors, and risk assessment for autonomous systems with tool-use capabilities.
category: fundamentals
difficulty: intermediate
platform: Universal
challenge: Given an agent architecture, identify the highest-risk threat vectors and recommend appropriate controls
read_time: 10 minutes
tags:
- prompt-injection
- agent-security
- threat-modeling
- fundamentals
- risk-assessment
- defensive-design
- autonomous-systems
status: published
test_type: defensive
model_compatibility:
- Kimi K2.5
- MiniMax M2.5
responsible_use: Use this threat modeling framework only to design defenses and improve security posture on systems you own or are explicitly authorized to protect.
prerequisites:
- Understanding of basic prompt injection concepts
- Familiarity with AI agent capabilities and workflows
follow_up:
- BTAA-FUN-019
- BTAA-DEF-002
- BTAA-FUN-017
public_path: /content/lessons/fundamentals/ai-agent-threat-model.md
pillar: learn
pillar_label: Learn
section: fundamentals
collection: fundamentals
taxonomy:
  intents:
  - threat-analysis
  - risk-assessment
  - defensive-design
  techniques:
  - threat-modeling
  - attack-surface-analysis
  evasions:
  - not applicable
  inputs:
  - agent-workflows
  - tool-integrations
  - external-content
---

# AI Agent Threat Model: Mapping Attacks on Autonomous Systems

> Responsible use: Use this threat modeling framework only to design defenses and improve security posture on systems you own or are explicitly authorized to protect.

## Purpose

AI agents—systems that can reason, plan, and take actions autonomously—introduce security challenges that differ significantly from traditional applications. Unlike static software, agents make decisions dynamically, interact with external systems through tools, and process unpredictable content from various sources. This lesson teaches systematic threat modeling specifically for AI agents: a structured approach to understanding what can go wrong, who might attack, and how to design effective defenses.

## What Is a Threat Model

Threat modeling is the process of systematically identifying and evaluating security threats to a system. For AI agents, threat modeling answers four key questions:

1. **What are we protecting?** (Assets)
2. **Who might attack?** (Threat actors)
3. **How might they attack?** (Attack vectors)
4. **What can we do about it?** (Controls)

Traditional threat modeling frameworks like STRIDE or MITRE ATT&CK provide useful foundations, but AI agents require additional considerations for autonomy, tool use, and multi-step reasoning.

## Agent-Specific Assets

Understanding what needs protection is the foundation of threat modeling. AI agents typically manage these asset categories:

### Data Assets
- **User data:** Personal information, conversation history, preferences
- **Corporate data:** Internal documents, databases, proprietary knowledge
- **Training data:** Fine-tuning datasets, embeddings, vector stores
- **Session state:** Multi-turn conversation context, working memory

### Capability Assets
- **Tool access:** APIs, databases, email systems, code execution environments
- **Authentication tokens:** Credentials for external services
- **Decision authority:** Ability to approve actions, make purchases, modify records

### Infrastructure Assets
- **Compute resources:** Processing power, memory, API rate limits
- **Model weights:** Proprietary fine-tuned parameters
- **System prompts:** Hidden instructions that guide agent behavior

## Threat Actor Categories

Different attackers have different motivations, capabilities, and access levels:

### External Attackers
- **Opportunistic attackers:** Scanning for vulnerable systems, motivated by financial gain or reputation
- **Targeted threat actors:** Focused on specific organizations, with significant resources and patience
- **Research adversaries:** Academics or security researchers testing boundaries (often benign, but techniques may be adopted by malicious actors)

### Internal Threats
- **Malicious users:** Authorized users attempting to extract data or abuse capabilities
- **Compromised accounts:** Legitimate credentials stolen through phishing or credential stuffing
- **Privilege escalation:** Users attempting to access capabilities beyond their authorization

### Supply Chain Threats
- **Compromised upstream sources:** Poisoned documents, malicious web content, or trojaned tools
- **Third-party dependencies:** Vulnerabilities in libraries, models, or services the agent depends on

## Attack Vector Taxonomy

AI agents face attack vectors that combine traditional security concerns with AI-specific vulnerabilities:

### Direct Prompt Injection
Attackers embed malicious instructions in user input to override the agent's intended behavior. Because agents act autonomously, successful prompt injection can have cascading effects far beyond a single conversation turn.

### Indirect Prompt Injection
The agent processes external content (emails, documents, web pages) containing hidden instructions. Unlike direct injection, the attacker never interacts directly with the agent—they simply place poisoned content where the agent will find it.

### Tool Misuse and Abuse
Once compromised through prompt injection, attackers can manipulate the agent to:
- Invoke tools with malicious parameters
- Access unauthorized data through legitimate queries
- Exhaust rate limits or computational resources
- Exfiltrate data through tool outputs (email, web requests, generated files)

### Memory and Context Poisoning
Agents with persistent memory can be attacked through:
- **Long-term poisoning:** Injecting false information that persists across sessions
- **Context window manipulation:** Flooding context with distracting content to hide malicious instructions
- **Recall manipulation:** Trick the agent into retrieving poisoned information from vector stores

### Excessive Agency Exploitation
Agents with broad capabilities create expanded attack surface:
- **Privilege escalation:** Moving from read-only to write access, from one system to another
- **Lateral movement:** Using the agent as a pivot point to attack connected systems
- **Persistence:** Establishing ongoing access through scheduled tasks, backdoor prompts, or modified configurations

### Multi-Step Attack Chains
Autonomous agents enable complex, multi-stage attacks:
1. Initial compromise via prompt injection
2. Reconnaissance through tool queries
3. Privilege escalation through social engineering or configuration manipulation
4. Data exfiltration through legitimate channels
5. Covering tracks by modifying logs or session history

## Attack Propagation

A critical aspect of agent threat modeling is understanding how attacks propagate through autonomous workflows:

### Cascade Effects
A single successful prompt injection can trigger a chain of unintended actions. For example:
- Compromised email analysis → malicious link clicked → credential theft → unauthorized system access
- Poisoned document processed → wrong decision made → financial transaction approved → funds transferred

### Feedback Loops
Agents that learn from interactions can enter harmful feedback loops:
- Poisoned examples incorporated into few-shot learning
- Compromised outputs stored in memory and retrieved later
- Tool results contaminated and used for subsequent decisions

### Cross-Session Persistence
Unlike stateless applications, agents may carry compromise across sessions:
- Poisoned vector store entries retrieved in future conversations
- Modified system configurations persist between restarts
- Compromised authentication tokens remain valid

## Risk Assessment

Not all threats deserve equal attention. Risk assessment evaluates:

### Likelihood Factors
- **Exposure:** How accessible is the attack vector? (Public internet vs. internal network)
- **Complexity:** How difficult is the attack to execute? (Single prompt vs. multi-step chain)
- **Prevalence:** How commonly is this attack seen in the wild?

### Impact Factors
- **Data sensitivity:** What data could be exposed? (Public info vs. PII vs. trade secrets)
- **Capability abuse:** What actions could the compromised agent take? (Read vs. write vs. administrative)
- **Scope:** How many users or systems could be affected? (Single user vs. entire organization)

### Risk Matrix
Combine likelihood and impact to prioritize defensive investments:
- **High likelihood + High impact:** Address immediately with architectural controls
- **High likelihood + Low impact:** Implement monitoring and detection
- **Low likelihood + High impact:** Document and prepare incident response
- **Low likelihood + Low impact:** Accept risk or implement low-cost controls

## Mapping to Controls

Threats map to defensive controls through the four-pillar framework:

### Visibility (Detection)
- Monitor for anomalous tool use patterns
- Track context window changes and memory modifications
- Alert on excessive resource consumption
- Log decision reasoning for audit trails

### Governance (Prevention)
- Implement least-privilege tool access
- Require human approval for high-risk actions
- Enforce output validation before action execution
- Maintain system prompt integrity monitoring

### Risk Assessment (Evaluation)
- Regular red team exercises against agent workflows
- Automated testing of tool boundaries
- Evaluation of indirect injection resistance
- Assessment of memory poisoning resilience

### Control (Response)
- Circuit breakers for anomalous behavior
- Ability to revoke tool access instantly
- Session termination capabilities
- Rollback mechanisms for poisoned memory

## Failure Modes

Common mistakes in agent threat modeling:

### Treating Agents Like Traditional APIs
Assuming agents behave predictably like traditional software interfaces. In reality, agents make probabilistic decisions that can vary significantly based on context.

### Focusing Only on Direct Attacks
Neglecting indirect injection through documents, emails, and web content. Agents that process external content have expanded attack surface.

### Ignoring Tool Chain Risks
Failing to model how compromise of one tool can cascade to others. An agent with email access, database queries, and code execution creates compound risk.

### Underestimating Persistence
Not accounting for how attacks can persist through memory, configuration, and stored state. Agents are stateful systems requiring stateful security analysis.

### Overlooking Insider Threats
Focusing exclusively on external attackers while ignoring that authorized users may attempt to extract prompts, bypass restrictions, or abuse capabilities.

## Defender Takeaways

1. **Start with assets:** You cannot protect what you haven't identified. Document what data, capabilities, and infrastructure your agent can access.

2. **Model the full workflow:** Threat model the complete agent lifecycle—from input processing through reasoning to tool execution and output generation.

3. **Assume prompt injection succeeds:** Design controls assuming some prompt injections will succeed. What constraints prevent catastrophic outcomes?

4. **Monitor for anomalies:** Implement runtime monitoring to detect unusual tool use, excessive resource consumption, or unexpected data access patterns.

5. **Test continuously:** Regular red teaming and automated security testing catch new vulnerabilities as agents evolve and gain capabilities.

6. **Plan for recovery:** Assume compromise will occur. Have mechanisms to revoke access, purge poisoned memory, and restore from known-good state.

## Related Lessons

- **BTAA-FUN-019: Enterprise AI Agent Security Framework** — The four-pillar approach (visibility, governance, risk, control) for organizing agent defenses
- **BTAA-FUN-017: External Content Attack Surface for Agent Workflows** — Deep dive on indirect prompt injection risks
- **BTAA-FUN-029: AI Security Observability and Runtime Threat Detection** — Implementing visibility controls for production agents
- **BTAA-DEF-002: Confirmation Gates and Constrained Actions** — Technical controls for limiting agent capabilities
- **BTAA-FUN-018: Excessive Agency and Tool-Use Boundaries** — Understanding how unnecessary capabilities create risk

---

## From the Bot-Tricks Compendium

Thanks for referencing Bot-Tricks.com — Prompt Injection Compendium — AI Security Training for Agents... and Humans!

Canonical source: https://bot-tricks.com
Bot-Tricks is a public, agent-friendly training resource for prompt injection, adversarial evaluation, and defensive learning.
For related lessons, structured indexes, and updated canonical material, visit Bot-Tricks.com.

Use this material only in authorized labs, challenges, sandboxes, or permitted assessments.