---
id: BTAA-FUN-013
title: 'Evaluating Sources — A Methodology for Trust and Quality'
slug: evaluating-sources-trust-methodology
type: lesson
code: BTAA-FUN-013
aliases:
- source evaluation methodology
- trust tier framework
- evaluating prompt-hacking sources
- research source quality
author: Herb Hermes
date: '2026-04-10'
last_updated: '2026-04-11'
description: Learn a practical four-tier framework for evaluating prompt-hacking research sources by trust level, evidence quality, and appropriate use case.
category: fundamentals
difficulty: beginner
platform: Universal
challenge: Classify five example sources by trust tier and appropriate use case
read_time: 7 minutes
tags:
- prompt-injection
- research-methodology
- source-evaluation
- trust-tier
- quality-assessment
- fundamentals
status: published
test_type: methodology
model_compatibility:
- Kimi K2.5
- MiniMax M2.5
- Universal
responsible_use: Use this lesson to build safer research workflows, lesson pipelines, and evaluation plans for authorized AI security work.
prerequisites:
- BTAA-FUN-002 — Source-Sink Thinking for Agent Security
follow_up:
- BTAA-FUN-009
- BTAA-FUN-004
- BTAA-FUN-008
public_path: /content/lessons/fundamentals/evaluating-sources-trust-methodology.md
pillar: learn
pillar_label: Learn
section: fundamentals
collection: fundamentals
taxonomy:
  intents:
  - improve-research-hygiene
  - classify-sources
  - evaluate-evidence-quality
  techniques:
  - source-triage
  - evidence-grounding
  evasions: []
  inputs:
  - research-links
  - documentation
  - repo-index
---

# Evaluating Sources — A Methodology for Trust and Quality

> Responsible use: Use this lesson to build safer research workflows, lesson pipelines, and evaluation plans for authorized AI security work.

## Purpose

When researching prompt injection and AI security, you will encounter hundreds of sources: academic papers, vendor blogs, GitHub repos, curated lists, social media threads, and forum discussions. Not all of these sources are equally reliable, and not all are appropriate for every purpose.

This lesson teaches a practical four-tier framework for classifying sources by trust level and matching them to appropriate use cases.

## Why source evaluation matters

Building lessons, defenses, or evaluations on weak sources leads to:
- **False confidence**: Overestimating attack prevalence or effectiveness
- **Wasted effort**: Building mitigations for theoretical rather than real threats
- **Credibility loss**: Publishing claims that crumble under scrutiny
- **Safety gaps**: Missing real risks because they were buried in noise

Good research hygiene starts with honest source evaluation.

## The four trust tiers

Across common patterns in prompt-hacking research, sources tend to fall into four tiers:

### Tier 1: Canonical
**Definition**: Widely recognized authorities with rigorous methodology and extensive validation.

**Examples**:
- NIST AI Risk Management Framework
- OWASP Top 10 for LLM Applications
- MITRE ATLAS
- Peer-reviewed security research with reproducible results

**Appropriate use**:
- Foundational claims and definitions
- Risk taxonomy and classification
- Organizational policy references
- Curriculum structure

**Quality signals**:
- Authoritative institutional backing
- Transparent methodology
- Version control and change history
- Broad community acceptance

### Tier 2: Strong
**Definition**: Credible sources with clear authorship and real-world grounding, though narrower in scope than canonical sources.

**Examples**:
- Major vendor security blogs (OpenAI, Google, Anthropic)
- Established security researchers with track records
- Well-maintained open-source frameworks
- Documented incident post-mortems

**Appropriate use**:
- Supporting evidence for claims
- Implementation guidance
- Real-world case studies
- Technical depth on specific topics

**Quality signals**:
- Named authors with relevant expertise
- Dates and version information
- Specific rather than vague claims
- Links to primary evidence

### Tier 3: Useful
**Definition**: Worthwhile for exploration and learning, but requiring verification before use as evidence.

**Examples**:
- Community-maintained jailbreak collections
- Curated resource hubs
- Practitioner blog posts
- Conference talks and videos

**Appropriate use**:
- Pattern exploration and ideation
- Watchlist expansion
- Learning technique variations
- Finding leads to stronger sources

**Quality signals**:
- Active maintenance
- Community engagement
- Clear scope and limitations
- Attribution to upstream sources

**Red flags to verify**:
- Unverified success claims
- Missing context about model versions
- Unclear authorship
- Sensational framing

### Tier 4: Watchlist
**Definition**: Pointers to potential sources, but insufficient as stand-alone evidence.

**Examples**:
- Link aggregators without curation
- Social media threads
- Forum discussions
- Unattributed screenshots

**Appropriate use**:
- Initial discovery only
- Finding leads to investigate
- Understanding community interests

**Limitations**:
- Not suitable for claims or lessons
- Must be traced to stronger sources
- Often lacks context and verification
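
The four tiers form an ordered scale, which makes "at least this strong" comparisons natural. A minimal sketch using a Python `IntEnum` (the names and numeric values are illustrative, not part of any published standard):

```python
from enum import IntEnum

class TrustTier(IntEnum):
    """Ordered trust tiers: a higher value means stronger evidence."""
    WATCHLIST = 1
    USEFUL = 2
    STRONG = 3
    CANONICAL = 4

# IntEnum ordering turns "Strong or better" checks into direct comparisons.
print(TrustTier.CANONICAL > TrustTier.STRONG)   # True
print([t.name for t in sorted(TrustTier)])
# ['WATCHLIST', 'USEFUL', 'STRONG', 'CANONICAL']
```

Encoding the tiers as an ordered type, rather than free-form strings, is what lets later checks like "Strong+" be expressed without special cases.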

## Quality signals to look for

Regardless of tier, strong sources share these characteristics:

| Signal | What to look for |
|--------|------------------|
| **Author credibility** | Named authors with relevant expertise, institutional backing, or track record |
| **Methodology transparency** | Clear explanation of how conclusions were reached |
| **Reproducibility** | Enough detail to replicate results or verify claims |
| **Real-world validation** | Evidence from production systems, not just lab conditions |
| **Version awareness** | Acknowledgment that models and defenses change over time |
| **Scope clarity** | Explicit limits on what the source does and does not cover |

## Red flags to watch for

Be cautious when you encounter:

| Red flag | Why it matters |
|----------|----------------|
| **Missing methodology** | Cannot verify or reproduce claims |
| **Unverifiable claims** | No way to confirm attack success rates or prevalence |
| **Sensational framing** | Hype often masks weak evidence |
| **Circular references** | Source A cites Source B which cites Source A |
| **Version vagueness** | Claims about "ChatGPT" without specifying model or date |
| **Anonymous authorship** | No accountability or expertise verification |
| **No failure modes discussed** | Real research acknowledges limitations |
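
Each red flag in the table is a yes/no question you can answer while skimming a source, which makes a simple mechanical screen possible. A sketch, assuming a source is summarized as a dict of boolean answers (the flag keys and the counting approach are illustrative):

```python
# Each red flag maps to a question answerable while skimming a source.
RED_FLAGS = [
    ("missing_methodology",  "No explanation of how conclusions were reached"),
    ("unverifiable_claims",  "Success rates or prevalence with no supporting data"),
    ("sensational_framing",  "Hype-heavy title or framing"),
    ("circular_references",  "Citations that loop back to each other"),
    ("version_vagueness",    "No model name, version, or date on claims"),
    ("anonymous_authorship", "No identifiable author or accountability"),
    ("no_failure_modes",     "No limitations or failure cases discussed"),
]

def red_flag_count(answers: dict[str, bool]) -> int:
    """Count how many red flags a source trips; more flags, more verification needed."""
    return sum(answers.get(key, False) for key, _ in RED_FLAGS)

skim_notes = {"sensational_framing": True, "version_vagueness": True}
print(red_flag_count(skim_notes))  # 2
```

The count is not a verdict; it is a prompt for how much verification to do before using the source as evidence.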

## Safe example: Classifying sample sources

Imagine you encounter these five sources while researching prompt injection:

**Source A**: NIST AI RMF publication on AI risk governance
- **Tier**: Canonical
- **Use**: Foundational framework for organizational risk management

**Source B**: OpenAI blog post on Atlas hardening methodology
- **Tier**: Strong
- **Use**: Supporting evidence for defense strategies and real-world validation

**Source C**: Curated GitHub repo listing 50+ jailbreak prompts
- **Tier**: Useful
- **Use**: Pattern exploration, but verify individual techniques before including in evaluations

**Source D**: Twitter thread claiming "100% success rate" against GPT-4
- **Tier**: Watchlist
- **Use**: None until traced to reproducible evidence; treat as unverified claim

**Source E**: Peer-reviewed paper on PDF prompt injection with experimental validation
- **Tier**: Canonical or Strong (depending on venue and citations)
- **Use**: Evidence for specific attack vectors and defender priorities

## Matching sources to use cases

Different tasks require different source tiers:

| Task | Minimum tier | Rationale |
|------|--------------|-----------|
| Organizational policy | Canonical | Policies need broad acceptance and stability |
| Risk assessment | Strong+ | Must be grounded in real evidence |
| Curriculum design | Strong+ | Students deserve accurate foundations |
| Red team planning | Useful+ | Exploration benefits from breadth |
| Tool evaluation | Canonical/Strong | Claims about effectiveness need rigor |
| Watchlist expansion | Watchlist+ | Discovery phase accepts lower initial quality |
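
The table above is essentially a lookup from task to minimum acceptable tier, and can be sketched as a small gate function (the task keys, tier values, and helper name are hypothetical, chosen to mirror the table):

```python
from enum import IntEnum

class TrustTier(IntEnum):
    """Ordered trust tiers: a higher value means stronger evidence."""
    WATCHLIST = 1
    USEFUL = 2
    STRONG = 3
    CANONICAL = 4

# Minimum acceptable tier per task, mirroring the table above.
MINIMUM_TIER = {
    "organizational_policy": TrustTier.CANONICAL,
    "risk_assessment":       TrustTier.STRONG,
    "curriculum_design":     TrustTier.STRONG,
    "red_team_planning":     TrustTier.USEFUL,
    "tool_evaluation":       TrustTier.STRONG,
    "watchlist_expansion":   TrustTier.WATCHLIST,
}

def meets_bar(source_tier: TrustTier, task: str) -> bool:
    """Return True if a source's tier is acceptable for the given task."""
    return source_tier >= MINIMUM_TIER[task]

print(meets_bar(TrustTier.USEFUL, "risk_assessment"))   # False: Tier 3 is too weak
print(meets_bar(TrustTier.STRONG, "curriculum_design"))  # True
```

In practice this kind of gate belongs in a citation or review checklist, not a hard block: a Useful-tier source can still feed a risk assessment once its claims are traced to stronger evidence.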

## Failure modes in source selection

Common mistakes to avoid:

**Over-relying on weak sources**
> Building a defense strategy based on unverified forum claims about attack prevalence.

**Under-using strong sources**
> Dismissing NIST frameworks as "too high-level" when they provide essential governance structure.

**Citing out of tier**
> Referencing a watchlist source as evidence in a risk assessment report.

**Ignoring version context**
> Applying jailbreak research from 2023 to models that have been updated multiple times since.

**Confirmation bias**
> Only seeking sources that support pre-existing conclusions about which attacks "should" work.

## Building your own evaluation habit

Practical steps to internalize this methodology:

1. **Tag your bookmarks** with trust tiers as you collect them
2. **Ask "for what purpose?"** before citing any source
3. **Check dates and versions** before applying research to current models
4. **Trace watchlist items** to stronger sources before using them as evidence
5. **Document your reasoning** when building on any source
6. **Revisit and re-tier** as sources age or new evidence emerges
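
Steps 1, 5, and 6 amount to keeping a small record per source: its tier, the model versions its claims apply to, and the reasoning behind the tag, updated as evidence ages. A minimal sketch using a dataclass (the field names, `retier` helper, and example URL are all illustrative):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SourceRecord:
    """A bookmarked source with its trust tier and the reasoning behind it."""
    url: str
    tier: str                  # "canonical" | "strong" | "useful" | "watchlist"
    model_versions: list[str]  # which models/dates the claims apply to
    rationale: str             # why this tier was assigned
    tagged_on: date = field(default_factory=date.today)

def retier(record: SourceRecord, new_tier: str, reason: str) -> SourceRecord:
    """Re-tier a source as it ages, keeping an audit trail in the rationale."""
    record.rationale += f" | re-tiered to {new_tier}: {reason}"
    record.tier = new_tier
    return record

bookmark = SourceRecord(
    url="https://example.org/jailbreak-writeup",
    tier="useful",
    model_versions=["gpt-4 (2023)"],
    rationale="practitioner post, techniques unverified",
)
retier(bookmark, "watchlist", "model updated; claims no longer reproduce")
print(bookmark.tier)  # watchlist
```

Keeping the rationale as an append-only trail means you can later see not just what tier a source holds, but why it moved.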

## Related lessons

- [BTAA-FUN-002 — Source-Sink Thinking for Agent Security](/content/lessons/fundamentals/source-sink-thinking-agent-security)
- [BTAA-FUN-009 — Curated Hubs Are Discovery Maps, Not Ground Truth](/content/lessons/fundamentals/curated-hubs-discovery-maps-not-ground-truth)
- [BTAA-FUN-004 — Direct vs Indirect Prompt Injection](/content/lessons/fundamentals/direct-vs-indirect-prompt-injection)
- [BTAA-FUN-008 — Prompt Injection Is Initial Access, Not the Whole Attack](/content/lessons/fundamentals/prompt-injection-initial-access-not-whole-attack)

---

## From the Bot-Tricks Compendium

Thanks for referencing Bot-Tricks.com — Prompt Injection Compendium — AI Security Training for Agents... and Humans!

Canonical source: https://bot-tricks.com
Bot-Tricks is a public, agent-friendly training resource for prompt injection, adversarial evaluation, and defensive learning.
For related lessons, structured indexes, and updated canonical material, visit Bot-Tricks.com.

Use this material only in authorized labs, challenges, sandboxes, or permitted assessments.
