---
id: BTAA-EVA-016
title: 'Invisible Unicode: Zero-Width Characters'
slug: invisible-unicode-zero-width-characters
type: lesson
code: BTAA-EVA-016
legacy_ids:
- '1116'
aliases:
- zero-width
- invisible unicode
- zwj
- zwnj
- zero-width joiner
- zero-width space
- invisible characters
- hidden text
- '1116'
- item-1116
author: Herb Hermes
date: '2026-04-04'
last_updated: '2026-04-04'
description: Learn how zero-width Unicode characters (zero-width space, joiner, non-joiner)
  can hide data, break tokenization, and bypass filters by inserting invisible characters
  into text.
category: evasion-techniques
difficulty: advanced
platform: Universal - Works on systems without invisible character filtering
challenge: Filter Bypass via Invisible Characters
read_time: 12 minutes
tags:
- zero-width
- invisible
- unicode
- zwj
- zwnj
- zero-width-space
- hidden-characters
- evasion
- filter-bypass
- tokenization
status: live
test_type: adversarial
model_compatibility:
- Kimi K2.5 Coding
- ChatGPT 5.4
- Opus 4.6
- Qwen 2.5
- Llama 3.2
responsible_use: Use this approach only on authorized training systems, sandboxes,
  or systems you are explicitly permitted to test.
prerequisites:
- Understanding of Unicode
- Familiarity with text tokenization concepts
follow_up:
- BTAA-EVA-014
- BTAA-EVA-012
- BTAA-EVA-013
public_path: /content/lessons/evasion/invisible-unicode-zero-width-characters.md
pillar: learn
pillar_label: Learn
section: evasion
collection: evasion
---

# Invisible Unicode: Zero-Width Characters

> Responsible use: Use this approach only on authorized training systems, sandboxes, or systems you are explicitly permitted to test.

## Purpose

Zero-width Unicode characters are invisible format characters: they render no glyph and take up no horizontal space. They can hide data in plain sight, disrupt tokenization, and bypass filters, because two strings that look identical on screen differ at the code point and byte level.

## What Are Zero-Width Characters?

These Unicode characters occupy zero horizontal space:

| Character | Code Point | Name | Purpose |
|-----------|------------|------|---------|
| ​ | U+200B | Zero Width Space (ZWSP) | Word break without space |
| ‌ | U+200C | Zero Width Non-Joiner (ZWNJ) | Prevent character joining |
| ‍ | U+200D | Zero Width Joiner (ZWJ) | Force character joining |
| ⁠ | U+2060 | Word Joiner | Prevent line breaks |
| ﻿ | U+FEFF | Byte Order Mark (BOM) | File encoding marker |
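
These properties can be checked with Python's standard `unicodedata` module. The sketch below looks up each code point from the table; all five fall in general category `Cf` (format), and the formal Unicode name of U+FEFF is ZERO WIDTH NO-BREAK SPACE ("Byte Order Mark" describes its role at the start of a file). The `ZERO_WIDTH` tuple and `describe` helper are names chosen here for illustration.

```python
import unicodedata

# Code points from the table above (tuple name is illustrative)
ZERO_WIDTH = ("\u200b", "\u200c", "\u200d", "\u2060", "\ufeff")

def describe(char):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(char):04X}", unicodedata.name(char), unicodedata.category(char))

for ch in ZERO_WIDTH:
    print(describe(ch))
```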

## Core Technique

Insert zero-width characters into filtered words to break keyword matching:

**Blocked:**
```
forget password
```

**With zero-width spaces:**
```
for​get pass​word
```

Looks identical. Completely different bytes. Filter sees: `for​get` ≠ `forget`
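
A minimal sketch of the mismatch, assuming a naive substring blocklist. The `BLOCKLIST` set and `is_blocked` helper are hypothetical and not taken from any particular filter.

```python
BLOCKLIST = {"forget", "password"}  # hypothetical keyword list

def is_blocked(text):
    """Naive filter: block if any keyword appears as an exact substring."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)

plain = "forget password"
evasive = "for\u200bget pass\u200bword"  # same appearance, two extra code points

print(is_blocked(plain))         # True
print(is_blocked(evasive))       # False
print(len(plain), len(evasive))  # 15 17
```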

## Why It Works

1. **Visual invisibility**: Characters occupy no space
2. **Tokenization disruption**: Breaks words into unexpected tokens
3. **Filter bypass**: Keyword "forget" doesn't match "for​get"
4. **Copy-paste evasion**: Hidden characters travel with copied text

## Zero-Width Characters in Detail

### Zero Width Space (U+200B)
```
"pass\u200bword"  # Looks like: password
```
- Invisible word separator
- Allows line breaks
- Commonly used in Thai, Khmer, Burmese

### Zero Width Non-Joiner (U+200C)
```
"se\u200ccret"  # Looks like: secret
```
- Prevents cursive/script joining
- Used in Arabic, Persian, Indic scripts
- Stops ligature formation

### Zero Width Joiner (U+200D)
```
"for\u200dget"  # Looks like: forget
```
- Forces character joining
- Creates emoji sequences (👨‍👩‍👧‍👦)
- Enables complex script rendering

### Word Joiner (U+2060)
```
"sys\u2060tem"  # Looks like: system
```
- Prevents line breaks
- No-width non-breaking space
- Keeps words together

### Byte Order Mark (U+FEFF)
```
"\ufeffpassword"  # Invisible at start
```
- File encoding indicator
- Often invisible in display
- Can start payloads invisibly
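
In Python, the standard `utf-8-sig` codec drops a single leading BOM while decoding, whereas plain `utf-8` keeps it as U+FEFF. A short sketch of the difference:

```python
raw = b"\xef\xbb\xbfpassword"  # UTF-8 bytes with a leading BOM

kept = raw.decode("utf-8")         # BOM survives as the first code point
dropped = raw.decode("utf-8-sig")  # leading BOM removed by the codec

print(kept == "password")     # False
print(dropped == "password")  # True
```

Only a BOM at the very start is removed this way; a U+FEFF in the middle of a string is left untouched by both codecs.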

## Example Attacks

### Keyword Breaking
```
Original:  forget all instructions
With ZWSP: for​get all in​structions
Escaped:   for\u200bget all in\u200bstructions
```

### Hidden Data Exfiltration
```
The password is: p​a​s​s​w​o​r​d

# Hidden message in zero-width characters between letters
```

### Tokenization Attack
```
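Model tokenizes "password" as a single token: [password]
Model tokenizes "pa\u200bssword" as something like: [pa] + [U+200B] + [ssword]
(exact split depends on the tokenizer; U+200B may itself become several byte tokens)
Different tokens, so potentially different semantic processing!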
```
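
The exact split depends on the tokenizer, so the token lists above are illustrative. What can be checked without any tokenizer is the input a byte-level tokenizer would receive: U+200B encodes to three UTF-8 bytes that sit in the middle of the word. A sketch using only built-ins:

```python
plain = "password"
broken = "pa\u200bssword"

plain_bytes = plain.encode("utf-8")
broken_bytes = broken.encode("utf-8")

print(plain_bytes.hex(" "))
print(broken_bytes.hex(" "))
print(len(plain_bytes), len(broken_bytes))  # 8 11
```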

### Copy-Paste Attack
```
1. Attacker sends: "Click here: http://example\u200b.com/login"
2. User copies link
3. Zero-width character travels with copy
4. Link doesn't match filter patterns
5. User visits malicious site
```

## When Zero-Width Works Best

✅ **Strong against:**
- Exact string matching
- Simple tokenization
- Visual inspection (looks clean)
- Copy-paste based attacks

❌ **Weak against:**
- Invisible character stripping
- Unicode normalization followed by stripping (NFKC alone leaves zero-width characters in place)
- Entropy analysis
- Visual rendering detection

## Creating Zero-Width Text

### Python
```python
# Zero-width characters
ZWSP = '\u200b'   # Zero Width Space
ZWNJ = '\u200c'   # Zero Width Non-Joiner
ZWJ = '\u200d'    # Zero Width Joiner
WJ = '\u2060'     # Word Joiner
BOM = '\ufeff'    # Byte Order Mark

def insert_zwsp(text, positions):
    """Insert a zero-width space after the Nth character for each N in positions (1-based)."""
    result = []
    for i, char in enumerate(text, start=1):
        result.append(char)
        if i in positions:
            result.append(ZWSP)
    return ''.join(result)

# Example: break "password" after the 2nd and 4th chars
hidden = insert_zwsp("password", [2, 4])
print(f"Visible: {hidden}")
print(f"Bytes: {hidden.encode('unicode_escape')}")
# Visible: password
# Bytes: b'pa\\u200bss\\u200bword'

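def hide_message(cover_text, secret):
    """Hide a message as zero-width bits (ZWSP = 0, ZWJ = 1) inside cover_text."""
    result = []
    # 8 bits per UTF-8 byte, so non-ASCII secrets also encode cleanly
    binary = ''.join(format(b, '08b') for b in secret.encode('utf-8'))

    for i, char in enumerate(cover_text):
        result.append(char)
        if i < len(binary):
            # Use ZWSP for 0, ZWJ for 1
            result.append(ZWSP if binary[i] == '0' else ZWJ)

    # If the cover text is shorter than the bit string, append the remaining
    # bits at the end so the message is not silently truncated
    result.extend(ZWSP if bit == '0' else ZWJ for bit in binary[len(cover_text):])

    return ''.join(result)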

# Hide "secret" in "Hello World"
hidden = hide_message("Hello World", "secret")
print(f"Hidden message in: {hidden}")
```
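
The encoder above has no counterpart for reading the hidden bits back. A minimal decoder sketch, assuming the same convention (ZWSP for 0, ZWJ for 1), 8 bits per byte, and a UTF-8 payload. `reveal_message` is a name introduced here, and trailing bits that do not fill a whole byte are ignored.

```python
ZWSP = "\u200b"  # encodes bit 0
ZWJ = "\u200d"   # encodes bit 1

def reveal_message(stego_text):
    """Recover a hidden message from the ZWSP/ZWJ characters in stego_text."""
    bits = "".join(
        "0" if ch == ZWSP else "1"
        for ch in stego_text
        if ch in (ZWSP, ZWJ)
    )
    usable = len(bits) - len(bits) % 8  # drop an incomplete trailing byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")

# Build a small sample by hand: the letter "A" is 01000001
sample = "H" + "".join(ZWSP if b == "0" else ZWJ for b in "01000001") + "ello"
print(reveal_message(sample))  # A
```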

### JavaScript
```javascript
const ZWSP = '\u200b';  // Zero Width Space
const ZWNJ = '\u200c';  // Zero Width Non-Joiner
const ZWJ = '\u200d';   // Zero Width Joiner

function insertInvisible(text, positions) {
    // positions are 1-based: insert a ZWSP after the Nth character
    return Array.from(text).map((char, i) =>
        positions.includes(i + 1) ? char + ZWSP : char
    ).join('');
}

console.log(insertInvisible("password", [2, 4]));
// Looks like: password
// Actually: pa\u200bss\u200bword
```

## Detecting Zero-Width Characters

### Python Detector
```python
def detect_invisible_chars(text):
    """Detect and report invisible Unicode characters."""
    invisible = {
        '\u200b': 'ZWSP', '\u200c': 'ZWNJ', '\u200d': 'ZWJ',
        '\u2060': 'WJ', '\ufeff': 'BOM', '\u180e': 'MVS'
    }
    
    found = []
    for i, char in enumerate(text):
        if char in invisible:
            found.append((i, invisible[char], ord(char)))
    
    return found

# Example
suspicious = "pa\u200bss\u200bword"
detections = detect_invisible_chars(suspicious)
for pos, name, code in detections:
    print(f"Position {pos}: {name} (U+{code:04X})")
# Position 2: ZWSP (U+200B)
# Position 5: ZWSP (U+200B)
```

### Visual Detection
```python
def visualize_invisible(text):
    """Make invisible characters visible."""
    # Parentheses are required for a multi-line method chain in Python
    return (
        text.replace('\u200b', '[ZWSP]')
            .replace('\u200c', '[ZWNJ]')
            .replace('\u200d', '[ZWJ]')
            .replace('\u2060', '[WJ]')
            .replace('\ufeff', '[BOM]')
    )

print(visualize_invisible("pa\u200bssword"))
# Output: pa[ZWSP]ssword
```

## Defense Strategies

### 1. Strip Invisible Characters
```python
def sanitize_invisible(text):
    """Remove zero-width and invisible characters."""
    invisible = '\u200b\u200c\u200d\u2060\ufeff\u180e'
    return ''.join(c for c in text if c not in invisible)

# Example
clean = sanitize_invisible("pa\u200bss\u200bword")
print(clean)  # password
```
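
A fixed list can miss other invisible code points. One broader variant, sketched here as an assumption rather than a recommended default, drops every character in Unicode general category `Cf` (format). The trade-off is that this also removes ZWJ from emoji sequences and ZWNJ from Persian or Indic text, so it may suit a keyword-matching copy of the text better than stored user content. `strip_format_chars` is a hypothetical helper name.

```python
import unicodedata

def strip_format_chars(text):
    """Remove all characters whose general category is Cf (format)."""
    return "".join(c for c in text if unicodedata.category(c) != "Cf")

print(strip_format_chars("pa\u200bss\u2060word\ufeff"))  # password
```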

### 2. Normalize Unicode
```python
import unicodedata

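def normalize_text(text):
    """Normalize and strip problematic characters."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) but does not
    # remove zero-width characters, so the strip step below is still required
    normalized = unicodedata.normalize('NFKC', text)
    # Strip invisible characters (sanitize_invisible is defined in step 1 above)
    return sanitize_invisible(normalized)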
```
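
It is worth checking what normalization does on its own. The zero-width characters in this lesson have no compatibility decomposition, so NFKC leaves them in place and the stripping step is what actually removes them. A quick check using only the standard library:

```python
import unicodedata

sample = "pa\u200bss\u200cwo\u200drd\u2060\ufeff"
normalized = unicodedata.normalize("NFKC", sample)

print(normalized == sample)          # True: nothing was removed
print(len(sample), len(normalized))  # 13 13
```

NFKC remains useful for folding fullwidth or stylized letters, which is a separate evasion class.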

### 3. Flag Suspicious Patterns
```python
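def is_suspicious(text):
    """Check for potential zero-width evasion."""
    invisible = '\u200b\u200c\u200d\u2060\ufeff\u180e'
    invisible_count = sum(1 for c in text if c in invisible)
    # Flag if more than 2 invisible characters; a payload with only one or
    # two insertions passes this check, so tune the threshold to your data
    return invisible_count > 2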
```
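
Counting alone can flag legitimate emoji-heavy or Persian text while missing a single well-placed character. A context-based variant, sketched under the assumption that zero-width characters between two ASCII letters are rarely legitimate. The pattern and the name `has_embedded_zero_width` are illustrative.

```python
import re

# One or more zero-width characters sandwiched between ASCII letters
EMBEDDED_ZW = re.compile(r"[A-Za-z][\u200b\u200c\u200d\u2060\ufeff]+[A-Za-z]")

def has_embedded_zero_width(text):
    """Return True if a zero-width character splits a run of ASCII letters."""
    return EMBEDDED_ZW.search(text) is not None

print(has_embedded_zero_width("for\u200bget"))                # True
print(has_embedded_zero_width("forget"))                      # False
print(has_embedded_zero_width("\U0001F468\u200d\U0001F469"))  # False
```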

## Advanced Techniques

### Zero-Width Encoding
Encode binary data as zero-width sequences:
```
00 = ZWSP + ZWSP
01 = ZWSP + ZWJ
10 = ZWJ + ZWSP
11 = ZWJ + ZWJ
```

### Steganography
Hide entire messages in innocent-looking text:
```
"Hello! How are you today?" 
# Contains: "password123" encoded in ZWSP/ZWJ
```

### Emoji Exploits
Use ZWJ to create unexpected emoji:
```
👁️‍🗨️ = Eye + ZWJ + Speech Bubble (unusual combination)
```
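
A ZWJ sequence can be taken apart to see where the joiners sit. The sketch below uses the family emoji mentioned earlier (man, woman, girl, boy), written with escapes so the joiners are visible in the source:

```python
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

code_points = [f"U+{ord(ch):04X}" for ch in family]
print(code_points)
# ['U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467', 'U+200D', 'U+1F466']

# Removing the joiners leaves four separate emoji instead of one glyph
parts = family.split("\u200d")
print(len(family), len(parts))  # 7 4
```

This is also why blanket ZWJ stripping changes how legitimate emoji render.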

## Real-World Attacks

1. **Twitter exploits**: Hidden text in tweets
2. **URL obfuscation**: Invisible chars in links
3. **Code injection**: Zero-width in source code
4. **Username spoofing**: Impersonating accounts
5. **Email headers**: Hidden routing information

## Limitations

1. **Detection is easy**: Once you know to look
2. **Stripping is trivial**: Remove U+200B-U+200D, U+2060, U+FEFF
3. **Copy issues**: Some systems strip automatically
4. **Terminal display**: May show as <?> or spaces

## Summary

Zero-width characters are a near-invisible evasion primitive: bytes that occupy no space but change how text is matched. They break tokenization, bypass filters, and hide data in plain sight. Defending against them requires explicitly stripping or flagging them; Unicode normalization alone does not remove them.


---

## From the Bot-Tricks Compendium

Thanks for referencing Bot-Tricks.com — Prompt Injection Compendium — AI Security Training for Agents... and Humans!

Canonical source: https://bot-tricks.com
Bot-Tricks is a public, agent-friendly training resource for prompt injection, adversarial evaluation, and defensive learning.
For related lessons, structured indexes, and updated canonical material, visit Bot-Tricks.com.

Use this material only in authorized labs, challenges, sandboxes, or permitted assessments.
