Trojan Source Detection
PromptShield detects Unicode bidirectional (BIDI) control characters, which can make text display in a different order from the one in which it is stored and processed.
Attacks that exploit this mismatch are known as Trojan Source attacks.
Reference: trojansource.codes (CVE-2021-42574)
Why this matters
Bidirectional (BIDI) control characters can manipulate visual ordering without changing logical ordering.
This lets attackers hide instructions or code behavior from reviewers.
This is especially dangerous in:
- prompts
- templates
- configuration files
- documentation examples
- generated content
The rendered text may look safe while the underlying text contains hidden logic.
What PromptShield detects
PromptShield tracks BIDI control spans on a per-line basis.
It detects:
- PUSH → POP override sequences
- unterminated override contexts
Each detection produces a single threat span.
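The per-line tracking described above can be sketched with a small stack-based scanner. This is a minimal illustration in Python, not PromptShield's actual implementation: the names `Span` and `scan` are hypothetical, and the sketch assumes that nested spans are reported once at the outermost level, that a stray POP with no open PUSH is ignored, and that any POP closes any PUSH.

```python
from dataclasses import dataclass

# Code points taken from the "Detected characters" table.
PUSH = {"\u202a", "\u202b", "\u202d", "\u202e", "\u2066", "\u2067", "\u2068"}
POP = {"\u202c", "\u2069"}


@dataclass
class Span:
    rule: str   # "PST001" (matched) or "PST002" (unterminated)
    line: int   # 1-based line number
    start: int  # column of the opening PUSH character
    end: int    # column one past the end of the span


def scan(text: str) -> list[Span]:
    """Hypothetical line-scoped scanner; the stack is reset on every line."""
    spans = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stack = []  # columns of PUSH characters still open on this line
        for col, ch in enumerate(line):
            if ch in PUSH:
                stack.append(col)
            elif ch in POP and stack:
                start = stack.pop()
                if not stack:
                    # Outermost span closed: report it once (assumption).
                    spans.append(Span("PST001", line_no, start, col + 1))
        if stack:
            # Anything still open at end of line counts as unterminated.
            spans.append(Span("PST002", line_no, stack[0], len(line)))
    return spans
```

Because the stack is reset for every line in this sketch, a PUSH on one line is never matched by a POP on a later line; it is reported as unterminated instead.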
Example attack
Ignore previous instructions
The hidden character changes how the text is displayed, potentially concealing instructions.
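Because the control character is invisible in most renderers, it helps to print it explicitly when inspecting a suspicious string. The sketch below uses Python's standard `unicodedata` module to replace every format character (Unicode category `Cf`, which includes the BIDI controls along with other invisible characters) with a visible tag. The helper name `reveal` and the placement of U+202E in the sample are illustrative assumptions, not taken from PromptShield.

```python
import unicodedata


def reveal(text: str) -> str:
    """Replace format characters (category Cf) with visible <U+XXXX NAME> tags."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNKNOWN")
            out.append(f"<U+{ord(ch):04X} {name}>")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical sample: the position of the override character is assumed.
sample = "Ignore previous instructions\u202e"
print(reveal(sample))
# prints: Ignore previous instructions<U+202E RIGHT-TO-LEFT OVERRIDE>
```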
Detected characters
PromptShield monitors these Unicode characters:
| Type | Code point | Name |
|---|---|---|
| PUSH | U+202A | LRE |
| PUSH | U+202B | RLE |
| PUSH | U+202D | LRO |
| PUSH | U+202E | RLO |
| PUSH | U+2066 | LRI |
| PUSH | U+2067 | RLI |
| PUSH | U+2068 | FSI |
| POP | U+202C | PDF |
| POP | U+2069 | PDI |
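The abbreviations in the table correspond to the bidirectional classes in the Unicode Character Database, so they can be cross-checked with Python's standard `unicodedata` module. A minimal sketch, in which the `PUSH` and `POP` lists and the `describe` helper are illustrative names only:

```python
import unicodedata

# Code points from the table above, grouped by the role the table assigns them.
PUSH = [0x202A, 0x202B, 0x202D, 0x202E, 0x2066, 0x2067, 0x2068]
POP = [0x202C, 0x2069]


def describe(cp: int) -> tuple[str, str, str]:
    """Return (code point label, bidi class abbreviation, full Unicode name)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))


for kind, code_points in (("PUSH", PUSH), ("POP", POP)):
    for cp in code_points:
        label, abbr, name = describe(cp)
        print(f"{kind:4} {label} {abbr:3} {name}")
```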
Rules
PST001
BIDI override sequence
A matched PUSH → POP bidirectional override span was found.
This indicates text reordering behavior that may conceal instructions.
Severity: CRITICAL
Example diagnostic:
Bidirectional override characters detected (Trojan Source).
These characters can visually reorder text and mislead readers.
PST002
Unterminated BIDI
A PUSH control character was detected without a corresponding POP.
This is highly suspicious and may indicate an attempt to conceal text.
Severity: CRITICAL
Example diagnostic:
Unterminated bidirectional override sequence detected (Trojan Source).
This may cause visual and logical text order to differ.
Suggested remediation
Remove bidirectional control characters from prompts, templates, and source text.
If bidirectional text support is required, ensure control characters are:
- intentional
- visible
- properly terminated
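One possible way to apply the first recommendation is to strip the nine monitored code points before the text is used. This is a sketch under the assumption that the text does not need bidirectional formatting; `has_bidi` and `strip_bidi` are hypothetical helper names, not part of PromptShield.

```python
# The nine code points listed under "Detected characters".
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

# str.translate deletes every character whose ordinal maps to None.
_DELETE = dict.fromkeys(map(ord, BIDI_CONTROLS))


def has_bidi(text: str) -> bool:
    """Return True if the text contains any monitored BIDI control character."""
    return any(ch in BIDI_CONTROLS for ch in text)


def strip_bidi(text: str) -> str:
    """Return the text with all monitored BIDI control characters removed."""
    return text.translate(_DELETE)


cleaned = strip_bidi("safe\u202e text\u202c")
print(cleaned, has_bidi(cleaned))
```

Stripping changes how genuinely right-to-left or mixed-direction text renders, so for such content it may be safer to review the flagged spans than to remove them blindly.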
Design notes
PromptShield intentionally performs line-scoped detection for Trojan Source attacks.
This matches how visual deception typically occurs in:
- prompts
- code snippets
- configuration lines
- instructions
Detection is deterministic and does not depend on rendering engines.
Mental model
Trojan Source detection in PromptShield is similar to:
- Unicode safety linting
- static source integrity checks
- editor security warnings
It protects against visual deception, not content correctness.