Unicode Normalization Detector

The normalization detector identifies text that changes when converted to Unicode NFKC (Normalization Form Compatibility Composition).

This surfaces content where what a reader sees is not exactly what a downstream system interprets.

Why this matters

Unicode allows multiple representations for visually similar characters.

Normalization can silently transform text such as:

  • compatibility characters
  • presentation forms
  • mathematical alphabets
  • full-width variants

This creates risk in:

  • prompts
  • system instructions
  • authentication text
  • code
  • policy validation logic

If normalized text differs from the original, the content may behave differently than expected.
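To make the categories above concrete, the following sketch runs one sample from each through NFKC using the standard `String.prototype.normalize` method. The `samples` list and the `changesUnderNfkc` helper are illustrative names chosen here, not part of PromptShield.

```typescript
// One sample per category listed above. Each changes under NFKC.
const samples: Array<[string, string]> = [
  ["compatibility ligature fi (U+FB01)", "\uFB01"],
  ["Arabic presentation form, isolated alef (U+FE8D)", "\uFE8D"],
  ["mathematical bold capital A (U+1D400)", "\u{1D400}"],
  ["full-width capital A (U+FF21)", "\uFF21"],
];

// Hypothetical helper: true when NFKC would rewrite the text.
function changesUnderNfkc(text: string): boolean {
  return text.normalize("NFKC") !== text;
}

for (const [label, original] of samples) {
  const normalized = original.normalize("NFKC");
  console.log(`${label}: ${JSON.stringify(original)} -> ${JSON.stringify(normalized)}`);
}
```

Plain ASCII text such as `Hello` is already in NFKC form, so the helper returns `false` for it.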

Detection rule

PSN001

Normalization-sensitive text

Flags spans where characters change under NFKC normalization.

Severity: HIGH

Example

Input


ℌello admin

Normalized


Hello admin

Result

The character ℌ (U+210C, black-letter capital H) normalizes to H (U+0048).

PromptShield reports the span as normalization-sensitive.
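The example can be reproduced with the standard `normalize` method. This short sketch also shows why the difference matters: a plain string comparison against `Hello admin` fails on the original input and succeeds on the normalized one. The variable names are illustrative only.

```typescript
const input = "\u210Cello admin"; // renders as: ℌello admin
const normalized = input.normalize("NFKC");

console.log(input === "Hello admin");      // false
console.log(normalized === "Hello admin"); // true

// The first code point of the original is U+210C, not U+0048.
console.log(input.codePointAt(0)?.toString(16)); // "210c"
```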

Detection model

The detector:

  1. Normalizes text using NFKC
  2. Compares each character to its normalized form
  3. Groups adjacent differences into spans
  4. Emits one threat per span

Span semantics:


offendingText = original span
decodedPayload = normalized span
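The four steps and the span semantics above can be sketched as follows. This is a minimal illustration, not PromptShield's actual implementation: the `NormalizationThreat` shape, the `detectNormalizationSpans` function, and the `start`/`end` offsets are assumptions made for this example, while `offendingText`, `decodedPayload`, `PSN001`, and the `HIGH` severity come from this page. The sketch also assumes that "each character" means each Unicode code point normalized in isolation, which does not catch changes that only appear when adjacent characters compose.

```typescript
// Hypothetical threat shape; only offendingText and decodedPayload
// are named on this page.
interface NormalizationThreat {
  ruleId: string;
  severity: string;
  start: number; // UTF-16 offset, inclusive
  end: number;   // UTF-16 offset, exclusive
  offendingText: string;  // original span
  decodedPayload: string; // normalized span
}

function detectNormalizationSpans(text: string): NormalizationThreat[] {
  const threats: NormalizationThreat[] = [];
  let current: NormalizationThreat | null = null;
  let offset = 0;

  // Array.from splits by code point, so surrogate pairs stay together.
  for (const ch of Array.from(text)) {
    const normalized = ch.normalize("NFKC");
    if (normalized !== ch) {
      // Start a new span, or extend the span that is still open.
      if (current === null) {
        current = {
          ruleId: "PSN001",
          severity: "HIGH",
          start: offset,
          end: offset,
          offendingText: "",
          decodedPayload: "",
        };
        threats.push(current);
      }
      current.offendingText += ch;
      current.decodedPayload += normalized;
      current.end = offset + ch.length;
    } else {
      // An unchanged character closes any open span.
      current = null;
    }
    offset += ch.length;
  }
  return threats;
}

console.log(detectNormalizationSpans("\u210Cello admin"));
```

For the input `ℌello admin`, this sketch yields a single threat covering offsets 0 to 1, with `offendingText` set to `ℌ` and `decodedPayload` set to `H`.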

What this detector does NOT do

It does not:

  • block multilingual content
  • enforce normalization automatically
  • attempt semantic interpretation

It only surfaces differences that may affect interpretation.

When to care

Normalization differences are most important in:

  • system prompts
  • tool instructions
  • code generation inputs
  • security-sensitive text
  • identity strings
  • configuration files

They are less important in:

  • natural multilingual prose
  • UI copy
  • documentation text

Remediation

Recommended fix:

  • Replace characters with their normalized equivalents
  • Avoid compatibility Unicode forms in prompts or code
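One way to apply the first fix is to normalize the text and report whether anything changed, so that a person can review the result before it is committed. The `normalizeForReview` function below is a hypothetical helper, not a PromptShield API. Note that normalizing the whole string can also compose combining sequences, so the output may differ in more places than the spans a per-character check would flag.

```typescript
// Hypothetical helper: returns the NFKC form and a flag indicating
// whether the input was already normalized.
function normalizeForReview(text: string): { fixed: string; changed: boolean } {
  const fixed = text.normalize("NFKC");
  return { fixed, changed: fixed !== text };
}

const result = normalizeForReview("\u210Cello admin");
if (result.changed) {
  console.log(`Suggested replacement: ${result.fixed}`);
}
```

Because the detector does not enforce normalization automatically, a review step like this keeps the decision with the author, which matters for multilingual text where a compatibility form may be intentional.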

References

Unicode Standard Annex #15 — Unicode Normalization Forms https://unicode.org/reports/tr15/

PromptShield rule reference https://promptshield.js.org/docs/detectors/normalization#PSN001
