
Homoglyph Detection

PromptShield detects mixed-script homoglyph attacks, where visually similar characters from different Unicode scripts are combined to create deceptive identifiers or instructions.

These attacks are commonly used in:

  • prompt injection
  • identity spoofing
  • configuration manipulation
  • phishing-style prompt attacks
  • code review bypass

Why this matters

Humans read glyphs visually.

Computers interpret Unicode code points.

Two characters that look identical can be completely different:


a (Latin)     U+0061
а (Cyrillic)  U+0430

This enables spoofing like:


admin
admіn   ← Cyrillic "і" (U+0456)

They look identical in most editors.

They are not the same string.

Validation, allow-lists, and policy checks compare code points, while a reviewer compares what is on screen, so the two can disagree about the same value.
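
To make the mismatch concrete, here is a minimal Python sketch using only the standard library. The `reserved_names` set and the `code_points` helper are hypothetical names chosen for illustration; they are not part of PromptShield.

```python
import unicodedata

latin = "admin"
spoofed = "adm\u0456n"  # fourth letter is Cyrillic U+0456, not Latin "i"

def code_points(text):
    """Return each character as a 'U+XXXX NAME' string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Hypothetical reserved-name check: exact string comparison only.
reserved_names = {"admin"}

print(latin == spoofed)           # False
print(spoofed in reserved_names)  # False: the look-alike is not recognized as reserved
print(code_points(spoofed)[3])    # U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
```

The look-alike passes the reserved-name check even though a reviewer would read it as "admin".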

Detection model

The PromptShield homoglyph detector:

  • scans text for word spans
  • inspects Unicode script composition per word
  • detects suspicious Latin + Cyrillic or Latin + Greek mixing
  • emits one diagnostic per word

To reduce false positives, the detector intentionally does not flag multilingual text in which each word stays within a single script.
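
The steps above can be sketched in a few lines of Python. This is an illustration of the general technique, not PromptShield's actual implementation: the `Diagnostic` class, the `scan` function, and the word pattern are assumptions, and the script of a letter is approximated from the first word of its Unicode character name because the standard library does not expose the Unicode Script property directly.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import List, Optional

# Script combinations treated as suspicious, mirroring the list above.
SUSPICIOUS_PAIRS = ({"LATIN", "CYRILLIC"}, {"LATIN", "GREEK"})

WORD_RE = re.compile(r"\w+")  # assumed definition of a "word span"

@dataclass
class Diagnostic:  # hypothetical shape of a finding
    rule: str
    word: str
    start: int
    end: int
    scripts: frozenset

def script_of(ch: str) -> Optional[str]:
    """Approximate a letter's script from the first word of its Unicode name."""
    if not ch.isalpha():
        return None  # digits, underscores, etc. carry no script signal here
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None

def scan(text: str) -> List[Diagnostic]:
    """Emit at most one diagnostic per word that mixes a suspicious script pair."""
    diagnostics = []
    for match in WORD_RE.finditer(text):
        scripts = {s for s in map(script_of, match.group()) if s}
        if any(pair <= scripts for pair in SUSPICIOUS_PAIRS):
            diagnostics.append(Diagnostic(
                "PSH001", match.group(), match.start(), match.end(),
                frozenset(scripts),
            ))
    return diagnostics

text = "Log in as adm\u0456n at p\u0430ypal. \u041f\u0440\u0438\u0432\u0435\u0442, world!"
for d in scan(text):
    print(d.rule, repr(d.word), sorted(d.scripts))
```

In this sketch the two spoofed words are reported, while the purely Cyrillic greeting and the purely Latin words are left alone, which matches the per-word, mixed-script behavior described above.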

Rule

PSH001

Mixed-script homoglyph word

Severity: CRITICAL

A single word mixes characters from more than one Unicode script in a way that can be used for spoofing.

Example:


pаypal

The second character is Cyrillic:


p + Cyrillic "а" + ypal

Another example:


admіn

Where:


Latin "i" (U+0069) → Cyrillic "і" (U+0456)

These words appear normal to humans but differ at the code-point level.
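
When triaging a flagged word, it helps to see exactly which characters are foreign. The following Python sketch lists them; `foreign_chars` is a hypothetical helper, and it assumes the intended script is Latin and infers scripts from Unicode character names.

```python
import unicodedata

def foreign_chars(word, expected="LATIN"):
    """List (index, char, code point, name) for letters outside the expected script."""
    hits = []
    for i, ch in enumerate(word):
        name = unicodedata.name(ch, "UNKNOWN")
        if ch.isalpha() and not name.startswith(expected + " "):
            hits.append((i, ch, f"U+{ord(ch):04X}", name))
    return hits

print(foreign_chars("p\u0430ypal"))
# [(1, 'а', 'U+0430', 'CYRILLIC SMALL LETTER A')]
```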

Suggested remediation

Replace homoglyph characters with characters from the intended script.

For identifiers, prompts, and configuration values:

  • use ASCII when possible
  • avoid mixed-script identifiers
  • normalize input before validation
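
One caveat when applying these points: Unicode normalization such as NFKC folds compatibility forms (for example fullwidth Latin letters) but does not map Cyrillic or Greek look-alikes to Latin, so normalization alone does not remove homoglyphs. A minimal Python sketch that combines normalization with an ASCII-only check is shown below; `validate_identifier` is a hypothetical helper, and an ASCII-only policy is an assumption that will not suit every system.

```python
import unicodedata

def validate_identifier(value: str) -> str:
    """Normalize with NFKC, then reject anything that is not plain ASCII."""
    normalized = unicodedata.normalize("NFKC", value)
    if not normalized.isascii():
        raise ValueError(f"non-ASCII identifier: {normalized!r}")
    return normalized

print(validate_identifier("\uff41dmin"))  # fullwidth "a" is folded: prints admin

try:
    validate_identifier("adm\u0456n")     # Cyrillic U+0456 survives NFKC
except ValueError as err:
    print("rejected:", err)
```

Where non-ASCII identifiers are legitimately needed, a per-word script check is a less restrictive alternative to the ASCII-only rule.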

Design notes

PromptShield intentionally detects mixed-script composition, not individual confusable characters.

This avoids false positives in:

  • multilingual documentation
  • international content
  • natural language text

Detection focuses on security-relevant misuse, not typography.

Mental model

Homoglyph detection protects against:

  • identifier spoofing
  • prompt impersonation
  • policy bypass using confusable characters

It is conceptually similar to:

  • IDN homograph protections in browsers
  • Unicode spoofing detection
  • authentication identifier validation
