Bleach (Python), URI Scheme Restriction Bypass, CVE-2026-XXXXX (Medium) -DC-Jun2026-470

How CVE-2026-XXXXX Works

This vulnerability resides in Bleach’s URI scheme validation logic for HTML attributes. Bleach is a popular Python library that sanitizes user-generated HTML by stripping tags and attributes that are not on an allowlist. When a caller configures `bleach.clean()` with `<a>` in the allowed tags and `href` in the allowed attributes, the library is expected to drop any `href` whose value uses a disallowed URI scheme (e.g., `javascript:`, `data:`). It should keep schemes that are on the protocol allowlist (set through the `protocols` argument), such as `http:` and `https:`.
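
The following is a minimal model of that intended contract, written with the standard library only. It is not Bleach’s implementation. `ALLOWED_SCHEMES` and `href_scheme_allowed` are hypothetical names, and the relative-URL branch reflects the common convention that scheme-less links are acceptable.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist mirroring the caller configuration described above.
ALLOWED_SCHEMES = {"http", "https"}


def href_scheme_allowed(href: str) -> bool:
    """Return True if href has no scheme (a relative URL) or an allowlisted scheme."""
    scheme = urlsplit(href.strip()).scheme.lower()
    if not scheme:
        # Relative URLs such as "/profile" or "#top" carry no scheme at all.
        return True
    return scheme in ALLOWED_SCHEMES


for href in ["https://example.com/", "/profile", "javascript:alert(1)", "data:text/html,hi"]:
    print(f"{href!r:32} allowed={href_scheme_allowed(href)}")
```
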
The flaw arises because Bleach’s scheme validation does not account for invisible Unicode characters, specifically the Zero Width Space (ZWSP, `\u200b`). An attacker can insert `\u200b` between `javascript` and the colon to produce `javascript\u200b:`. When that happens, the sanitizer’s URL parser no longer recognizes a `javascript:` scheme. RFC 3986 limits scheme names to ASCII letters, digits, `+`, `-`, and `.`, so `javascript\u200b` is not a syntactically valid scheme, and the value is parsed as if it had no scheme at all. Bleach then treats it like a relative URL and accepts it. As a result, the `href` passes through unsanitized, violating the caller’s protocol allowlist.
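
Python’s `urllib.parse.urlsplit` can reproduce the parsing step. It is used here only as a stand-in for whatever parser a sanitizer relies on. The sketch also checks the prefix against the RFC 3986 scheme grammar. `RFC3986_SCHEME`, `plain`, and `hidden` are illustrative names.

```python
import re
from urllib.parse import urlsplit

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
RFC3986_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

plain = "javascript:alert(document.cookie)"
hidden = "javascript\u200b:alert(document.cookie)"  # U+200B ZERO WIDTH SPACE before the colon

for href in (plain, hidden):
    prefix = href.split(":", 1)[0]
    print(
        f"{href!r}: urlsplit scheme={urlsplit(href).scheme!r}, "
        f"valid RFC 3986 scheme={RFC3986_SCHEME.fullmatch(prefix) is not None}"
    )
```

The hidden variant yields an empty scheme. Any allowlist check that treats scheme-less values as relative URLs will therefore accept it.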
Critically, browsers do not treat `javascript\u200b:` as a JavaScript URL either, because the malformed scheme makes them resolve the value as a relative URL. This is therefore not a direct XSS vulnerability in standard processing chains. However, a secondary risk emerges if a downstream system applies Unicode cleanup to Bleach’s output before rendering, for example by stripping invisible characters such as ZWSP. In that non-standard scenario, the `javascript:` scheme becomes valid again, potentially leading to XSS.
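
The sketch below illustrates that secondary risk. It assumes a hypothetical downstream step that strips zero-width characters with a regular expression (`INVISIBLE` is an illustrative name). It also shows that standard NFKC normalization on its own leaves U+200B in place, so the danger comes from explicit invisible-character stripping.

```python
import re
import unicodedata
from urllib.parse import urlsplit

# Zero-width and bidi-mark characters (U+200B..U+200F) plus U+FEFF, which a
# hypothetical downstream "cleanup" step might strip before rendering.
INVISIBLE = re.compile(r"[\u200b-\u200f\ufeff]")

sanitized = "javascript\u200b:alert(document.cookie)"  # value Bleach < 6.4.0 lets through

# Standard NFKC normalization keeps U+200B, so the scheme stays inert.
nfkc = unicodedata.normalize("NFKC", sanitized)
print(repr(urlsplit(nfkc).scheme))      # ''

# Explicitly stripping invisible characters re-creates a working javascript: URL.
stripped = INVISIBLE.sub("", sanitized)
print(repr(urlsplit(stripped).scheme))  # 'javascript'
```
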
The issue was reported by codeant from CodeAnt AI and affects Bleach versions prior to 6.4.0. The fix in version 6.4.0 properly strips URI schemes containing non-ASCII characters. Users are strongly advised to upgrade immediately or apply workarounds such as pre-processing content to remove non-ASCII characters from URI schemes before sanitization.
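
Deployments that cannot upgrade right away can add a check in the spirit of the 6.4.0 fix described above. The sketch that follows is not the actual patch. It relies on Bleach accepting a callable with the signature `(tag, name, value) -> bool` as a value in the `attributes` mapping. `href_filter` is a hypothetical name. The "text before the first colon" heuristic is a simplification and can reject unusual relative URLs such as `/wiki/Zürich:Talk`.

```python
def href_filter(tag: str, name: str, value: str) -> bool:
    """Hypothetical Bleach attribute callable: allow href unless its scheme part is non-ASCII."""
    if name != "href":
        return False  # only href is allowed, matching the <a>/href configuration
    prefix, sep, _ = value.partition(":")
    # Text before the first ":" can only be a scheme if it has no "/", "?" or "#".
    looks_like_scheme = bool(sep) and not any(ch in prefix for ch in "/?#")
    if looks_like_scheme and not prefix.isascii():
        return False  # e.g. "javascript\u200b:alert(1)" is rejected
    return True


for href in ["https://example.com/", "/wiki/Zürich", "javascript\u200b:alert(1)"]:
    print(f"{href!r}: kept={href_filter('a', 'href', href)}")
```

It would be wired in as `bleach.clean(html, tags=["a"], attributes={"a": href_filter})`. Bleach still applies its own protocol check to any value the filter accepts, so this check only adds a layer.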

DailyCVE Form

| Field | Value |
|-|-|
| Platform | Python / Bleach |
| Version | < 6.4.0 |
| Vulnerability | URI Scheme Restriction Bypass |
| Severity | Medium |
| Date | 2026-06-16 |
| Prediction | Patch available (6.4.0) |

What Undercode Says: Analytics

```bash
# Check the installed Bleach version
pip show bleach | grep Version

# Or print it from Python (vulnerable if below 6.4.0)
python3 -c "import bleach; print(bleach.__version__)"

# Test the vulnerability
python3 << 'EOF'
import bleach

payload = '<a href="javascript\u200b:alert(document.cookie)">Click me</a>'
result = bleach.clean(payload, tags=['a'], attributes={'a': ['href']})
print(f"Sanitized output: {repr(result)}")
# Output (vulnerable): '<a href="javascript\u200b:alert(document.cookie)">Click me</a>'
# Output (patched): the href attribute is dropped, e.g. '<a>Click me</a>'
EOF

# Upgrade to the patched version
pip install --upgrade "bleach>=6.4.0"
```

Exploit

An attacker can craft malicious HTML with an `href` attribute containing a disallowed URI scheme obscured by a Zero Width Space character:

```html
<a href="javascript\u200b:alert(document.cookie)">Click me</a>
```

Here `\u200b` stands for the single U+200B character, not six literal characters. When this HTML is processed by `bleach.clean()` with `<a>` in the allowed tags and `href` in the allowed attributes, the sanitizer fails to recognize `javascript\u200b:` as a disallowed scheme. The output retains the full `href` value, breaking the sanitizer’s contract. In environments where a downstream step strips invisible characters before rendering, this could lead to XSS.
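
As an illustrative sketch, there are two equivalent ways to build the payload string in Python. One uses the `\u200b` escape. The other uses the HTML numeric entity `&#8203;` (0x200B), which is how the character could be typed into a raw HTML form field. The variable names are illustrative. Python’s `html.unescape` decodes the entity to the same character an HTML parser would produce here.

```python
import html

# Python escape: the string contains the actual U+200B character.
from_escape = '<a href="javascript\u200b:alert(document.cookie)">Click me</a>'

# Raw HTML form: the numeric entity &#8203; (decimal for 0x200B).
from_entity = '<a href="javascript&#8203;:alert(document.cookie)">Click me</a>'

print(html.unescape(from_entity) == from_escape)     # True
print(len("javascript\u200b:"), len("javascript:"))  # 12 11
```
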

Protection

Upgrade to Bleach 6.4.0 – This is the official fix that properly strips URI schemes containing non-ASCII characters.

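Pre-process Content – Strip invisible characters such as ZWSP from URI schemes before passing content to `bleach.clean()`, so that the disallowed scheme is recognized and removed. This example targets only the `javascript` scheme:

```python
import re

# "javascript" + one or more invisible chars (U+200B..U+200F, U+2028, U+2029, U+FEFF) + ":", any case
_HIDDEN_JS = re.compile(r"javascript[\u200b-\u200f\u2028\u2029\ufeff]+:", re.IGNORECASE)

def sanitize_uri_schemes(html):
    # Collapse "javascript\u200b:" to "javascript:" so bleach.clean() recognizes and strips it
    return _HIDDEN_JS.sub("javascript:", html)
```

This pattern does not catch the character when it is written as an HTML entity such as `&#8203;`, so upgrading remains the reliable fix.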

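Content Security Policy (CSP) – Deploy a strong CSP header whose `script-src` directive omits `'unsafe-inline'` and `'unsafe-eval'`. Without `'unsafe-inline'`, browsers also refuse to run `javascript:` URLs, which limits the impact of any scheme that slips through.

Avoid Downstream Normalization – Do not strip invisible characters from sanitized HTML before rendering, or apply similar Unicode cleanup to it, as this could reactivate otherwise-inert URI schemes. If such cleanup is required, perform it before calling `bleach.clean()`.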
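
Below is a minimal sketch of the CSP recommendation for a plain WSGI application. The middleware name `csp_middleware` and the policy string are assumptions for illustration. Web frameworks usually provide their own way to set response headers.

```python
from typing import Callable, Iterable

# Example policy (an assumption; tune it per application). Omitting 'unsafe-inline'
# from script-src also stops browsers from running javascript: URLs; 'unsafe-eval'
# is left out as well.
CSP = "default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'none'"


def csp_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so every response carries the Content-Security-Policy above."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_with_csp(status, headers, exc_info=None):
            # Replace any CSP header the app set itself, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "content-security-policy"]
            headers.append(("Content-Security-Policy", CSP))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_csp)

    return wrapped
```

Usage: wrap an existing WSGI callable, for example `application = csp_middleware(application)`.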

Impact

Broken Sanitizer Contract – Bleach outputs URI values that violate the caller’s protocol allowlist, undermining the security guarantees expected by developers.

Theoretical XSS Risk – If a downstream system normalizes Unicode (stripping invisible characters) before rendering, the `javascript:` scheme becomes valid, enabling script execution.

Affected Configurations – Only users calling `bleach.clean()` with `<a>` in the allowed tags and `href` in the allowed attributes are impacted.

No Direct XSS – Modern browsers do not execute `javascript\u200b:` URIs, so this is not a direct cross-site scripting vulnerability in standard deployments.

Sources:

Reported By: github.com
Extra Source Hub: Undercode
