Docling HTML Backend, Multiple Resource Validation Vulnerabilities (Path Traversal & Server-Side Request Forgery) – CVE-2026-47214 (High) -DC-Jun2026-187

Docling’s HTML backend fails to properly validate resources when handling untrusted HTML documents. The core issue stems from accepting `file://` URIs while the `enable_local_fetch` option is set to True. This allows an attacker to read arbitrary files from the server’s local file system. Furthermore, the path resolution logic does not sufficiently restrict directory traversal; an attacker can embed `../` sequences or use absolute paths to escape the intended `base_path` and access any file on the system.
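The containment check that the vulnerable path resolution lacked can be sketched in a few lines. This is a minimal illustration rather than Docling's actual code: the function name `resolve_within_base` is hypothetical, and the sketch assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_within_base(base_path: str, reference: str) -> Path:
    """Resolve `reference` against `base_path`, refusing anything that escapes it.

    Hypothetical helper for illustration; not part of Docling.
    """
    candidate = Path(reference)
    # Absolute paths are rejected outright instead of being joined to the base.
    if candidate.is_absolute():
        raise ValueError(f"absolute path not allowed: {reference}")
    base = Path(base_path).resolve()
    # resolve() collapses ../ segments and follows symlinks before the check.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {reference}")
    return resolved
```

Resolving before comparing is the important design choice: a plain string prefix check on the unresolved path would still let `../` sequences and symlinks through.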
The backend also lacks network restrictions when `enable_remote_fetch=True`. It does not block requests to internal IP addresses, enabling Server-Side Request Forgery (SSRF) attacks. An attacker could probe internal services, access cloud metadata endpoints, or port-scan the local network. Additionally, HTTP redirects are followed without scheme or origin validation, so a remote response can redirect the fetcher to `file://` or other dangerous schemes.
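A rough sketch of the kind of address filtering that blocks this class of SSRF, using only the Python standard library. The helper names (`is_public_ip`, `resolve_addresses`, `assert_url_is_public`) are hypothetical and this is not Docling's implementation.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(address: str) -> bool:
    """Return True only for globally routable addresses."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global


def resolve_addresses(host: str, port: int) -> list:
    """Return every address the host maps to (literal IPs skip DNS)."""
    try:
        return [str(ipaddress.ip_address(host))]
    except ValueError:
        pass
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Drop any IPv6 zone suffix such as %eth0 before validation.
    return [info[4][0].split("%")[0] for info in infos]


def assert_url_is_public(url: str) -> None:
    """Raise ValueError unless the URL is http(s) and points at public hosts only."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    for address in resolve_addresses(parts.hostname, port):
        if not is_public_ip(address):
            raise ValueError(f"{parts.hostname} resolves to non-public {address}")
```

A check like this only helps if the connection is then made to the address that was validated. Resolving the name a second time at connect time leaves room for DNS rebinding.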
Remote image downloads and `data:` URIs have no size limits, allowing memory exhaustion attacks via extremely large images or base64-encoded payloads.
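To illustrate what a size cap looks like in practice, here is a small sketch that bounds both a base64 `data:` URI and a streamed download. The 10 MiB limit and the names `MAX_DECODED_BYTES`, `decode_data_uri`, and `read_capped` are assumptions made for this example, not values or functions taken from Docling.

```python
import base64

MAX_DECODED_BYTES = 10 * 1024 * 1024  # hypothetical cap, chosen for illustration


def decode_data_uri(uri: str, max_bytes: int = MAX_DECODED_BYTES) -> bytes:
    """Decode a base64 data: URI, rejecting oversized payloads before decoding."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[5:].partition(",")
    if not sep:
        raise ValueError("malformed data: URI")
    if not header.endswith(";base64"):
        raise ValueError("only base64 data: URIs are handled in this sketch")
    # Base64 maps 3 bytes to 4 characters, so this is an upper bound on the
    # decoded size that costs no allocation.
    if (len(payload) * 3) // 4 > max_bytes:
        raise ValueError("data: URI payload exceeds size limit")
    return base64.b64decode(payload, validate=True)


def read_capped(stream, max_bytes: int = MAX_DECODED_BYTES, chunk_size: int = 65536) -> bytes:
    """Read a file-like object in chunks and abort once max_bytes is exceeded."""
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer.extend(chunk)
        # Count bytes actually received; a Content-Length header can lie.
        if len(buffer) > max_bytes:
            raise ValueError("response body exceeds size limit")
    return bytes(buffer)
```

Counting received bytes instead of trusting `Content-Length` matters because the header is attacker-controlled and may be missing or wrong.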
Patches were released in versions 2.91.0 (initial fixes) and 2.94.0 (additional improvements). The fixes enforce strict local path handling: absolute paths are always blocked, and relative paths require `enable_local_fetch=True` and must remain within `base_path`. The `file://` scheme is stripped and the remainder is treated as a local path. IP address validation prevents SSRF, HTTP redirects are now validated, and connection and read timeouts are enforced. Size limits are imposed on both remote images and base64-decoded `data:` URIs. As a workaround, keep both `enable_local_fetch=False` and `enable_remote_fetch=False` (the defaults) when processing untrusted HTML.
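Redirect validation of the kind described above can be pictured as checking every `Location` header before following it. The sketch below is an assumption about the general technique, with a hypothetical `resolve_redirect` helper; it does not reproduce the patched Docling code.

```python
from urllib.parse import urljoin, urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def resolve_redirect(current_url: str, location: str) -> str:
    """Resolve a Location header against the current URL and vet its scheme.

    Hypothetical helper: a fetcher would call this for every hop instead of
    letting the HTTP client follow redirects automatically.
    """
    # Location may be relative, so resolve it against the URL just fetched.
    target = urljoin(current_url, location)
    scheme = urlsplit(target).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"redirect to disallowed scheme: {scheme or '(none)'}")
    return target
```

Each accepted hop would still need the same destination address checks as the original URL, plus a cap on the number of hops.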

DailyCVE Form

Platform: Docling HTML Backend
Version: < 2.91.0
Vulnerability: Path traversal, SSRF
Severity: High
Date: 2026-06-02

Prediction: Already patched (2.91.0)

What Undercode Say

Check your Docling version:

pip show docling | grep Version
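The same check can be scripted, which is handy in CI. This sketch compares the installed version against the two releases named in this advisory; the function names and status labels are made up for the example.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string: str) -> tuple:
    """Turn '2.91.0rc1' into (2, 91, 0), ignoring any pre-release suffix."""
    numbers = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def patch_status(version_string: str) -> str:
    """Classify a Docling version against the releases named in the advisory."""
    release = release_tuple(version_string)
    if release < (2, 91, 0):
        return "vulnerable"
    if release < (2, 94, 0):
        return "initial fixes only"
    return "patched"


if __name__ == "__main__":
    try:
        installed = version("docling")
    except PackageNotFoundError:
        print("docling is not installed")
    else:
        print(f"docling {installed}: {patch_status(installed)}")
```

Comparing integer tuples avoids the string-comparison trap where "2.9.0" would sort after "2.91.0".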

Test for the vulnerability with a local file read attempt:

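# Assumes the HTMLBackendOptions / HTMLFormatOption API of recent Docling releases;
# verify the names against your installed version and run only on systems you control.
from docling.datamodel.backend_options import HTMLBackendOptions
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, HTMLFormatOption

def convert_html(path, **fetch_flags):
    # Build a converter whose HTML backend uses the given fetch flags
    options = HTMLBackendOptions(fetch_images=True, **fetch_flags)
    converter = DocumentConverter(format_options={InputFormat.HTML: HTMLFormatOption(backend_options=options)})
    return converter.convert(path).document.export_to_markdown()

# local_read.html is a test file you create containing: <img src="file:///etc/passwd">
print(convert_html("local_read.html", enable_local_fetch=True))

Check for SSRF by attempting to fetch an internal resource:

# ssrf.html is a test file you create containing: <img src="http://169.254.169.254/latest/meta-data/">
print(convert_html("ssrf.html", enable_remote_fetch=True))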

Exploit

  1. Local file read: Craft an HTML document containing `<img src="file:///etc/passwd">` and process it with `enable_local_fetch=True`. The backend will read and embed the contents of the target file.
  2. Path traversal: Use an image source like `"../../../sensitive/data.txt"` with `enable_local_fetch=True`. If `base_path` is not strictly enforced, the backend will traverse outside allowed directories.
  3. SSRF: Submit an HTML document with `<img src="http://169.254.169.254/latest/meta-data/">` while `enable_remote_fetch=True`. The backend will fetch the internal resource, potentially leaking sensitive data or interacting with internal APIs.
  4. Redirect abuse: Supply a resource that returns an HTTP redirect to `file:///etc/passwd`. The backend follows the redirect without validation, leading to local file disclosure.
  5. Memory exhaustion: Provide a `data:` URI with a massive base64-encoded payload (e.g., a 100 MB image) or a remote image that advertises a 2 GB `Content-Length`. Without size limits, the backend allocates memory until the process crashes.

Protection

  • Upgrade: Update to Docling version 2.91.0 or later (2.94.0 recommended) to receive all fixes.
  • Disable unsafe features: When processing untrusted HTML, ensure `enable_local_fetch=False` and `enable_remote_fetch=False` (these are the default settings).
  • Restrict network access: If remote fetching is required, deploy a network proxy or firewall to block requests to internal IP addresses.
  • Set resource limits: If size limits are not enforced by the library, wrap the conversion process with external memory and timeout limits.
  • Validate user input: Never allow user-controlled HTML to be processed with `enable_local_fetch` or `enable_remote_fetch` enabled.
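For the external memory and timeout limits mentioned above, one option on Linux is to run each conversion in a child process with an address-space cap and a wall-clock timeout. This is a sketch under those assumptions: `run_limited` is a hypothetical helper, `resource.RLIMIT_AS` is Unix-only, and the command you pass in (for example a small script of your own that calls Docling) is up to you.

```python
import resource
import subprocess
import sys


def run_limited(argv, memory_bytes: int, timeout_s: float) -> subprocess.CompletedProcess:
    """Run argv in a child process with a memory cap and a timeout.

    Raises subprocess.TimeoutExpired if the child runs longer than timeout_s.
    """

    def apply_limits():
        # Executed in the child just before exec: cap its virtual address space.
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    return subprocess.run(
        argv,
        preexec_fn=apply_limits,
        timeout=timeout_s,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Harmless demonstration: the child prints a line and exits well within limits.
    result = run_limited([sys.executable, "-c", "print('ok')"], 1 << 30, 30)
    print(result.returncode, result.stdout.strip())
```

Isolating the conversion in its own process means a runaway allocation kills only the child, and the parent can log the failure and move on. `preexec_fn` is not safe in multithreaded parents, so a container or cgroup limit is a sturdier choice for production services.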

Impact

  • Confidentiality: An attacker can read any file accessible to the Docling process, including configuration files, credentials, and source code.
  • Integrity: SSRF can be used to modify internal state, trigger unintended actions, or exploit internal APIs with write capabilities.
  • Availability: Memory exhaustion via oversized images or `data:` URIs can cause denial of service.
  • Network exposure: Internal network scanning and cloud metadata theft become possible, potentially leading to privilege escalation in cloud environments.

Sources:

Reported By: github.com