Docling’s HTML backend fails to properly validate resources when handling untrusted HTML documents. The core issue stems from accepting `file://` URIs while the `enable_local_fetch` option is set to True. This allows an attacker to read arbitrary files from the server’s local file system. Furthermore, the path resolution logic does not sufficiently restrict directory traversal; an attacker can embed `../` sequences or use absolute paths to escape the intended `base_path` and access any file on the system.
The backend also lacks network restrictions when `enable_remote_fetch=True`. It does not block requests to internal IP addresses, enabling Server-Side Request Forgery (SSRF) attacks. An attacker could probe internal services, access cloud metadata endpoints, or port-scan the local network. Additionally, HTTP redirects are followed without proper scheme or origin validation, so a remote server can redirect a fetch to `file://` or other dangerous schemes.
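To make the SSRF check concrete, here is a minimal sketch of IP address validation using only the Python standard library. It is not Docling's implementation; the helper name `is_blocked_address` is hypothetical, and it only handles IP literals. A real fetcher would also need to resolve hostnames and check every resolved address.

```python
import ipaddress


def is_blocked_address(host: str) -> bool:
    """Return True if `host` should not be fetched (hypothetical helper).

    Anything that is not an IP literal is rejected (fail closed), because a
    hostname has to be resolved first and each resolved address checked.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal: caller must resolve it first

    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:10.0.0.1.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped

    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # covers 169.254.169.254 (cloud metadata)
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )
```

The check should run on the address the connection actually uses, not only on the URL text, otherwise a DNS name that resolves to an internal address slips through.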
Remote image downloads and `data:` URIs have no size limits, allowing memory exhaustion attacks via extremely large images or base64-encoded payloads.
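The missing size limits can be illustrated with a short sketch. The cap value and the helper names `decode_data_uri` and `read_capped` are assumptions made for this example, not Docling's code. The first helper bounds the decoded size of a base64 `data:` URI before decoding it, and the second reads a response body in chunks so that a misleading `Content-Length` cannot force a large allocation.

```python
import base64
from typing import BinaryIO

MAX_BYTES = 20 * 1024 * 1024  # hypothetical cap, not Docling's value


def decode_data_uri(uri: str, max_bytes: int = MAX_BYTES) -> bytes:
    """Decode a base64 data: URI, rejecting payloads larger than max_bytes."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    # Four base64 characters decode to three bytes, minus padding, so the
    # decoded size is known before any large buffer is allocated.
    padding = len(payload) - len(payload.rstrip("="))
    estimated = (len(payload) * 3) // 4 - padding
    if estimated > max_bytes:
        raise ValueError("data: URI payload exceeds size limit")
    return base64.b64decode(payload, validate=True)


def read_capped(stream: BinaryIO, max_bytes: int = MAX_BYTES,
                chunk_size: int = 64 * 1024) -> bytes:
    """Read a file-like body in chunks, aborting once max_bytes is exceeded."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ValueError("response body exceeds size limit")
```

Counting bytes as they arrive is the safer choice for remote images, since the `Content-Length` header is supplied by the server and may be wrong or absent.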
Patches were released in versions 2.91.0 (initial fixes) and 2.94.0 (additional improvements). The fixes enforce strict local path handling: absolute paths are always blocked, and relative paths require `enable_local_fetch=True` and must remain within `base_path`. The `file://` scheme is stripped and the remainder is treated as a local path, so it is subject to the same checks. IP address validation prevents SSRF, HTTP redirects are now validated, and connection and read timeouts are applied. Size limits are imposed on both remote images and base64-decoded `data:` URIs. As a workaround, keep both `enable_local_fetch=False` and `enable_remote_fetch=False` (the defaults) when processing untrusted HTML.
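The local path rules described above can be sketched as follows. This is an illustration of the containment technique, not the patched Docling code; the function name `resolve_local_resource` and its signature are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_local_resource(base_path: str, src: str,
                           enable_local_fetch: bool = False) -> Path:
    """Resolve `src` under `base_path` or raise PermissionError (sketch)."""
    if not enable_local_fetch:
        raise PermissionError("local fetch is disabled")

    # Strip the file:// scheme and treat the rest as a plain local path.
    if src.startswith("file://"):
        src = src[len("file://"):]

    candidate = Path(src)
    if candidate.is_absolute():
        raise PermissionError("absolute paths are not allowed")

    # Resolve both sides so that ../ sequences and symlinks are collapsed
    # before the containment check runs.
    base = Path(base_path).resolve()
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):
        raise PermissionError("path escapes base_path")
    return resolved
```

Comparing resolved paths, rather than scanning the string for `../`, is what makes the check hold up against encoded or nested traversal sequences.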
DailyCVE Form
Platform: Docling HTML Backend
Version: < 2.91.0
Vulnerability: Path traversal, SSRF
Severity: High
Date: 2026-06-02
Prediction: Already patched (2.91.0)
What Undercode Say
Check your Docling version:
pip show docling | grep Version
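The same check can be done from Python. The sketch below compares the installed version against the two patched releases named above; the helper names are hypothetical, and the parser assumes a plain numeric `X.Y.Z` release string (pre-release suffixes are not handled).

```python
from importlib.metadata import PackageNotFoundError, version

FIRST_PATCHED = (2, 91, 0)  # initial fixes
FULLY_PATCHED = (2, 94, 0)  # additional improvements


def parse_version(text: str) -> tuple:
    """Parse 'X.Y.Z' into a tuple of ints so comparison is numeric."""
    return tuple(int(part) for part in text.split(".")[:3])


def patch_status(text: str) -> str:
    """Classify a Docling version string against the patched releases."""
    parsed = parse_version(text)
    if parsed < FIRST_PATCHED:
        return "vulnerable"
    if parsed < FULLY_PATCHED:
        return "initial fixes only"
    return "patched"


def installed_docling_status() -> str:
    """Report the status of the Docling package in this environment."""
    try:
        return patch_status(version("docling"))
    except PackageNotFoundError:
        return "not installed"
```

Comparing integer tuples avoids the string-comparison trap where `"2.100.0"` would sort before `"2.91.0"`.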
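Test for the vulnerability with a local file read attempt. The snippet assumes Docling's `HTMLBackendOptions` and `HTMLFormatOption` classes; verify the import paths and option names against your installed release. Here `local_read.html` is a hypothetical test file containing an image tag that points at `file:///etc/passwd`:
# Assumption: HTML backend options are passed per input format via format_options.
from docling.datamodel.backend_options import HTMLBackendOptions
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, HTMLFormatOption
opts = HTMLBackendOptions(enable_local_fetch=True, fetch_images=True)
converter = DocumentConverter(format_options={InputFormat.HTML: HTMLFormatOption(backend_options=opts)})
# On a vulnerable release the backend tries to read the referenced file.
result = converter.convert("local_read.html")
print(result.document.export_to_markdown())
Check for SSRF by attempting to fetch an internal resource. Here `ssrf.html` is a hypothetical test file containing an image tag that points at `http://169.254.169.254/latest/meta-data/`:
opts = HTMLBackendOptions(enable_remote_fetch=True, fetch_images=True)
converter = DocumentConverter(format_options={InputFormat.HTML: HTMLFormatOption(backend_options=opts)})
# On a vulnerable release the backend issues a request to the internal address.
result = converter.convert("ssrf.html")
print(result.document.export_to_markdown())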
Exploit
- Local file read: Craft an HTML document containing an `<img>` tag whose `src` is a `file://` URI such as `file:///etc/passwd`, and process it with `enable_local_fetch=True`. The backend will read and embed the contents of the target file.
- Path traversal: Use an image source like `"../../../sensitive/data.txt"` with `enable_local_fetch=True`. If `base_path` is not strictly enforced, the backend will traverse outside allowed directories.
- SSRF: Submit an HTML document with an `<img>` tag whose `src` points at an internal address, such as `http://169.254.169.254/latest/meta-data/`, while `enable_remote_fetch=True`. The backend will fetch the internal resource, potentially leaking sensitive data or interacting with internal APIs.
- Redirect abuse: Supply a resource that returns an HTTP redirect to `file:///etc/passwd`. The backend follows the redirect without validation, leading to local file disclosure.
- Memory exhaustion: Provide a `data:` URI with a massive base64-encoded payload (e.g., a 100MB image) or a remote image that advertises a `Content-Length` of roughly 2GB. Without size limits, the backend allocates memory until the process crashes.
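The redirect abuse case comes down to checking every hop rather than only the first URL. Below is a minimal sketch of that idea using the standard library. The names `next_hop` and `resolve_redirects`, the hop limit, and the `fetch_location` callable (which returns the `Location` header value, or `None` for a final response) are all assumptions made for illustration.

```python
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

ALLOWED_SCHEMES = {"http", "https"}
MAX_REDIRECTS = 5  # hypothetical limit


def next_hop(current_url: str, location: str) -> str:
    """Resolve a Location header and reject non-HTTP(S) targets."""
    target = urljoin(current_url, location)
    parts = urlsplit(target)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"redirect to disallowed scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("redirect target has no host")
    return target


def resolve_redirects(url: str,
                      fetch_location: Callable[[str], Optional[str]],
                      max_redirects: int = MAX_REDIRECTS) -> str:
    """Follow redirects manually, validating each hop before requesting it."""
    for _ in range(max_redirects + 1):
        location = fetch_location(url)
        if location is None:
            return url  # final response reached
        url = next_hop(url, location)
    raise ValueError("too many redirects")
```

A scheme check alone does not stop a redirect to an internal HTTP address, so each validated hop should also pass through the same IP address checks as the original URL.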
Protection
- Upgrade: Update to Docling version 2.91.0 or later (2.94.0 recommended) to receive all fixes.
- Disable unsafe features: When processing untrusted HTML, ensure `enable_local_fetch=False` and `enable_remote_fetch=False` (these are the default settings).
- Restrict network access: If remote fetching is required, deploy a network proxy or firewall to block requests to internal IP addresses.
- Set resource limits: If size limits are not enforced by the library, wrap the conversion process with external memory and timeout limits.
- Validate user input: Never allow user-controlled HTML to be processed with `enable_local_fetch` or `enable_remote_fetch` enabled.
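For the external resource limits mentioned above, one option is to run the conversion in a child process with a timeout and an address-space cap. The sketch below is POSIX-only (it relies on the `resource` module), and the function name `run_limited`, the default limits, and the idea of a separate conversion script are assumptions for illustration.

```python
import resource
import subprocess
import sys
from typing import List


def run_limited(argv: List[str], timeout_s: float = 60.0,
                max_memory_bytes: int = 2 * 1024**3) -> subprocess.CompletedProcess:
    """Run `argv` in a child process with a timeout and a memory cap."""

    def apply_limits() -> None:
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and never above the hard limit already imposed on this process.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = max_memory_bytes
        if hard != resource.RLIM_INFINITY:
            limit = min(limit, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    # subprocess.run kills the child and raises TimeoutExpired on timeout.
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        preexec_fn=apply_limits,
    )
```

A caller would pass the command that performs the conversion, for example `run_limited([sys.executable, "convert_untrusted.py", "page.html"])`, where `convert_untrusted.py` is a hypothetical script of your own. A nonzero return code or a `subprocess.TimeoutExpired` exception then signals that the document should be rejected.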
Impact
- Confidentiality: An attacker can read any file accessible to the Docling process, including configuration files, credentials, and source code.
- Integrity: SSRF can be used to modify internal state, trigger unintended actions, or exploit internal APIs with write capabilities.
- Availability: Memory exhaustion via oversized images or `data:` URIs can cause denial of service.
- Network exposure: Internal network scanning and cloud metadata theft become possible, potentially leading to privilege escalation in cloud environments.
Sources:
Reported By: github.com

