IA Squad
python · nltk · Critical

nltk: Path traversal in nltk.data.load() via URL-encoded separators

17 Jun 2026 · Read 1 min · Severity: act now

What changed

NLTK's nltk.data.load() is vulnerable to path traversal when it is given an nltk: URL containing URL-encoded path separators or traversal segments (for example %2f for "/" and %2e%2e for ".."). The regex check for unsafe paths runs before url2pathname() decodes percent-encoded sequences, so an encoded traversal passes the check and is decoded into a real traversal afterwards.
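The ordering problem can be illustrated with a minimal sketch. It is not NLTK's code: the regex, the function names, and the payload are hypothetical, and urllib.parse.unquote stands in for url2pathname() so the result is the same on every platform.

```python
import re
from urllib.parse import unquote

# Hypothetical traversal check; NOT the regex NLTK actually uses.
UNSAFE = re.compile(r"(^|[\\/])\.\.([\\/]|$)")


def is_unsafe(path: str) -> bool:
    """Return True if the path contains a literal '..' segment."""
    return bool(UNSAFE.search(path))


def check_then_decode(resource: str) -> str:
    """Vulnerable order: validate the raw string, decode afterwards."""
    if is_unsafe(resource):
        raise ValueError("unsafe path")
    return unquote(resource)  # stand-in for url2pathname()


def decode_then_check(resource: str) -> str:
    """Safer order: decode first, then validate what will really be used."""
    decoded = unquote(resource)
    if is_unsafe(decoded):
        raise ValueError("unsafe path")
    return decoded


if __name__ == "__main__":
    payload = "corpora/%2e%2e%2f%2e%2e%2fetc/passwd"  # illustrative input
    print(check_then_decode(payload))  # the encoded '..' slips past the check
    try:
        decode_then_check(payload)
    except ValueError as exc:
        print("rejected:", exc)
```

The first function accepts the payload because the raw string contains no literal ".." segment; the second rejects it because the check sees the decoded path.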

Who it affects

Anyone running NLTK 3.9.4 or earlier who passes untrusted input to nltk.data.load(). Exposure is highest in web applications, hosted notebook services, multi-tenant ML pipelines, and CI/CD systems.
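A quick way to see whether an environment falls in the affected range is to compare the installed version against 3.9.4. The sketch below is an assumption-laden helper, not an official check: the function names are hypothetical, only the leading numeric part of the version string is compared, and the advisory does not name the patched release.

```python
import re
from importlib import metadata
from typing import Optional

# Upper bound of the affected range as stated in the advisory (NLTK <= 3.9.4).
LAST_AFFECTED = (3, 9, 4)


def is_affected_version(version: str) -> bool:
    """Return True if the version's numeric prefix is <= 3.9.4."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    return parts <= LAST_AFFECTED


def installed_nltk_is_affected() -> Optional[bool]:
    """Check the installed NLTK; return None if NLTK is not installed."""
    try:
        return is_affected_version(metadata.version("nltk"))
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_nltk_is_affected() in each deployment target (application image, notebook kernel, CI runner) gives a first inventory of where the update is needed.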

What to do today

Update NLTK to a patched version. If you cannot update yet, apply a workaround: set ENFORCE=True, or sanitize input before passing it to nltk.data.load().
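For the sanitizing workaround, one conservative option is a strict allowlist applied before the value ever reaches nltk.data.load(). The sketch below is one possible approach under stated assumptions: the helper name safe_nltk_url and the allowed character set are hypothetical, and it rejects every percent sign and backslash outright instead of trying to decode them.

```python
import posixpath
import re

# Hypothetical allowlist: letters, digits, underscore, hyphen, dot, slash.
# '%' and '\' are deliberately excluded, so encoded input is rejected.
_ALLOWED = re.compile(r"[A-Za-z0-9_\-./]+")


def safe_nltk_url(user_path: str) -> str:
    """Validate an untrusted relative resource path and return an nltk: URL."""
    if not _ALLOWED.fullmatch(user_path):
        raise ValueError("resource path contains disallowed characters")
    if user_path.startswith("/"):
        raise ValueError("absolute paths are not allowed")
    if ".." in user_path.split("/"):
        raise ValueError("traversal segments are not allowed")
    # Collapse redundant separators and '.' segments.
    return "nltk:" + posixpath.normpath(user_path)
```

The returned string can then be passed to nltk.data.load(). This narrows what callers can request but does not replace updating to a patched release.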
