nltk: Path traversal in nltk.data.load() via URL-encoded separators
NLTK's nltk.data.load() is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The regex check
What changed
NLTK's nltk.data.load() is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The regex check for unsafe paths is performed before url2pathname() decodes percent-encoded sequences, allowing bypass of the protection.
Who it affects
All users of NLTK <= 3.9.4 who pass untrusted input to nltk.data.load(), especially web applications, hosted notebook services, multi-tenant ML pipelines, and CI/CD systems.
What to do today
Update NLTK to a patched version or apply a workaround such as setting ENFORCE=True or sanitizing input before passing to nltk.data.load().