python · vllm · Heads-up
vLLM GGUF Dequantize Integer Truncation Information Disclosure
What changed
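vLLM's GGUF dequantize kernels take the tensor element count as a 32-bit `int` (the `k` parameter). When the product of a tensor's dimensions exceeds INT_MAX, the count is truncated and the kernels process only part of the tensor. The unprocessed remainder of the output keeps whatever uninitialized GPU memory it already held, which can disclose information.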
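To make the failure mode concrete, here is a minimal Python sketch of what happens when a large element count is stored in a signed 32-bit integer. It is an illustration only, not vLLM code: the helper `truncate_to_int32` and the example shape are hypothetical, and the sketch assumes two's-complement wraparound.

```python
def truncate_to_int32(value: int) -> int:
    """Emulate storing a wider integer in a signed 32-bit C `int`."""
    low = value & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the top bit as the sign bit (two's complement).
    return low - (1 << 32) if low >= (1 << 31) else low


# Hypothetical tensor shape, chosen so the product exceeds INT_MAX.
rows, cols = 131072, 32776
true_count = rows * cols
seen_count = truncate_to_int32(true_count)

print(f"elements in tensor:      {true_count}")
print(f"count after truncation:  {seen_count}")
print(f"elements left untouched: {true_count - seen_count}")
```

With this shape the true count is 4,296,015,872, but a 32-bit count sees only 1,048,576 elements, so almost the entire tensor would be skipped. Counts between 2^31 and 2^32 − 1 wrap to negative values instead.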
Who it affects
Users of vLLM in multi-tenant inference deployments who load GGUF models containing a tensor whose dimensions multiply to more than INT_MAX (2,147,483,647).
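One way to check whether a given model falls into this category is to compare each tensor's element count against INT_MAX. The sketch below assumes the tensor shapes have already been extracted into a plain dictionary. The function name `oversized_tensors` and the example tensor names are hypothetical, and reading shapes out of a GGUF file is left out.

```python
import math

INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def oversized_tensors(shapes):
    """Return names of tensors whose element count exceeds INT_MAX."""
    return [name for name, shape in shapes.items() if math.prod(shape) > INT_MAX]


# Hypothetical tensor names and shapes, for illustration only.
example_shapes = {
    "blk.0.attn_q.weight": (4096, 4096),
    "oversized.weight": (131072, 32776),
}
print(oversized_tensors(example_shapes))
```

An empty result means no tensor in the supplied mapping crosses the threshold. It says nothing about other code paths.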
What to do today
Apply the fix from PR #44971, which changes the type of the `k` parameter from `int` to `int64_t` in `to_cuda_ggml_t` and all dequantize functions.
Source
GitHub Advisory · vllm