IA Squad
python · vllm · Heads-up

vLLM GGUF Dequantize Integer Truncation Information Disclosure


18 Jun 2026 · Read 1 min · Severity: schedule it

What changed

vLLM's GGUF dequantize kernels truncate tensor dimensions to a 32-bit `int`. When the truncated value no longer matches the real element count, the kernels process only part of the tensor, and the part left unprocessed can disclose uninitialized GPU memory.
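The truncation itself is easy to reproduce outside the kernels. The sketch below is not vLLM code: it uses `ctypes.c_int32` to mimic what happens when a 64-bit element count is passed through a 32-bit `int` parameter. The helper name `truncate_to_int32` and the tensor shape are hypothetical, chosen only for illustration.

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def truncate_to_int32(n: int) -> int:
    """Keep only the low 32 bits of n, interpreted as a signed integer.

    ctypes performs no overflow checking, so this mirrors an implicit
    int64 -> int conversion in C/C++.
    """
    return ctypes.c_int32(n).value


# Hypothetical tensor shape whose element count is 2**32 + 1024.
rows, cols = 2**22 + 1, 1024
total_elements = rows * cols
seen_by_kernel = truncate_to_int32(total_elements)

print(f"real element count:     {total_elements}")
print(f"count after truncation: {seen_by_kernel}")
print(f"elements left untouched: {total_elements - seen_by_kernel}")
```

With this shape, a kernel that trusts the truncated count would process 1024 elements and leave the rest of the buffer as it was allocated. Counts just above `INT_MAX` wrap to negative values instead.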

Who it affects

Users of vLLM in multi-tenant inference deployments who load GGUF models containing a tensor whose dimensions multiply to more than INT_MAX (2,147,483,647).
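As a rough way to triage exposure, the condition above can be checked per tensor once the shapes are known. This is a minimal sketch under stated assumptions: the function `exceeds_int_max` and the example shapes are hypothetical, and how the shapes are read from a GGUF file is left out.

```python
from math import prod

INT_MAX = 2**31 - 1  # 2,147,483,647


def exceeds_int_max(shape) -> bool:
    """Return True if the product of the dimensions does not fit in a 32-bit int."""
    return prod(shape) > INT_MAX


# Hypothetical tensor shapes, keyed by an illustrative tensor name.
shapes = {
    "small_weight": (4096, 4096),    # 2**24 elements
    "huge_weight": (65536, 32768),   # 2**31 elements, one past INT_MAX
}

flagged = [name for name, shape in shapes.items() if exceeds_int_max(shape)]
print(flagged)
```

Python integers do not overflow, so the product here is exact. The same comparison in C or C++ would need a 64-bit accumulator.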

What to do today

Apply the fix from PR #44971, which changes the type of the `k` parameter from `int` to `int64_t` in `to_cuda_ggml_t` and all dequantize functions.
