python · vllm · Heads-up

vLLM GGUF Dequantize Integer Truncation Information Disclosure


18 Jun 2026 · Read 1 min · Severity: schedule it

What changed

vLLM's GGUF dequantize kernels truncate tensor dimensions to a 32-bit `int`. When the truncated value is smaller than the real element count, the kernel processes only part of the tensor, and the unprocessed part of the result can disclose uninitialized GPU memory.
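To make the truncation concrete, here is a minimal Python sketch of what happens when a large element count is narrowed to a 32-bit signed integer. The helper `truncate_to_int32` and the example shape are illustrative assumptions, not code from vLLM; the real kernels are written in CUDA/C++.

```python
INT_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def truncate_to_int32(n: int) -> int:
    """Mimic narrowing an integer to a 32-bit signed int (two's-complement wraparound)."""
    return (n + 2**31) % 2**32 - 2**31


# Hypothetical tensor shape whose element count exceeds INT_MAX.
rows, cols = 65536, 65552
true_count = rows * cols
seen_by_kernel = truncate_to_int32(true_count)

print(f"true element count  : {true_count}")
print(f"count after int cast: {seen_by_kernel}")
print(f"elements not covered: {true_count - seen_by_kernel}")
```

In this example only 1,048,576 of 4,296,015,872 elements would be covered by the truncated count. Depending on the value, the truncated count can also come out negative.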

Who it affects

Operators of multi-tenant vLLM inference deployments that load GGUF models containing tensors whose dimensions multiply to more than INT_MAX (2,147,483,647 for a 32-bit `int`).
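One way to check whether a given model falls into this category is to compare each tensor's element count against INT_MAX. The sketch below assumes the tensor names and shapes have already been extracted into a dictionary (for example with a GGUF reader of your choice). `find_oversized_tensors` and the sample names are hypothetical, and the check assumes the value at risk is the full product of the dimensions, as described above.

```python
import math

INT_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def find_oversized_tensors(shapes: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Return {tensor name: element count} for tensors whose count exceeds INT_MAX."""
    oversized = {}
    for name, shape in shapes.items():
        count = math.prod(shape)  # Python ints do not overflow, so this is exact
        if count > INT_MAX:
            oversized[name] = count
    return oversized


# Hypothetical shapes for illustration only.
example_shapes = {
    "small.weight": (4096, 4096),
    "huge.weight": (65536, 65552),
}

for name, count in find_oversized_tensors(example_shapes).items():
    print(f"{name}: {count} elements exceeds INT_MAX ({INT_MAX})")
```

An empty result suggests the model does not hit the condition described here. It is not a substitute for applying the fix.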

What to do today

Apply the fix from PR #44971, which changes the type of the `k` parameter from `int` to `int64_t` in `to_cuda_ggml_t` and in all dequantize functions.


Source

GitHub Advisory · vllm

