Model notes
OpenReasoning Nemotron 32B
Largest Nemotron checkpoint in this batch, intended as a serious reasoning model that still follows a plain dense Qwen2.5-style memory profile.
32.5B dense • 131,072 context • 8 KV heads
Architecture
Model spec
Architecture: dense transformer (Qwen2.5-style, grouped-query attention)
Total params: 32.5B
Active params: 32.5B (dense, so every parameter is active per token)
Layers:
Hidden size:
Attention heads:
KV heads: 8
KV-bearing layers:
Context length: 131,072 tokens
Modality:
License:
Why it matters
Why memory behaves this way
Research highlight
The 32B reasoning model keeps the dense grouped-query Qwen2.5 32B backbone, which makes the VRAM story much easier to reason about than a sparse frontier model.
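The grouped-query layout above can be sketched in a few lines of NumPy: each cached KV head is shared by a group of query heads, which is why the cache is sized by the 8 KV heads rather than the full query-head count. The 8 KV heads come from the notes; q_heads=40 and head_dim=128 are assumed Qwen2.5-style geometry, not stated in this document.

```python
import numpy as np

# Minimal grouped-query attention sketch: one new query token attending
# over a cached sequence. kv_heads=8 is from the notes above; q_heads=40
# and head_dim=128 are assumed Qwen2.5-style geometry (an assumption).
q_heads, kv_heads, head_dim, seq = 40, 8, 128, 16
group = q_heads // kv_heads  # 5 query heads share each KV head

q = np.random.randn(q_heads, 1, head_dim)     # one new query token
k = np.random.randn(kv_heads, seq, head_dim)  # cached keys (only 8 heads)
v = np.random.randn(kv_heads, seq, head_dim)  # cached values

# Broadcast each KV head across its group of query heads.
k_exp = np.repeat(k, group, axis=0)           # (40, seq, head_dim)
v_exp = np.repeat(v, group, axis=0)

scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
probs = np.exp(scores - scores.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)         # softmax over cached positions
out = probs @ v_exp
print(out.shape)  # (40, 1, 128): full query-head output from an 8-head cache
```

The cache tensors are allocated per KV head, so the resident cache is q_heads/kv_heads = 5x smaller than a multi-head-attention cache of the same depth would be.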
Memory note
Dense resident weights dominate immediately, so single-GPU deployment quickly becomes a quantization-and-runtime-budget problem rather than a cache problem.
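A back-of-envelope calculation makes the point concrete. Total params (32.5B), context (131,072), and KV heads (8) come from the notes above; layers=64 and head_dim=128 are assumed Qwen2.5-32B geometry, not stated in this document.

```python
# Rough VRAM budget for a dense 32.5B GQA model.
# 32.5B params, 131,072 context, 8 KV heads: from the notes above.
# layers=64, head_dim=128: assumed Qwen2.5-32B geometry (an assumption).

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2 for the separate K and V tensors in every layer.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem

GIB = 1024 ** 3
weights_gib = 32.5e9 * 2 / GIB                          # BF16 weights
cache_gib = kv_cache_bytes(131_072, 64, 8, 128) / GIB   # full-context BF16 cache

print(f"BF16 weights:    {weights_gib:.1f} GiB")  # ~60.5 GiB
print(f"KV cache @ 131k: {cache_gib:.1f} GiB")    # 32.0 GiB
```

Even with the full 131k context cached, the weights are roughly twice the size of the cache, which is why quantizing the weights (not the cache) is the first lever for single-GPU deployment.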
Checkpoints
Official profiles
Official BF16 checkpoint
NVIDIA publishes OpenReasoning-Nemotron-32B as a dense Qwen2.5-32B derivative in Hugging Face Transformers format, and the v1 profile models it with the same grouped-query cache geometry.
Sources