Model notes

OpenReasoning Nemotron 32B

Largest Nemotron checkpoint in this batch, intended as a serious reasoning model that still follows a plain dense Qwen2.5-style memory profile.

32.5B dense • 131,072 context • 8 KV heads

Architecture

Model spec

Architecture: Dense decoder-only transformer
Total params: 32.5B
Active params: 32.5B (dense; every parameter is active for every token)
Layers: 64
Hidden size: 5,120
Attention heads: 40
KV heads: 8
KV-bearing layers: 64
Context length: 131,072
Modality: Text
License: CC-BY-4.0 + Apache 2.0
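The spec above pins down the KV-cache geometry completely, so the cache budget can be computed directly: 64 KV-bearing layers, 8 KV heads, and head_dim = 5,120 / 40 = 128. A minimal sketch, assuming a BF16 cache (2 bytes per element); the serving dtype is an assumption, not part of the spec:

```python
# KV-cache sizing from the spec table: 64 KV-bearing layers, 8 KV heads,
# head_dim = hidden_size / attention_heads = 5120 / 40 = 128.
# dtype_bytes=2 assumes a BF16/FP16 cache (a common serving choice).

def kv_cache_bytes(tokens: int,
                   layers: int = 64,
                   kv_heads: int = 8,
                   head_dim: int = 5120 // 40,  # 128
                   dtype_bytes: int = 2) -> int:
    # Factor of 2: both K and V are cached per layer, per KV head.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

per_token = kv_cache_bytes(1)          # 262,144 bytes = 256 KiB per token
full_ctx = kv_cache_bytes(131_072)     # 32 GiB at the full 131,072 context
print(per_token, full_ctx / 2**30)
```

At 256 KiB per token, the cache only reaches 32 GiB if a request actually fills the full 131,072-token window; typical reasoning traces sit well below that.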

Why it matters

Why memory behaves this way

Research highlight

The 32B reasoning model keeps the dense grouped-query Qwen2.5 32B backbone, which makes the VRAM story much easier to reason about than a sparse frontier model.

Memory note

Dense resident weights dominate immediately, so single-GPU deployment quickly becomes a quantization-and-runtime-budget problem rather than a cache problem.
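To make "weights dominate" concrete, here is a rough resident-weight budget at a few common quantization widths. The 32.5B parameter count is from the spec; the bytes-per-parameter figures are generic serving choices, not official recommendations:

```python
# Resident-weight budget for a 32.5B dense model at common precisions.
# Bytes-per-parameter values are illustrative serving choices (assumption).

PARAMS = 32.5e9

def weight_gib(bytes_per_param: float, params: float = PARAMS) -> float:
    # Convert raw bytes to GiB (2**30 bytes).
    return params * bytes_per_param / 2**30

for name, bpp in [("BF16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{weight_gib(bpp):.1f} GiB resident weights")
```

BF16 weights alone (~60 GiB) exceed any single consumer GPU, while INT4 (~15 GiB) leaves headroom for the KV cache and runtime overhead on a 24 GB card, which is what makes this a quantization-and-runtime-budget problem.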

Checkpoints

Official profiles

Official BF16 checkpoint


NVIDIA publishes OpenReasoning-Nemotron-32B as a dense Qwen2.5-32B derivative in Hugging Face Transformers format, and the v1 profile models it with the same grouped-query KV-cache geometry.

Runtimes: vLLM • Transformers • Open checkpoint
