FitMyGPU

NVIDIA

OpenReasoning Nemotron 7B

Reasoning-tuned dense Nemotron checkpoint that keeps the memory footprint of Qwen2.5 7B while targeting stronger math and code performance.

Overview and architecture

What it is

Company: NVIDIA
Family: Nemotron
Release date: Jul 15, 2025
Architecture: Dense decoder-only transformer
License: CC-BY-4.0 + Apache 2.0
Modality: Text
Context window: 131,072 tokens
Total params: 7.6B
Active params: Dense model (all parameters active)
Layers: 28
Hidden size: 3,584
Attention heads: 28
KV heads: 4
KV-bearing layers: 28
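
The figures above are enough to derive the per-token KV cache footprint. The following is a minimal sketch, assuming the head dimension equals hidden size divided by attention heads (a common convention, not stated on this page) and a 2-byte FP16/BF16 cache. The function names are illustrative only.

```python
# Spec values taken from the overview above.
LAYERS = 28
HIDDEN_SIZE = 3584
ATTENTION_HEADS = 28
KV_HEADS = 4


def head_dim(hidden_size: int = HIDDEN_SIZE, heads: int = ATTENTION_HEADS) -> int:
    # Assumption: head_dim = hidden_size / attention_heads.
    return hidden_size // heads


def kv_bytes_per_token(bytes_per_value: int = 2) -> int:
    # Keys and values (factor 2) for every KV head in every KV-bearing layer.
    # Assumption: 2 bytes per value, i.e. an FP16 or BF16 cache.
    return 2 * LAYERS * KV_HEADS * head_dim() * bytes_per_value


print(head_dim())                    # 128
print(ATTENTION_HEADS // KV_HEADS)   # 7 query heads share each KV head
print(kv_bytes_per_token() // 1024)  # 56 (KiB per token)
```

Under these assumptions each cached token costs 56 KiB across all 28 layers.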

Research highlight

What improved

Reasoning-first post-training

NVIDIA positions the 7B Nemotron model around stronger math, code, and science reasoning rather than around a new base architecture.

Qwen2.5-derived backbone

The family keeps the Qwen2.5 dense grouped-query attention backbone, so the main change is in post-training behavior and benchmark profile, not in memory geometry.

GenSelect heavy mode

The model card introduces a heavier multi-sample inference path, GenSelect, in which several candidate solutions are generated and one is selected. This matters because capability can scale at inference time without changing the resident model itself.
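
A generate-then-select loop of this kind can be sketched as follows. This is an illustration under stated assumptions, not NVIDIA's implementation: the `generate` callable, the selection prompt wording, and the index-parsing fallback are all hypothetical, and the model card should be consulted for the actual GenSelect prompt and settings.

```python
from typing import Callable, List


def genselect(
    problem: str,
    generate: Callable[[str], str],  # hypothetical: prompt in, completion out
    num_samples: int = 4,
) -> str:
    # Step 1: sample several independent candidate solutions.
    candidates: List[str] = [generate(problem) for _ in range(num_samples)]

    # Step 2: ask the same model to pick the best candidate by index.
    # The prompt format here is an assumption made for illustration.
    listing = "\n\n".join(
        f"Solution {i}:\n{c}" for i, c in enumerate(candidates)
    )
    selection_prompt = (
        f"Problem:\n{problem}\n\n{listing}\n\n"
        "Reply with only the index of the best solution."
    )
    reply = generate(selection_prompt)

    # Step 3: parse the index, falling back to the first candidate.
    digits = "".join(ch for ch in reply if ch.isdigit())
    index = int(digits) if digits else 0
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]
```

The weights are loaded once in this loop. Extra samples cost additional compute and cache, not additional resident parameters.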

Benchmark-led release framing

NVIDIA markets the line primarily through reasoning benchmark results in its size class, so this is a capability-tuned release more than an architecture-tuned one.

Training and release context

How it was released

Base-model inheritance

OpenReasoning-Nemotron models are NVIDIA post-training releases built directly on top of Qwen2.5 dense backbones.

Release method

The family is released as a reasoning-tuned derivative line rather than as a new architecture family with different serving mechanics.

Optional heavy mode

NVIDIA pairs the base checkpoints with GenSelect-style multi-sample inference guidance, so part of the release story lives in inference strategy rather than in the resident model alone.

Where it is strong


Math and science reasoning

NVIDIA positions the family around benchmark-heavy reasoning workloads.

Code generation

The release emphasizes code and solution-generation performance alongside math.

Test-time scaling

GenSelect gives the family a clear path to higher-quality heavy inference when latency is less constrained.

Memory behavior

What dominates VRAM

Resident weights set the VRAM floor. The grouped KV layout (4 KV heads serving 28 attention heads) keeps long-context cache growth to about one seventh of what a full-head dense model of the same shape would need.
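
A rough estimate of both terms can be sketched as follows. It assumes 2-byte (FP16/BF16) weights and cache values and a head dimension of 128 (hidden size 3,584 divided by 28 heads). It ignores activations, framework overhead, and quantization, so the numbers are a lower bound rather than a sizing guarantee. The helper names are illustrative.

```python
GIB = 1024 ** 3


def weight_bytes(params: float = 7.6e9, bytes_per_param: float = 2.0) -> float:
    # Resident weights: parameter count times bytes per parameter.
    return params * bytes_per_param


def kv_cache_bytes(
    tokens: int,
    kv_heads: int = 4,
    layers: int = 28,
    head_dim: int = 128,       # assumption: 3584 / 28
    bytes_per_value: int = 2,  # assumption: FP16/BF16 cache
) -> int:
    # Keys and values (factor 2) per KV head, per layer, per token.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


print(f"weights: {weight_bytes() / GIB:.1f} GiB")
for tokens in (8_192, 32_768, 131_072):
    grouped = kv_cache_bytes(tokens) / GIB
    full_head = kv_cache_bytes(tokens, kv_heads=28) / GIB
    print(f"{tokens:>7} tokens: grouped {grouped:.2f} GiB, full-head {full_head:.2f} GiB")
```

Under these assumptions a single sequence at the full 131,072-token window needs 7 GiB of cache with 4 KV heads, against 49 GiB if all 28 heads carried their own keys and values.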

Sources

Where this page is grounded