FitMyGPU
NVIDIA

OpenReasoning Nemotron 14B

Mid-sized dense Nemotron checkpoint for users who want stronger reasoning behavior than 7B without stepping straight into 32B deployment territory.

Overview and architecture

What it is

Company

NVIDIA

Family

Nemotron

Release date

Jul 15, 2025

Architecture

Dense decoder-only transformer

License

CC-BY-4.0 + Apache 2.0

Modality

Text

Context window

131,072 tokens

Total params

14.7B

Active params

14.7B (dense model, all parameters active)

Layers

48

Hidden size

5,120

Attention heads

40

KV heads

8

KV-bearing layers

48
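
The figures above are enough to estimate the per-token KV-cache footprint. The following is a minimal sketch, not a measured result. It assumes the head dimension equals hidden size divided by attention heads (this page does not state it) and a 16-bit KV cache. The function name `kv_bytes_per_token` is illustrative.

```python
def kv_bytes_per_token(layers, hidden_size, attn_heads, kv_heads, bytes_per_value=2):
    """Estimate KV-cache bytes stored per token for a grouped-query model."""
    # Assumption: head_dim = hidden_size / attn_heads (not stated on this page).
    head_dim = hidden_size // attn_heads
    # One key and one value vector per KV head, per KV-bearing layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


per_token = kv_bytes_per_token(layers=48, hidden_size=5120, attn_heads=40, kv_heads=8)
print(per_token)                    # 196608 bytes, i.e. 192 KiB per token
print(per_token * 131_072 / 2**30)  # 24.0 GiB at the full context window
```

With 8 KV heads against 40 attention heads, the cache is one fifth the size it would be if every attention head kept its own keys and values.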

Research highlight

What improved

Reasoning-first post-training

NVIDIA positions the 14B Nemotron model around stronger math, code, and science reasoning rather than around a new base architecture.

Qwen2.5-derived backbone

The family stays close to a Qwen2.5 dense grouped-query backbone, so the main change is in post-training behavior and benchmark profile, not in memory geometry.

GenSelect heavy mode

The model card explicitly introduces a heavier multi-sample inference path through GenSelect, which matters because capability can scale at inference time without changing the resident model itself.

Benchmark-led release framing

NVIDIA markets the line primarily through reasoning benchmark results in its size class, so this is a capability-tuned release more than an architecture-tuned one.

Training and release context

How it was released

Base-model inheritance

OpenReasoning-Nemotron models are NVIDIA post-training releases built directly on top of Qwen2.5 dense backbones.

Release method

The family is released as a reasoning-tuned derivative line rather than as a new architecture family with different serving mechanics.

Optional heavy mode

NVIDIA pairs the base checkpoints with GenSelect-style multi-sample inference guidance, so part of the release story lives in inference strategy rather than in the resident model alone.

Where it is strong

Math and science reasoning

NVIDIA positions the family around math and science reasoning benchmarks.

Code generation

The release emphasizes code and solution-generation performance alongside math.

Test-time scaling

GenSelect gives the family a clear path to higher-quality heavy inference when latency is less constrained.
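
The multi-sample idea behind a heavy mode can be sketched as "generate several candidates, then ask the model to pick one". This is a hedged illustration of that general pattern, not NVIDIA's GenSelect implementation. The `generate` callable, the prompt wording, and the index-parsing rule are all hypothetical stand-ins.

```python
from typing import Callable


def multi_sample_select(prompt: str, generate: Callable[[str], str], n_samples: int = 4) -> str:
    """Draw n_samples candidates, then use one more call to choose among them."""
    candidates = [generate(prompt) for _ in range(n_samples)]
    numbered = "\n\n".join(f"Solution {i}:\n{c}" for i, c in enumerate(candidates))
    # Hypothetical selection prompt; the real prompt format is not given on this page.
    selection_prompt = (
        f"Problem:\n{prompt}\n\n{numbered}\n\nReply with the index of the best solution."
    )
    reply = generate(selection_prompt)
    digits = [int(tok) for tok in reply.split() if tok.isdigit()]
    # Fall back to the first candidate if the reply has no valid index.
    index = digits[0] if digits and 0 <= digits[0] < n_samples else 0
    return candidates[index]


# Stub generator standing in for a real inference call.
calls = []

def fake_generate(text: str) -> str:
    calls.append(text)
    if "Reply with the index" in text:
        return "2"
    return f"answer-{len(calls)}"

best = multi_sample_select("What is 2 + 2?", fake_generate, n_samples=4)
print(best)  # answer-3
```

The weights stay resident and unchanged. The extra cost is `n_samples + 1` generation calls, plus KV cache for any candidates decoded concurrently.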

Memory behavior

What dominates VRAM

This is still a dense 14B-class checkpoint: weights dominate the fit decision, and context length becomes the next major lever after quantization.
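
A rough way to see the two levers side by side is the sketch below. It assumes nominal bits per parameter (real quantization formats add some overhead), a 16-bit KV cache of 196,608 bytes per token derived from the architecture figures on this page with an assumed head dimension of 128, and it ignores activations and runtime overhead. The function names are illustrative.

```python
PARAMS = 14.7e9  # total parameters listed on this page


def weight_gib(bits_per_param: float) -> float:
    """Nominal weight footprint in GiB at a given precision."""
    return PARAMS * bits_per_param / 8 / 2**30


def kv_gib(context_tokens: int, kv_bytes_per_token: int = 196_608) -> float:
    """KV-cache footprint in GiB for one sequence (assumed 16-bit cache)."""
    return context_tokens * kv_bytes_per_token / 2**30


for bits in (16, 8, 4):
    for ctx in (8_192, 131_072):
        total = weight_gib(bits) + kv_gib(ctx)
        print(f"{bits:>2}-bit weights, {ctx:>7} tokens: "
              f"{weight_gib(bits):5.1f} + {kv_gib(ctx):4.1f} = {total:5.1f} GiB")
```

Under these assumptions the KV cache is about 1.5 GiB at 8,192 tokens but 24 GiB at the full window, which is more than the 4-bit weights themselves. This is why context length becomes the next lever once the weights are quantized.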