
Model notes

Mistral Nemo 12B

Long-context dense Mistral checkpoint that remains practical on a single 24 GB card with quantization.

12.2B dense • 128,000 context • 8 KV heads

Architecture

Model spec

Architecture: Dense decoder-only transformer
Total params: 12.2B
Active params: 12.2B (dense model; all parameters active)
Layers: 40
Hidden size: 5,120
Attention heads: 32
KV heads: 8
KV-bearing layers: 40
Context length: 128,000
Modality: Text
License: Apache 2.0
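The spec above is enough to sketch the KV-cache footprint at full context. A minimal estimate, assuming head_dim = hidden_size / attention_heads = 160 (an assumption — some checkpoints decouple head_dim from the hidden size, which lowers this number):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem=2):
    # 2x for the K and V tensors, stored per KV-bearing layer.
    # bytes_per_elem=2 corresponds to an FP16/BF16 cache.
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

head_dim = 5_120 // 32  # 160, under the head_dim assumption above
size = kv_cache_bytes(layers=40, kv_heads=8, head_dim=head_dim, context=128_000)
print(f"{size / 2**30:.1f} GiB")  # ~24.4 GiB at FP16, full 128k context
```

Because only 8 of the 32 heads carry KV state (grouped-query attention), this is a quarter of what a full multi-head cache would need.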

Why it matters

Why memory behaves this way

Research highlight

Long-context dense Mistral design tuned for efficient single-node inference.

Memory note

Dense weights set the baseline footprint; at long context, the KV cache becomes the next thing to watch after weight quantization.
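The note above can be made concrete: a dense model's weight footprint is just parameter count times bytes per parameter. A quick sketch at common precisions (real checkpoints add small overheads for embeddings, metadata, and runtime buffers):

```python
# Weight footprint for the 12.2B dense parameter count from the spec above.
PARAMS = 12.2e9

for name, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: {gb:.1f} GB")
# BF16: 24.4 GB — matches the ~24.5 GB official checkpoint
# FP8:  12.2 GB
# INT4:  6.1 GB — why the model fits a 24 GB card with room for KV cache
```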

Checkpoints

Official profiles

Official BF16 checkpoint

Mistral's official consolidated BF16 weights for Mistral Nemo are about 24.5 GB.

Runs with vLLM and Transformers.

Official FP8 checkpoint

Mistral's official FP8 checkpoint repository for Mistral Nemo is about 13.6 GB on Hugging Face.

Runs with vLLM and Transformers.
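Putting weights and cache together: a rough headroom check for a 24 GB card with the ~13.6 GB FP8 weights resident, assuming an FP16 KV cache and head_dim = hidden_size / attention_heads = 160 (an assumption, as above), and ignoring activation and runtime overhead:

```python
CARD_GB = 24.0
WEIGHTS_GB = 13.6  # FP8 checkpoint size from above

# Per-token KV cost: K+V (2) x layers x kv_heads x head_dim x 2 bytes (FP16)
bytes_per_token = 2 * 40 * 8 * 160 * 2

budget = (CARD_GB - WEIGHTS_GB) * 1e9
max_tokens = int(budget / bytes_per_token)
print(f"~{max_tokens:,} tokens of KV-cache headroom")
```

So full 128k context does not fit alongside the weights at FP16 cache precision; serving stacks get there by quantizing the cache or capping context.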
