
Model notes

Gemma 2 9B

Instruction-tuned Gemma checkpoint with a relatively short native context window and efficient KV usage.

9.2B dense • 8,192 context • 8 KV heads

Architecture

Model spec

Architecture: Dense decoder-only transformer
Total params: 9.2B
Active params: 9.2B (dense — all parameters are active for every token)
Layers: 42
Hidden size: 3,584
Attention heads: 16
KV heads: 8 (grouped-query attention)
KV-bearing layers: 42
Context length: 8,192 tokens
Modality: Text
License: Gemma terms
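The spec above translates directly into a weight-memory footprint. A minimal sketch, using the rounded 9.2B published parameter count (the exact figure differs slightly) and common precisions:

```python
# Rough weight-memory estimate for Gemma 2 9B from the spec above.
# 9.2e9 is the rounded published parameter count, not the exact figure.
PARAMS = 9.2e9
GIB = 1024 ** 3

def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Bytes needed to hold the dense weight tensor at a given precision."""
    return params * bytes_per_param

for name, bpp in [("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: {weight_bytes(PARAMS, bpp) / GIB:.1f} GiB")
```

At the shipped bf16 precision this is roughly 17 GiB for weights alone, before any KV cache or activation memory.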

Why it matters

Why memory behaves this way

Research highlight

Gemma 2 focuses on efficient dense inference rather than extreme context length.

Memory note

The short 8,192-token native context window keeps the KV cache moderate, so the dominant memory cost remains the dense weight tensor itself.

Checkpoints

Official profiles

BF16 checkpoint

Google's official Gemma 2 9B Instruct release is exported in bfloat16. Supported runtimes: vLLM, Transformers.
