Gemma
Gemma 2 9B
Instruction-tuned Gemma checkpoint with a relatively short native context window and efficient KV usage.
Overview and architecture
What it is
Company
Family
Release date
Architecture
License
Modality
Context window
Total params
Active params
Layers
Hidden size
Attention heads
KV heads
KV-bearing layers
Research highlight
What improved
Compact open model line
Google positions Gemma 2 9B as part of a lightweight open family derived from Gemini-era research, aimed at getting strong dense-model quality from smaller deployment footprints.
Efficiency over frontier scale
The release emphasis is efficient open deployment and good quality-per-parameter, not sparse routing, multimodal fusion, or ultra-long-context serving.
Instruction-tuned product focus
The instruction variants are framed as practical developer models, so the story is real deployment usability rather than experimental architecture novelty.
Training and release context
How it was released
Open-model tier
Gemma 2 sits in Google's smaller open model line rather than in the flagship Gemini product tier.
Architecture continuity
The family stays within a straightforward dense-transformer deployment pattern and does not depend on sparse or hybrid serving mechanics.
Release packaging
The instruction-tuned variants are packaged as practical deployment checkpoints rather than as research-preview artifacts.
Where it is strong
Where it is strong
Efficiency
Strong quality-per-parameter for teams that want a smaller dense model footprint.
General language tasks
Useful as a compact open baseline for assistant-style and retrieval-augmented applications.
Operational simplicity
Straightforward dense checkpoints make the family easier to reason about than more exotic architectures.
Memory behavior
What dominates VRAM
The shorter native context window keeps KV cache moderate, so the main memory driver is still the dense weight tensor.
Sources