Model notes

Llama 3.1 8B

Compact dense Llama model with grouped-query attention and a 128K context window.

8B dense • 131,072 context • 8 KV heads

Architecture

Model spec

Architecture: Dense decoder-only transformer
Total params: 8B
Active params: 8B (dense model; all parameters are active per token)
Layers: 32
Hidden size: 4,096
Attention heads: 32
KV heads: 8
KV-bearing layers: 32
Context length: 131,072
Modality: Text
License: Llama 3.1 Community License
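As a minimal sketch (assuming BF16 cache entries and the standard one key/value pair per attention layer), the spec numbers above translate into KV-cache memory like this:

```python
# KV-cache estimate for Llama 3.1 8B, using the values from the spec table.
# A rough sketch, not an exact profile: it ignores allocator overhead,
# paging granularity, and any framework-specific bookkeeping.
layers = 32
kv_heads = 8
attn_heads = 32
hidden = 4096
head_dim = hidden // attn_heads  # 128
bytes_per_el = 2                 # BF16

def kv_cache_bytes(tokens: int, heads: int) -> int:
    # 2x for keys and values; every one of the 32 layers carries KV state.
    return 2 * layers * heads * head_dim * bytes_per_el * tokens

full_ctx = 131_072
gqa = kv_cache_bytes(full_ctx, kv_heads)    # grouped-query: 8 KV heads
mha = kv_cache_bytes(full_ctx, attn_heads)  # hypothetical full multi-head

print(f"GQA KV cache @ 128K: {gqa / 2**30:.0f} GiB")   # 16 GiB
print(f"MHA would need:      {mha / 2**30:.0f} GiB")   # 64 GiB
```

With 8 KV heads instead of 32, the cache at full context is a quarter of what full multi-head attention would require.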

Why it matters

Why memory behaves this way

Research highlight

Grouped-query attention keeps KV state lighter than full multi-head attention while retaining a long native context window.

Memory note

Dense weights dominate the footprint; grouped KV heads keep cache growth in check at long context.

Checkpoints

Official profiles

Official BF16 checkpoint

Meta's official Llama 3.1 8B Instruct release is a BF16 checkpoint with grouped-query attention.

