Model notes

Qwen 2.5 14B

Mid-sized Qwen model with strong long-context behavior and a practical fit for 24 to 80 GB cards.

14.7B dense • 131,072 context • 8 KV heads

Architecture

Model spec

Architecture: Dense decoder-only transformer
Total params: 14.7B
Active params: 14.7B (dense; every parameter is active per token)
Layers: 48
Hidden size: 5,120
Attention heads: 40
KV heads: 8 (grouped-query attention)
KV-bearing layers: 48
Context length: 131,072 tokens
Modality: Text
License: Apache 2.0

Why it matters

Why memory behaves this way

Research highlight

A scaled-up Qwen long-context stack: grouped-query attention (8 KV heads serving 40 query heads) bounds cache growth while the dense 14B backbone keeps general capability strong.

Memory note

The jump from 7B to 14B is mostly resident weight memory; the KV cache stays comparatively small because grouped-query attention stores K/V state for only 8 of the 40 attention heads per layer.
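
To make this concrete, here is a back-of-envelope sketch in Python. The constants come from the spec table above; it assumes 16-bit weights and KV entries and ignores activation buffers, CUDA context, and allocator overhead, which real runtimes add on top.

```python
# Rough memory math for Qwen 2.5 14B, using values from the spec table.
LAYERS = 48       # KV-bearing layers
Q_HEADS = 40      # attention (query) heads
KV_HEADS = 8      # grouped KV heads
HIDDEN = 5120
HEAD_DIM = HIDDEN // Q_HEADS   # 128
BYTES_16BIT = 2                # BF16/FP16

# Resident weights: 14.7B params at 2 bytes each.
weights_gb = 14.7e9 * BYTES_16BIT / 1e9        # ~29.4 GB (matches the ~29.6 GB repo)

# KV cache per token: K and V vectors for every layer and KV head.
kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_16BIT  # 196,608 B = 192 KiB

# Full 131,072-token context, with and without grouped KV heads.
kv_full_gib = kv_per_token * 131_072 / 2**30           # ~24 GiB
kv_full_mha_gib = kv_full_gib * Q_HEADS / KV_HEADS     # ~120 GiB if all 40 heads kept KV

print(f"weights ~= {weights_gb:.1f} GB, KV/token ~= {kv_per_token / 1024:.0f} KiB")
print(f"KV @ 131k ~= {kv_full_gib:.0f} GiB (GQA) vs ~= {kv_full_mha_gib:.0f} GiB (MHA)")
```

At 16-bit precision, roughly 29.4 GB of weights plus up to ~24 GiB of full-context cache is why long-context use points at 80 GB-class cards, while shorter contexts or the 4-bit checkpoints below bring the model onto 24 GB cards.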

Checkpoints

Official profiles

Official BF16 checkpoint

The official Qwen2.5-14B-Instruct checkpoint repository is about 29.6 GB on Hugging Face.

vLLM · Transformers
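
A minimal loading sketch with Transformers, for orientation only; the `Qwen/Qwen2.5-14B-Instruct` repo ID and the generation settings are the standard Hugging Face pattern, not something this page prescribes.

```python
# Hedged sketch: load the BF16 checkpoint with Transformers.
# Assumes a GPU with ~30+ GB free for the weights alone.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Qwen/Qwen2.5-14B-Instruct"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is grouped-query attention?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```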

Official GPTQ 4-bit checkpoint

The official Qwen2.5-14B-Instruct-GPTQ-Int4 checkpoint repository is about 10 GB on Hugging Face.

vLLM · Transformers

Official AWQ 4-bit checkpoint

The official Qwen2.5-14B-Instruct-AWQ checkpoint repository is about 10 GB on Hugging Face.

vLLM · Transformers
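
For the quantized profiles, a vLLM sketch along the same lines; the `max_model_len` cap is an assumption to keep KV-cache allocation within a 24 GB card, and vLLM detects the AWQ quantization from the repo config.

```python
# Hedged sketch: serve the 4-bit AWQ checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct-AWQ",  # ~10 GB of 4-bit weights
    max_model_len=32768,  # assumption: cap context to bound the KV cache
)
params = SamplingParams(temperature=0.7, max_tokens=256)
# Raw-prompt call for brevity; production use would apply the chat template.
out = llm.generate(["Explain grouped-query attention in two sentences."], params)
print(out[0].outputs[0].text)
```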
