
Model notes

Qwen 2.5 7B

Small-to-mid-sized Qwen model with long context support and efficient grouped KV heads.

7.6B dense • 131,072 context • 4 KV heads

Architecture

Model spec

Architecture: Dense decoder-only transformer
Total params: 7.6B
Active params: 7.6B (dense; all parameters active)
Layers: 28
Hidden size: 3,584
Attention heads: 28
KV heads: 4
KV-bearing layers: 28
Context length: 131,072
Modality: Text
License: Apache 2.0

Why it matters

Why memory behaves this way

Research highlight

Long-context Qwen architecture that uses grouped-query attention (4 KV heads shared across 28 query heads) to keep inference memory manageable.
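Grouped-query attention here means the 28 query heads are partitioned into 4 groups of 7, with each group reading from one shared K/V pair. A minimal sketch of that mapping, assuming the usual contiguous group assignment (the exact assignment is an implementation detail, not stated in the spec above):

```python
# Head counts from the spec above.
Q_HEADS = 28
KV_HEADS = 4
GROUP = Q_HEADS // KV_HEADS  # 7 query heads per KV head

def kv_head_for(q_head: int) -> int:
    """Index of the shared KV head a given query head attends with."""
    return q_head // GROUP

# Each of the 4 KV heads serves 7 consecutive query heads.
print([kv_head_for(h) for h in range(Q_HEADS)])
```

Only the 4 KV heads are cached during generation, which is why the KV cache shrinks by 7x relative to a full multi-head layout with 28 K/V pairs.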

Memory note

This is still a dense model, so resident weights set the floor; the compact KV layout mainly helps as context grows.
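The note above can be made concrete with a back-of-envelope calculation. A minimal sketch, assuming BF16 (2 bytes per element) for both weights and KV cache, and a head dimension of 128 (hidden size 3,584 / 28 attention heads); batch size 1, no runtime overhead:

```python
# Back-of-envelope memory estimate for Qwen 2.5 7B (dense, grouped KV heads).
# All figures come from the spec above; BF16 (2 bytes/element) is assumed.

PARAMS = 7.6e9          # total (= active) parameters
LAYERS = 28             # KV-bearing layers
KV_HEADS = 4            # grouped KV heads
HEAD_DIM = 3584 // 28   # hidden size / attention heads = 128
BYTES = 2               # BF16

def weight_gb() -> float:
    """Resident weight floor: every parameter of a dense model stays loaded."""
    return PARAMS * BYTES / 1e9

def kv_cache_gib(seq_len: int) -> float:
    """KV cache: 2 tensors (K and V) per layer, per KV head, per token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * seq_len * BYTES / 2**30

print(f"weights:           {weight_gb():.1f} GB")   # ~15.2 GB, matching the BF16 checkpoint size
print(f"KV @ 4K context:   {kv_cache_gib(4096):.2f} GiB")
print(f"KV @ 131K context: {kv_cache_gib(131072):.2f} GiB")
```

With only 4 KV heads instead of 28, the cache is 7x smaller than a full multi-head layout would need, yet at the full 131,072-token context it still adds roughly 7 GiB on top of the ~15 GB weight floor.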

Checkpoints

Official profiles

Official BF16 checkpoint


The official Qwen2.5-7B-Instruct checkpoint repository is about 15.2 GB on Hugging Face.

Runtimes: vLLM, Transformers

Official GPTQ 4-bit checkpoint


The official Qwen2.5-7B-Instruct-GPTQ-Int4 checkpoint repository is about 5.59 GB on Hugging Face.

Runtimes: vLLM, Transformers

Official AWQ 4-bit checkpoint


The official Qwen2.5-7B-Instruct-AWQ checkpoint repository is about 5.58 GB on Hugging Face.

Runtimes: vLLM, Transformers
