Model notes

Qwen 3.5 2B

A small hybrid Qwen3.5 release for developers who want long context and a natively multimodal training heritage without a large single-card memory footprint.

2B dense • 262,144 context • 2 KV heads

Architecture

Model spec

Architecture: Hybrid multimodal transformer
Total params: 2B
Active params: 2B (dense, all parameters active)
Layers: 24
Hidden size: 2,048
Attention heads: 8
KV heads: 2
KV-bearing layers: 6
Context length: 262,144 tokens
Modality: Multimodal (text-only estimate)
License: Apache 2.0
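As a quick sanity check, the spec implies an approximate BF16 weight footprint. This is a rough sketch that treats "2B" as an even 2×10⁹ parameters, which is an assumption since the card does not list an exact count:

```python
PARAMS = 2_000_000_000   # assumed exact count for "2B"; the card only says 2B
BYTES_PER_PARAM = 2      # BF16 stores each weight in 2 bytes

weights_gib = PARAMS * BYTES_PER_PARAM / 2**30
print(f"BF16 weights: ~{weights_gib:.2f} GiB")  # BF16 weights: ~3.73 GiB
```

The real resident footprint will be somewhat higher once the multimodal components, embeddings, and runtime buffers are loaded.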

Why it matters

Why memory behaves this way

Research highlight

The 2B variant keeps the same six KV-bearing layers as the 0.8B model (only 6 of its 24 layers maintain a KV cache), which materially reduces KV-cache growth compared with a full-attention stack.

Memory note

Resident weights still include the multimodal components, but the hybrid stack keeps text-generation cache growth noticeably lower than a dense full-attention design.
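The effect can be sketched with the numbers from the spec table. This is a minimal estimate, assuming a BF16 cache and a head dimension of hidden_size / attention_heads = 256, which the card does not state explicitly:

```python
# Per-token KV cache: K and V tensors, per KV-bearing layer, per KV head.
HIDDEN, HEADS, KV_HEADS = 2048, 8, 2
HEAD_DIM = HIDDEN // HEADS      # 256, assumed from hidden size / head count
DTYPE_BYTES = 2                 # BF16 cache entries
CONTEXT = 262_144               # full advertised context length

def kv_cache_gib(kv_layers: int) -> float:
    """KV-cache size in GiB at full context for a given KV-layer count."""
    per_token = 2 * kv_layers * KV_HEADS * HEAD_DIM * DTYPE_BYTES  # K + V
    return per_token * CONTEXT / 2**30

hybrid = kv_cache_gib(6)    # only 6 of the 24 layers carry a KV cache
full = kv_cache_gib(24)     # hypothetical dense full-attention stack
print(f"hybrid: {hybrid:.0f} GiB, full attention: {full:.0f} GiB")
# prints: hybrid: 3 GiB, full attention: 12 GiB
```

Under these assumptions the hybrid design cuts full-context KV cache from roughly 12 GiB to 3 GiB, a 4x reduction matching the 6-of-24 layer ratio.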

Checkpoints

Official profiles

Official BF16 checkpoint


Qwen documents Qwen3.5-2B in the Hugging Face Transformers format and provides official serving guidance for both Transformers and vLLM.

