Model notes
Qwen 3.5 2B
A small hybrid Qwen3.5 release for developers who want longer context and a natively multimodal training heritage without a large single-card footprint.
2B dense • 262,144 context • 2 KV heads
Architecture
Model spec

Architecture:       hybrid (dense)
Total params:       ~2B
Active params:      ~2B (dense, so all parameters are active)
Layers:
Hidden size:
Attention heads:
KV heads:           2
KV-bearing layers:  6
Context length:     262,144 tokens
Modality:           multimodal
License:
Why it matters
Why memory behaves this way
Research highlight
The 2B variant keeps the same 6 attention-bearing layers as the 0.8B model, which materially reduces KV-cache growth compared with a full-attention stack.
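To see why few KV-bearing layers matter, the cache size scales linearly with the number of layers that store K/V. The sketch below uses the documented figures (6 KV-bearing layers, 2 KV heads, 262,144-token context); the head dimension (128) and the 28-layer full-attention comparison point are illustrative assumptions, not documented values.

```python
def kv_cache_bytes(kv_layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes held by the K and V caches across all KV-bearing layers.

    The leading 2 covers K plus V; bytes_per_elem=2 assumes BF16 storage.
    """
    return 2 * kv_layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Documented: 6 KV-bearing layers, 2 KV heads, 262,144-token context.
# Assumed for illustration: head_dim=128.
hybrid = kv_cache_bytes(6, 2, 128, 262_144)

# Hypothetical full-attention stack of the same width, 28 layers (assumption).
full = kv_cache_bytes(28, 2, 128, 262_144)

print(f"hybrid: {hybrid / 2**30:.2f} GiB, full-attention: {full / 2**30:.2f} GiB")
# -> hybrid: 1.50 GiB, full-attention: 7.00 GiB
```

Under these assumptions, the hybrid stack's cache at maximum context is roughly 28/6 ≈ 4.7x smaller than a full-attention design of the same width.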
Memory note
Resident weights still include the multimodal components, but the hybrid stack keeps text-generation KV-cache growth noticeably lower than a dense full-attention design.
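A rough split of resident memory into fixed weights versus context-dependent cache can be sketched as follows. The ~2B parameter count and BF16 storage come from this page; the head dimension (128) is an illustrative assumption.

```python
def resident_split(params: int, kv_layers: int, kv_heads: int,
                   head_dim: int, seq_len: int,
                   bytes_per_elem: int = 2) -> tuple[int, int]:
    """Return (weight_bytes, kv_cache_bytes) for a BF16 model.

    Weights are fixed; the KV cache grows linearly with sequence length.
    """
    weight_bytes = params * bytes_per_elem
    kv_bytes = 2 * kv_layers * kv_heads * head_dim * seq_len * bytes_per_elem
    return weight_bytes, kv_bytes

# ~2B params (nominal), 6 KV-bearing layers, 2 KV heads, full 262,144 context.
# head_dim=128 is an assumption for illustration.
weights, cache = resident_split(2_000_000_000, 6, 2, 128, 262_144)
print(f"weights ~{weights / 2**30:.2f} GiB, KV cache ~{cache / 2**30:.2f} GiB")
```

Even at the full 262k context, the cache stays in the same ballpark as the weights under these assumptions, rather than dwarfing them as it would with a full-attention stack.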
Checkpoints
Official profiles
Official BF16 checkpoint
Qwen documents Qwen3.5-2B in Hugging Face Transformers format, with official serving guidance for both Transformers and vLLM.