
Model notes

Qwen 3.5 9B

The largest practical Qwen3.5 release in this batch: a 9B language model paired with a resident multimodal stack, while still targeting single-GPU text serving.

10B dense • 262,144 context • 4 KV heads

Architecture

Model spec

Architecture: Hybrid multimodal transformer
Total params: 10B
Active params: Dense model (all parameters active)
Layers: 32
Hidden size: 4,096
Attention heads: 16
KV heads: 4
KV-bearing layers: 8
Context length: 262,144
Modality: Multimodal, text-only estimate
License: Apache 2.0

Why it matters

Why memory behaves this way

Research highlight

The hybrid layout keeps only 8 of 32 layers in the gated-attention path, which materially changes KV-cache behavior versus a dense long-context model.
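A rough sizing sketch of that difference, assuming a BF16 (2 bytes/entry) KV cache and head_dim = hidden_size / attention_heads = 256; both assumptions are mine, not stated on this page:

```python
# Rough KV-cache sizing for the hybrid layout described above (assumed BF16 cache).
HIDDEN = 4096
HEADS = 16
KV_HEADS = 4
KV_LAYERS = 8        # only 8 of 32 layers carry a KV cache in the hybrid layout
DENSE_LAYERS = 32    # hypothetical fully dense comparison point
CONTEXT = 262_144
DTYPE_BYTES = 2      # BF16

head_dim = HIDDEN // HEADS  # assumed: 4096 / 16 = 256

def kv_cache_gib(layers: int, tokens: int = CONTEXT, batch: int = 1) -> float:
    """K and V tensors: 2 * layers * kv_heads * head_dim * tokens * batch * bytes."""
    return 2 * layers * KV_HEADS * head_dim * tokens * batch * DTYPE_BYTES / 2**30

print(f"hybrid (8 KV layers):  {kv_cache_gib(KV_LAYERS):.1f} GiB")    # 8.0 GiB
print(f"dense (32 KV layers): {kv_cache_gib(DENSE_LAYERS):.1f} GiB")  # 32.0 GiB
```

Under these assumptions, filling the full 262,144-token context costs a quarter of what the same context would cost on an otherwise identical dense model.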

Memory note

This estimate intentionally keeps the full multimodal checkpoint resident even for text-only use, so it is conservative relative to runtime-specific language-only shortcuts.
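A sketch of what that conservatism costs in resident weight memory, assuming BF16 weights (2 bytes/param); the ~9B language-only figure is my reading of the page's "9B language model" description, not a documented split:

```python
# Resident weight memory at BF16 (2 bytes/param); param counts from this page.
BYTES_PER_PARAM = 2
GIB = 2**30

total_params = 10e9  # full multimodal checkpoint (what this estimate keeps resident)
text_params = 9e9    # language model alone, per the page's description (assumed)

print(f"full checkpoint: {total_params * BYTES_PER_PARAM / GIB:.1f} GiB")  # ~18.6 GiB
print(f"text-only:       {text_params * BYTES_PER_PARAM / GIB:.1f} GiB")   # ~16.8 GiB
```

The gap (~1.9 GiB here) is the headroom a runtime-specific language-only shortcut could recover; this estimate deliberately does not claim it.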

Checkpoints

Official profiles

Official BF16 checkpoint (current)

Qwen documents Qwen3.5-9B for Transformers and vLLM.

Runtimes: vLLM, Transformers
