Company
Qwen
Qwen dense, hybrid, and multimodal families with frequent open checkpoint releases.
Series
Qwen 3.6
Qwen 3.6 35B A3B
Qwen3.6 MoE release tuned for real-world coding agents, pairing a 35B multimodal checkpoint with a 3B active parameter path for lower per-token compute (see the compute sketch after this series).
35B total • 3B active • 262,144 context • 2 KV heads
Qwen 3.6 27B
Qwen3.6 dense hybrid release focused on coding-agent stability, repository reasoning, and preserved thinking context across longer development sessions.
27B dense • 262,144 context • 4 KV heads
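The A3B suffix above encodes the total-versus-active split: the checkpoint holds 35B weights, but only about 3B participate in any one token. A back-of-the-envelope sketch of what that means for decode compute, using the common ~2 × params FLOPs-per-token approximation; the rule of thumb is an assumption, and only the parameter counts come from the entries above:

```python
# Rough per-token decode compute for the Qwen3.6 entries above, using the
# common ~2 * params FLOPs-per-token approximation. Only the parameter
# counts come from the registry entries; the rule itself is a heuristic.
entries = {
    "Qwen3.6-35B-A3B (MoE)": 3e9,   # active parameters per token
    "Qwen3.6-27B (dense)":  27e9,   # all parameters active every token
}
for name, active in entries.items():
    print(f"{name}: ~{2 * active / 1e9:.0f} GFLOPs/token")
# The MoE checkpoint stores 35B weights but spends roughly 6 GFLOPs/token,
# about a ninth of the dense 27B's ~54 GFLOPs/token.
```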
Series
Qwen 3.5
Qwen 3.5 397B A17B
Largest Qwen3.5 release in the registry, combining the hybrid multimodal stack with a very large MoE parameter pool and a much smaller active token path.
397B total • 17B active • 262,144 context • 2 KV heads
Qwen 3.5 122B A10B
High-capacity Qwen3.5 MoE release for users who want the family’s hybrid multimodal architecture at a much larger scale without paying dense 122B compute per token.
122B total • 10B active • 262,144 context • 2 KV heads
Qwen 3.5 35B A3B
Hybrid multimodal Qwen3.5 MoE checkpoint with a 35B total parameter pool and a much smaller active path for lower compute than a dense model of similar capacity.
35B total • 3B active • 262,144 context • 2 KV heads
Qwen 3.5 27B
Large dense Qwen3.5 release that keeps the hybrid multimodal stack but pushes into a much heavier single-model serving class than the 9B tier.
27B dense • 262,144 context • 4 KV heads
Qwen 3.5 9B
Largest Qwen3.5 release in this batch that still targets practical single-GPU text serving, pairing a 9B language model with a resident multimodal stack.
10B dense • 262,144 context • 4 KV heads
Qwen 3.5 4B
Mid-sized Qwen3.5 checkpoint with a larger resident multimodal footprint but still practical for careful single-GPU text-only serving.
5B dense • 262,144 context • 4 KV heads
Qwen 3.5 2B
Small hybrid Qwen3.5 release for developers who want longer context and native multimodal training heritage without a large single-card footprint (see the KV-cache sketch after this series).
2B dense • 262,144 context • 2 KV heads
Qwen 3.5 0.8B
Compact Qwen3.5 checkpoint with a hybrid text-plus-vision stack and a small resident footprint for text-only local experimentation.
900M dense • 262,144 context • 2 KV heads
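The KV-head counts in the spec lines matter most at the full 262,144-token window, since KV-cache size scales linearly with them. A minimal sketch of that arithmetic: the KV-head counts, context length, and bf16 width come from the entries above, while the 48-layer, head-dim-128 shape is purely illustrative:

```python
# Rough per-sequence KV-cache size: 2 (K and V) * layers * kv_heads
# * head_dim * tokens * bytes. Layer count and head_dim are illustrative
# assumptions; KV-head counts and the 262,144-token window are from above.
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 tokens: int, bytes_per_elem: int = 2) -> float:  # bf16
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 2**30

ctx = 262_144
print(f"2 KV heads: ~{kv_cache_gib(48, 2, 128, ctx):.0f} GiB")  # ~12 GiB
print(f"4 KV heads: ~{kv_cache_gib(48, 4, 128, ctx):.0f} GiB")  # ~24 GiB
```

Under these assumed shapes, halving the KV-head count halves the cache per sequence, which is why the long-context entries lean on 2 or 4 KV heads rather than 8.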
Series
Qwen 3
Qwen 3 235B A22B
Largest Qwen3 MoE release with 235B total parameters and 22B activated parameters, aimed at frontier-scale open reasoning and agent use.
235B total • 22B active • 131,072 context • 4 KV heads
Qwen 3 32B
Largest dense Qwen3 release for high-capacity reasoning, agent, and multilingual assistant workloads with switchable thinking modes.
32.8B dense • 131,072 context • 8 KV heads
Qwen 3 30B A3B
Qwen3 MoE release with 30.5B total parameters and 3.3B active parameters, built for lower active compute than a comparable dense model.
30.5B total • 3.3B active • 131,072 context • 4 KV heads
Qwen 3 30B A3B Instruct 2507
Non-thinking Qwen3 MoE update with stronger general capabilities, better alignment, and native 256K context support.
30.5B total • 3.3B active • 262,144 context • 4 KV heads
Qwen 3 14B
Dense Qwen3 release for higher-capacity reasoning, agent, and multilingual assistant workloads with switchable thinking modes.
14.8B dense • 131,072 context • 8 KV heads
Qwen 3 8B
Dense Qwen3 release for stronger general-purpose reasoning, agent, and multilingual assistant use with switchable thinking modes.
8.2B dense • 131,072 context • 8 KV heads
Qwen 3 4B
Dense Qwen3 release with switchable thinking modes, stronger reasoning, and 131K extended-context support through YaRN (setup sketched after this series).
4B dense • 131,072 context • 8 KV heads
Qwen 3 4B Thinking 2507
Qwen3 update focused on deeper reasoning and longer native context, tuned specifically for more complex thinking-heavy workloads.
4B dense • 262,144 context • 8 KV heads
Qwen 3 1.7B
Small dense Qwen3 release for lightweight reasoning, agent, and multilingual assistant use with switchable thinking modes.
1.7B dense • 32,768 context • 8 KV heads
Qwen 3 0.6B
Smallest dense Qwen3 release with switchable thinking and non-thinking modes in a very light deployment footprint.
600M dense • 32,768 context • 8 KV heads
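The 4B entry above reaches 131K context through YaRN rather than natively. Qwen3 model cards document this as a rope_scaling block with a 4x factor over the 32,768-token native window; a minimal transformers sketch along those lines, using the published Qwen/Qwen3-4B checkpoint:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# YaRN extension as described on the Qwen3 model cards: a 4x scaling factor
# over the 32,768-token native window yields the 131,072-token context.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3-4B")
cfg.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", config=cfg)
```

Qwen's cards advise enabling YaRN only when prompts actually exceed the native window, since the static scaling can slightly degrade quality on shorter inputs.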
Series
Qwen 2.5
Qwen 2.5 72B
Instruction-tuned 72B Qwen2.5 model, the highest-capacity dense release in the family, for long-context, coding, math, and structured-output workloads.
72.7B dense • 131,072 context • 8 KV heads
Qwen 2.5 32B
Instruction-tuned 32B Qwen2.5 model for higher-capacity long-context, coding, math, and structured-output workloads in a large dense deployment shape.
32.5B dense • 131,072 context • 8 KV heads
Qwen 2.5 14B
Instruction-tuned 14B Qwen2.5 model for long-context, coding, math, and structured-output workloads with a larger dense capacity than the 7B release.
14.7B dense • 131,072 context • 8 KV heads
Qwen 2.5 7B
Instruction-tuned 7B Qwen2.5 model for long-context, coding, math, and structured-output workloads in a straightforward dense deployment shape.
7.6B dense • 131,072 context • 4 KV heads
Qwen 2.5 3B
Instruction-tuned 3B Qwen2.5 model for stronger small-model coding, math, structured-output, and assistant use in a compact dense footprint.
3.1B dense • 32,768 context • 2 KV heads
Qwen 2.5 1.5B
Instruction-tuned 1.5B Qwen2.5 model for lightweight coding, math, structured-output, and assistant tasks in a small dense deployment footprint.
1.5B dense • 32,768 context • 2 KV heads
Qwen 2.5 0.5B
Instruction-tuned 0.5B Qwen2.5 model for lightweight assistant, structured-output, and long-prompt use in very small dense deployments.
490M dense • 32,768 context • 2 KV heads
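As a rough guide to the "deployment footprint" language above, weight-only memory for the dense Qwen2.5 lineup scales directly with parameter count. A sketch using the spec-line counts; the 2-bytes-per-parameter bf16 and ~0.5-bytes-per-parameter int4 widths are standard approximations, not figures from this page:

```python
# Weight-only memory for the dense Qwen2.5 checkpoints listed above,
# ignoring KV cache and runtime overhead. Byte widths are approximations.
params_b = {"72B": 72.7, "32B": 32.5, "14B": 14.7, "7B": 7.6,
            "3B": 3.1, "1.5B": 1.5, "0.5B": 0.49}
for name, p in params_b.items():
    print(f"Qwen2.5-{name}: ~{p * 2:.0f} GB bf16, ~{p * 0.5:.1f} GB int4")
```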