Qwen3 235B A22B
The largest Qwen3 MoE release, with 235B total parameters and 22B activated per token, aimed at frontier-scale open-weight reasoning and agent use.
Overview and architecture
What it is
A sparse Mixture-of-Experts (MoE) causal language model, the flagship open-weight release of the Qwen3 family.
Company: Alibaba Cloud (Qwen team)
Family: Qwen3
Release date: April 2025
Architecture: Sparse Mixture-of-Experts transformer
License: Apache 2.0
Modality: Text
Context window: 32K native; 131K with YaRN (256K native in the 2507 update)
Total params: 235B
Active params: 22B per token
Layers: 94
Hidden size: 4096
Attention heads: 64 (query)
KV heads: 4 (grouped-query attention)
KV-bearing layers: 94 (every layer carries a KV cache)
Research highlight
What improved
Flagship Qwen3 MoE
The largest-capacity release in the current Qwen3 line, positioned as the family's top open-weight model.
Sparse activation at scale
The model holds 235B total parameters but activates only 22B per token, which is the core argument for deploying it instead of a dense frontier-scale model: capacity scales with the expert pool while per-token compute stays near that of a 22B dense model, as the sketch below illustrates.
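A rough way to quantify that trade: per-token forward compute for a transformer is commonly approximated as 2 FLOPs per activated parameter. A minimal sketch under that rule of thumb (it ignores the attention term that grows with sequence length):

    # Rough per-token forward compute: ~2 FLOPs per activated parameter.
    # Order-of-magnitude sketch, not a profiler measurement.
    def forward_flops_per_token(activated_params: float) -> float:
        return 2.0 * activated_params

    moe_active = 22e9    # Qwen3 235B A22B: 22B parameters activated per token
    dense_total = 235e9  # hypothetical dense model of the same total size

    print(f"MoE per-token FLOPs:   {forward_flops_per_token(moe_active):.1e}")
    print(f"Dense per-token FLOPs: {forward_flops_per_token(dense_total):.1e}")
    print(f"Compute ratio: {dense_total / moe_active:.1f}x")  # ~10.7x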
Reasoning and agent focus
Qwen frames the flagship around reasoning, instruction following, and agent-style workflows rather than raw scale alone.
Training and release context
How it was released
MoE family branch
Qwen3 includes dedicated MoE models alongside the dense line, keeping the same user-facing thinking/non-thinking framing (see the sketch below) while materially changing the serving footprint.
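The thinking/non-thinking switch is surfaced at the chat-template level. A minimal sketch using Hugging Face transformers, assuming the published Qwen/Qwen3-235B-A22B checkpoint and the enable_thinking template flag described in its model card:

    from transformers import AutoTokenizer

    # Build a prompt with the model's chat template; Qwen3 templates accept
    # an enable_thinking flag that toggles thinking vs. non-thinking mode.
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")
    messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,  # set False for direct, non-thinking replies
    )
    print(prompt)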
Sparse activation
The MoE releases expose total and activated parameter counts separately, which is the key deployment distinction versus the dense Qwen3 models.
Long-context packaging
The base MoE releases ship with 32K native context and 131K supported via YaRN, while the 2507 update is packaged at 256K native context; the config sketch below shows how the YaRN extension is typically expressed.
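The 32K-to-131K stretch is a rope-scaling change (factor 4 × 32768 = 131072). A minimal sketch of the YaRN settings Qwen documents for its checkpoints, applied here through a transformers config; the keys follow that recipe, but treat the exact wiring as an assumption for your serving stack:

    from transformers import AutoConfig

    # YaRN settings for extending the 32K native window to ~131K tokens.
    yarn_rope_scaling = {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    }

    config = AutoConfig.from_pretrained("Qwen/Qwen3-235B-A22B")
    config.rope_scaling = yarn_rope_scaling
    config.max_position_embeddings = 131072  # 4 x 32768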
Where it is strong
Reasoning with lower active compute
The MoE line suits users who want larger total capacity without paying a dense model's per-token active compute.
Agent and tool use
Qwen positions the MoE branch around agent workflows, tool calling, and mixed reasoning and general dialogue; a tool-calling sketch follows below.
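In practice, tool calling usually runs through an OpenAI-compatible endpoint. A minimal sketch, assuming a local server (the URL, API key, and get_weather tool are placeholders for illustration) serving the checkpoint with standard tools support:

    from openai import OpenAI

    # Placeholder endpoint: any OpenAI-compatible server (e.g. vLLM or
    # SGLang) hosting the model would be called the same way.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B",
        messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)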
Large multilingual serving
Useful when you want very large-capacity multilingual serving without moving to a purely dense 70B+ model.
Memory behavior
What dominates VRAM
Even though only 22B parameters are activated per token, the full 235B expert pool must stay resident, so weights dominate VRAM from the first token and the model is primarily a multi-GPU (or very-large-single-GPU) deployment target; the arithmetic below shows why.
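Back-of-envelope weight memory makes the point. A minimal sketch counting weights only, ignoring KV cache, activations, and framework overhead:

    # Weight memory = total params x bytes per param. All 235B parameters
    # must be resident even though only 22B are activated per token.
    TOTAL_PARAMS = 235e9

    for name, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
        gib = TOTAL_PARAMS * bytes_per_param / 1024**3
        print(f"{name}: ~{gib:,.0f} GiB of weights")
    # BF16 ~438 GiB, FP8 ~219 GiB, INT4 ~109 GiB, before any KV cache.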