Qwen
Qwen3-30B-A3B-Instruct-2507
Non-thinking update to the Qwen3 MoE line with stronger general capabilities, better alignment, and a native 256K context window.
Overview and architecture
What it is
Company: Alibaba (Qwen team)
Family: Qwen3
Release date: July 2025
Architecture: Sparse Mixture-of-Experts (MoE) transformer
License: Apache 2.0
Modality: Text in / text out
Context window: 262,144 tokens (256K) native
Total params: 30.5B
Active params: 3.3B
Layers: 48
Hidden size: 2048
Attention heads: 32 (query)
KV heads: 4 (GQA)
KV-bearing layers: 48 (all layers)
Research highlight
What improved
Non-thinking update
This release is explicitly the non-thinking-mode update: it operates only in non-thinking mode, so users no longer need to pass enable_thinking=False (or otherwise force thinking off) at inference time.
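A minimal sketch of what that looks like with Hugging Face transformers, using the public checkpoint id; the prompt is illustrative, and note there is no thinking-mode flag anywhere in the call:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]
# No enable_thinking flag needed: this checkpoint only runs in non-thinking mode.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))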
General-capability uplift
Qwen describes stronger instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool use than the earlier non-thinking version.
256K native context
The update ships with a 262,144-token (256K) native context window, versus 32K native in the base 30B-A3B release, which makes long-context serving far more central to this variant.
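As one serving sketch, an offline vLLM engine can be asked for the full native window via max_model_len; the prompt here is illustrative, and real deployments often cap the length lower to bound KV-cache memory (see the memory section below):

from vllm import LLM, SamplingParams

# Sketch: request the full 262,144-token native window.
llm = LLM(model="Qwen/Qwen3-30B-A3B-Instruct-2507", max_model_len=262144)
out = llm.chat(
    [{"role": "user", "content": "Summarize this transcript: ..."}],
    SamplingParams(max_tokens=512),
)
print(out[0].outputs[0].text)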
Training and release context
How it was released
Release lineage
This is an updated non-thinking-mode variant of Qwen3-30B-A3B rather than a brand-new architecture branch.
MoE geometry
The model keeps the same 30.5B total / 3.3B active parameter geometry, 48 layers, 128 experts, and 8 activated experts as the base A3B release.
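A back-of-the-envelope sketch of what that geometry implies; the figures are from the card, while the framing of the gap between the two ratios is illustrative rather than an exact parameter accounting:

# Figures from the card: 30.5B total, 3.3B active, 128 experts, 8 routed per token.
total_params, active_params = 30.5e9, 3.3e9
experts_total, experts_active = 128, 8

# Only 8 of 128 experts run per token; the active budget above that ratio is the
# shared attention/embedding weight that every token passes through regardless.
print(f"active fraction of checkpoint: {active_params / total_params:.1%}")    # ~10.8%
print(f"expert activation ratio:       {experts_active / experts_total:.1%}")  # ~6.2%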
Output behavior
Qwen notes that this update no longer emits <think></think> blocks and is intended as a cleaner non-thinking deployment target.
Where it is strong
General assistant quality
Best fit when you want the Qwen3 MoE branch without exposing explicit thinking-mode behavior in outputs.
Tool and workflow use
Qwen emphasizes stronger tool usage, instruction following, and text generation alignment in this update.
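A hedged sketch of that tool use against an OpenAI-compatible endpoint (for example a vLLM server); the base URL, API key, and get_weather schema are all illustrative assumptions, not part of the model release:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # illustrative endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model may emit a structured call here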
Long-context non-thinking serving
The 256K native context makes it useful for long-input assistant workflows where explicit reasoning blocks are not desired.
Memory behavior
What dominates VRAM
Resident VRAM still tracks the full 30.5B-parameter MoE checkpoint, since all 128 experts stay loaded even though only 8 are active per token, while the 256K native context makes KV-cache growth far more visible during long-context serving than in the base 32K-native release.
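To make that cache growth concrete, a rough fp16 estimate from the spec above; head_dim=128 is taken from the released config and is an assumption here, not stated in this card:

layers, kv_heads, head_dim = 48, 4, 128  # head_dim from the released config (assumption)
bytes_per_elem = 2                       # fp16/bf16 cache
seq_len = 262_144                        # full 256K native window

# K and V each store layers * kv_heads * head_dim values per token.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
print(f"KV cache for one 256K-token sequence: {kv_bytes / 2**30:.0f} GiB")  # ~24 GiB

At the base release's 32K native length the same arithmetic gives about 3 GiB per sequence, which is why cache growth was far less visible there.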