Qwen3-4B-Thinking-2507
A Qwen3 update focused on deeper reasoning and a longer native context, tuned specifically for complex, thinking-heavy workloads.
Overview and architecture
What it is
Company: Qwen (Alibaba)
Family: Qwen3
Release date: July 2025 (the "2507" snapshot)
Architecture: dense transformer
License: Apache 2.0
Modality: text in, text out
Context window: 256K native
Total params: 4.0B
Active params: 4.0B (dense, no MoE routing)
Layers: 36
Hidden size: 2560
Attention heads: 32 (query)
KV heads: 8 (grouped-query attention)
KV-bearing layers: 36 (all layers)
Research highlight
What improved
Reasoning-focused update
Qwen describes this release as a scaled-up thinking-capability update rather than a general-purpose refresh.
256K native context
The 2507 update moves the model to 256K native context, which is one of the clearest deployment changes from the base 4B release.
Deeper thinking length
Qwen explicitly notes a longer thinking length and recommends the model for highly complex reasoning tasks.
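In practice, a longer thinking length means more of the raw completion is reasoning trace rather than final answer, so callers usually want to separate the two. A minimal sketch, assuming the Qwen3 thinking-variant output shape in which the trace is terminated by a closing `</think>` tag:

```python
def split_thinking(output: str) -> tuple[str, str]:
    """Split a Qwen3-Thinking completion into (reasoning, answer).

    Assumes the reasoning trace ends with a closing </think> tag,
    as the Qwen3 thinking variants produce; if the tag is absent,
    the whole string is treated as the answer.
    """
    marker = "</think>"
    idx = output.rfind(marker)
    if idx == -1:
        return "", output.strip()
    reasoning = output[:idx].replace("<think>", "").strip()
    answer = output[idx + len(marker):].strip()
    return reasoning, answer

sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(sample)
```

Using `rfind` rather than `find` keeps the split correct even if the reasoning trace itself quotes a `</think>`-like string early on.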
Training and release context
How it was released
Release lineage
This is an updated reasoning-oriented version of Qwen3-4B rather than a separate new architecture family.
Model geometry
The update keeps the same 4.0B parameter, 36-layer, 32Q/8KV dense geometry as the base 4B model.
Context packaging
Unlike the base model’s 32K native context with YaRN extension, this update is packaged with 256K native context.
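The difference in context packaging can be made concrete. A sketch of the two configurations, assuming the Hugging Face `rope_scaling` config convention for YaRN and a 4x extension factor on the base model (illustrative numbers, not an official recommendation):

```python
# Base Qwen3-4B: 32K native, extendable via YaRN rope scaling.
# The "rope_scaling" dict below follows the Hugging Face config
# convention; the 4.0 factor is an assumed, illustrative value.
base_native_ctx = 32_768
yarn_factor = 4.0
base_rope_scaling = {
    "rope_type": "yarn",
    "factor": yarn_factor,
    "original_max_position_embeddings": base_native_ctx,
}
base_extended_ctx = int(base_native_ctx * yarn_factor)  # 131,072

# Qwen3-4B-Thinking-2507: 256K native, no scaling block needed.
thinking_native_ctx = 262_144

print(base_extended_ctx, thinking_native_ctx)
```

The point of the comparison: the 2507 update reaches 256K without any rope-scaling override, so long-context behavior does not depend on an extension factor chosen at deployment time.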
Where it is strong
Complex reasoning
Best fit for logic, mathematics, science, coding, and other tasks where longer reasoning traces help.
Long-context understanding
The 256K native context makes it more useful for very long inputs than the base Qwen3 dense line.
Tool and instruction use
Qwen also positions the update as stronger on instruction following and tool usage, not only on benchmark reasoning.
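On the tool-usage side, Qwen3 models emit Hermes-style tool calls: a JSON object wrapped in `<tool_call>` tags inside the completion. A minimal extraction sketch, assuming that format (the `get_weather` example is hypothetical):

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Pull JSON tool calls out of <tool_call>...</tool_call> blocks.

    Assumes the Hermes-style format Qwen3 uses, where each block
    holds one JSON object with "name" and "arguments" keys.
    """
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(match) for match in pattern.findall(text)]

reply = (
    "Let me check the weather.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Hangzhou"}}\n</tool_call>'
)
calls = extract_tool_calls(reply)
```

In production you would normally let the chat template and a proper parser handle this, but the sketch shows why instruction-following quality matters: the tags and the JSON inside them must both be well-formed for the call to be dispatched.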
Memory behavior
What dominates VRAM
This remains a dense 4B model, but with the 256K native context, KV-cache growth can become a much larger share of total VRAM than on the base 32K-native release.
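A back-of-the-envelope sketch of that KV growth, using the geometry stated above (36 layers, 8 KV heads); the head dimension of 128 and an fp16 (2-byte) cache are assumptions, not figures from this card:

```python
# KV-cache size estimate for a dense 36-layer, 8-KV-head model.
# head_dim = 128 and 2-byte fp16 entries are assumed values.
layers, kv_heads, head_dim, bytes_per_elem = 36, 8, 128, 2

# Each layer caches one K and one V vector per KV head per token.
bytes_per_token = layers * 2 * kv_heads * head_dim * bytes_per_elem

gib = 1024 ** 3
kv_at_32k = 32_768 * bytes_per_token / gib    # base model's native window
kv_at_256k = 262_144 * bytes_per_token / gib  # this update's native window

print(f"{bytes_per_token} B/token, {kv_at_32k:.1f} GiB @32K, {kv_at_256k:.1f} GiB @256K")
```

Under these assumptions the cache costs about 144 KiB per token, so a full 256K window needs roughly eight times the KV memory of a full 32K window, which is why the cache, not the 4B weights, can dominate at long context.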