FitMyGPU

Qwen 3.6 27B

Qwen3.6 is a dense hybrid release focused on coding-agent stability, repository-level reasoning, and preserving thinking context across longer development sessions.

Overview and architecture

What it is

Company

Qwen

Family

Qwen

Release date

Apr 21, 2026

Architecture

Hybrid multimodal transformer

License

Apache 2.0

Modality

Multimodal (text-only estimate)

Context window

262,144

Total params

27B

Active params

27B (dense, all parameters active)

Layers

64

Hidden size

5,120

Attention heads

24

KV heads

4

KV-bearing layers

16

Training scope

Built as a unified vision-language foundation with pre-training and post-training on multimodal tokens rather than a separate late-fusion stack.

Hybrid layout

16 of 64 layers use gated attention while the rest use Gated DeltaNet blocks, so the stack is not a full-attention transformer end to end.

Context design

Published with a native 262K context window and an architecture intended to stretch beyond that range in longer-context settings.
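The spec above (27B dense parameters, all active) puts a floor on resident weight memory. The sketch below is a minimal estimate assuming standard dtype widths; real serving adds overhead for activations, the CUDA context, and allocator fragmentation, so treat these as lower bounds.

```python
# Rough weight-footprint estimate for Qwen 3.6 27B (dense: all params active).
# Dtype widths are standard storage sizes; the exact on-card footprint depends
# on the serving stack and quantization scheme.

PARAMS = 27e9  # total parameter count from the spec above

def weight_gib(params: float, bytes_per_param: float) -> float:
    """Approximate resident weight memory in GiB for a given dtype width."""
    return params * bytes_per_param / 1024**3

for dtype, width in [("bf16", 2), ("fp8", 1), ("int4", 0.5)]:
    print(f"{dtype}: ~{weight_gib(PARAMS, width):.1f} GiB")
```

At bf16 the weights alone land around 50 GiB, which is why the page describes the floor as higher than a smaller text-only artifact even before any cache or state is allocated.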

Research highlight

What improved

Agentic coding upgrade

Qwen3.6 is framed as a coding-agent upgrade, with better frontend workflows and repository-level reasoning, rather than as a broad architecture reset.

Thinking preservation

The release adds an option to preserve reasoning context across prior messages, which matters for iterative development workflows and multi-turn tool use.

Stability over novelty

Qwen presents 3.6 as the first open-weight follow-up to Qwen3.5 shaped by community feedback, emphasizing dependable real-world utility over introducing a new model family.

Training and release context

How it was released

Release lineage

Qwen3.6 is a direct successor to the February Qwen3.5 series rather than a separate architecture branch, and it keeps the same unified multimodal release format.

Architecture continuity

The line still uses the hybrid DeltaNet-plus-attention recipe, so the serving geometry stays governed by partial KV layers plus static sequence state rather than by full-attention on every layer.

Deployment target

Qwen explicitly packages the release for Transformers, vLLM, SGLang, and related serving stacks, which signals an operationally mature release rather than a research-only drop.

Where it is strong

Coding agents

The line is tuned most visibly for repository work, frontend changes, tool use, and multi-step coding-agent flows.

Iterative reasoning

Thinking preservation makes the release better suited to long back-and-forth development sessions where reasoning context should not be rebuilt from scratch every turn.

Long-context hybrid serving

It keeps the hybrid long-context advantage of Qwen3.5 while shifting the capability story toward developer productivity and stability.

Memory behavior

What dominates VRAM

This text-only estimate still counts the resident multimodal checkpoint weights on the card, so the memory floor is higher than for a pure text-only model of similar active size.

Only 16 of 64 layers carry a standard KV cache. The remaining layers contribute a fixed sequence-state term instead, which makes long-context growth less aggressive than a dense full-attention stack.

Longer context and higher concurrency still increase memory monotonically, but more of the footprint shifts into mixed KV-plus-state behavior instead of pure transformer cache expansion.
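The partial-KV geometry described above can be sketched numerically. This is a minimal estimate, assuming a head dimension of 128 (not published on this page) and a bf16 cache; the fixed DeltaNet per-sequence state is left out, so the hybrid number is a lower bound on sequence-dependent memory.

```python
# Sketch of long-context KV growth for the hybrid stack: only the 16
# gated-attention layers hold a standard KV cache; the Gated DeltaNet layers
# contribute a fixed per-sequence state instead (ignored here).

KV_LAYERS = 16    # KV-bearing layers (of 64 total, from the spec above)
KV_HEADS = 4      # grouped-query KV heads
HEAD_DIM = 128    # assumed head dimension (not published on this page)
DTYPE_BYTES = 2   # bf16/fp16 cache

def kv_cache_gib(seq_len: int, batch: int = 1, layers: int = KV_LAYERS) -> float:
    """KV cache size in GiB: K and V per token, per KV head, per KV-bearing layer."""
    per_token = 2 * layers * KV_HEADS * HEAD_DIM * DTYPE_BYTES  # bytes/token
    return per_token * seq_len * batch / 1024**3

full_ctx = 262_144
print(f"hybrid (16 KV layers):      {kv_cache_gib(full_ctx):.1f} GiB")
print(f"full-attention (64 layers): {kv_cache_gib(full_ctx, layers=64):.1f} GiB")
```

Under these assumptions a full 262K-token sequence needs roughly a quarter of the KV memory that a hypothetical full-attention 64-layer stack would, which is the "less aggressive long-context growth" the paragraph above describes. Memory still scales linearly with both sequence length and batch size.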

FitMyGPU currently treats this as a text-only estimate. Resident multimodal weights remain counted, but media-token overhead is excluded.

Sources

Where this page is grounded