FitMyGPU

Qwen 3.5 27B

Large dense Qwen3.5 release that keeps the hybrid multimodal stack but pushes into a much heavier single-model serving class than the 9B tier.

Overview and architecture

What it is

Company

Qwen

Family

Qwen

Release date

Feb 24, 2026

Architecture

Hybrid multimodal transformer

License

Apache 2.0

Modality

Multimodal (text-only estimate)

Context window

262,144

Total params

27B

Active params

Dense (all 27B active)

Layers

64

Hidden size

5,120

Attention heads

24

KV heads

4

KV-bearing layers

16

Training scope

Built as a unified vision-language foundation with pre-training and post-training on multimodal tokens rather than a separate late-fusion stack.

Hybrid layout

16 of 64 layers use gated attention while the rest use Gated DeltaNet blocks, so the stack is not a full-attention transformer end to end.

Context design

Published with a native 262K context window and an architecture intended to stretch beyond that range in longer-context settings.
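From the layer geometry above, the per-token KV-cache cost can be sketched. This is a minimal estimate, assuming a per-head width of 128 and an fp16 cache; neither value is stated on this page, and 5,120 hidden / 24 heads is not a round per-head width, so head_dim here is an assumption.

```python
# Hedged sketch: per-token KV-cache cost for the 16 KV-bearing layers.
# HEAD_DIM and BYTES are ASSUMPTIONS, not published numbers.

KV_LAYERS = 16        # KV-bearing layers (from the spec above)
KV_HEADS = 4          # grouped-query KV heads (from the spec above)
HEAD_DIM = 128        # ASSUMED per-head width
BYTES = 2             # ASSUMED fp16 cache
CONTEXT = 262_144     # native context window

# Each KV-bearing layer stores one K and one V vector per token.
kv_bytes_per_token = KV_LAYERS * 2 * KV_HEADS * HEAD_DIM * BYTES
kv_bytes_full_context = kv_bytes_per_token * CONTEXT

print(kv_bytes_per_token)               # 32768 bytes = 32 KiB per token
print(kv_bytes_full_context / 2**30)    # 8.0 GiB at the full 262K window
```

Under these assumptions the cache stays at 32 KiB per token, so a single full-context sequence costs 8 GiB of KV on top of the weights.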

Research highlight

What improved

Unified vision-language foundation

The family is trained as one multimodal base rather than as separate text and vision branches bolted together late, which is why text-only serving still keeps the vision-side weights resident on card.

Efficient hybrid architecture

Gated DeltaNet layers carry sequence state while periodic gated-attention layers handle KV-heavy reasoning, so the stack aims for long-context throughput without paying dense-attention KV cost on every layer.
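The savings from caching KV on only 16 of 64 layers can be sketched against a hypothetical full-attention stack of the same depth. The head width and cache precision are the same assumptions as above, and the DeltaNet layers' fixed state is deliberately ignored here to isolate the KV term:

```python
# Hedged sketch: KV growth, hybrid layout vs a hypothetical dense stack.
# HEAD_DIM=128 and fp16 are ASSUMPTIONS; only the KV term is compared.

KV_HEADS, HEAD_DIM, BYTES = 4, 128, 2

def kv_gib(kv_layers: int, tokens: int) -> float:
    """KV-cache size in GiB for `kv_layers` attention layers."""
    return kv_layers * 2 * KV_HEADS * HEAD_DIM * BYTES * tokens / 2**30

for tokens in (8_192, 65_536, 262_144):
    dense = kv_gib(64, tokens)    # hypothetical: all 64 layers attend
    hybrid = kv_gib(16, tokens)   # this model: 16 KV-bearing layers
    print(f"{tokens:>7} tokens: dense {dense:5.2f} GiB vs hybrid {hybrid:5.2f} GiB")
```

At the full 262K window the gap is 32 GiB versus 8 GiB of KV per sequence, which is the "long-context throughput without paying dense-attention KV cost on every layer" trade the paragraph above describes.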

Scalable RL generalization

Qwen frames reinforcement learning and large agent-environment scaling as core to the family, with training aimed at more robust adaptation across reasoning, coding, and agent workflows.

Global coverage

The release emphasizes support for 201 languages and dialects, which matters for deployment quality and reinforces that the family is meant as a broad general-purpose foundation.

Training infrastructure

The release emphasizes near-text-only multimodal training efficiency and asynchronous RL infrastructure, signaling that the stack was built to scale rather than as a small multimodal add-on.

Training and release context

How it was released

Unified release format

Qwen3.5 is released as a single multimodal foundation rather than as separate text and vision checkpoints stitched together later.

Architecture shift

The family changes the serving geometry by mixing DeltaNet-style state layers with periodic attention layers instead of staying a plain dense-attention stack like Qwen2.5.

Training stack

Qwen emphasizes multimodal training efficiency and large-scale RL infrastructure as part of the release process, not just as a benchmark claim.

Where it is strong

Multimodal reasoning

Designed for a unified text-plus-vision capability profile rather than separate specialist variants.

Long-context serving

The hybrid layout is explicitly aimed at making long-context serving cheaper than a dense full-attention stack.

Agents and coding

Qwen positions the family as competitive across coding, reasoning, and agent-style workflows.

Memory behavior

What dominates VRAM

This text-only estimate still counts the full multimodal checkpoint as resident weights, so the memory floor is higher than for a pure language-only model of similar active size.

Only 16 of 64 layers carry a standard KV cache. The remaining layers contribute a fixed sequence-state term instead, which makes long-context growth less aggressive than a dense full-attention stack.

Longer context and higher concurrency still increase memory monotonically, but more of the footprint shifts into mixed KV-plus-state behavior instead of pure transformer cache expansion.

FitMyGPU currently treats this as a text-only estimate. Resident multimodal weights remain counted, but media-token overhead is excluded.
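The pieces above can be put together as a rough text-only serving floor: resident checkpoint weights, KV on the 16 attention layers, and a fixed per-sequence state for the DeltaNet layers. The checkpoint precision, head width, and state size are all assumptions for illustration, not published numbers, and media-token overhead is excluded as described above:

```python
# Hedged sketch of a text-only VRAM floor in the spirit of this page.
# WEIGHT_BYTES, HEAD_DIM-derived KV cost, and STATE_PER_SEQ are ASSUMPTIONS.

PARAMS = 27e9                          # total (dense) parameters
WEIGHT_BYTES = 2                       # ASSUMED fp16/bf16 checkpoint
KV_PER_TOKEN = 16 * 2 * 4 * 128 * 2    # bytes, from the layer geometry above
STATE_PER_SEQ = 256 * 2**20            # ASSUMED fixed DeltaNet state (256 MiB)

def vram_floor_gib(tokens: int, seqs: int = 1) -> float:
    """Weights + KV + fixed sequence state, in GiB. No activations/overhead."""
    weights = PARAMS * WEIGHT_BYTES
    kv = KV_PER_TOKEN * tokens * seqs
    state = STATE_PER_SEQ * seqs
    return (weights + kv + state) / 2**30

# Weights dominate (~50 GiB at fp16); a full-context sequence adds 8 GiB
# of KV plus the fixed state term.
print(round(vram_floor_gib(262_144), 1))
```

Note how context length and concurrency enter only through the KV and state terms, matching the "mixed KV-plus-state behavior" described above: the state term scales with sequences but not with tokens.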

Sources

Where this page is grounded