
Qwen 2.5 1.5B

Instruction-tuned 1.5B Qwen2.5 model for lightweight coding, math, structured-output, and assistant tasks in a small dense deployment footprint.

Overview and architecture

What it is

Company: Qwen

Family: Qwen2.5

Release date: Sep 17, 2024

Architecture: Dense decoder-only transformer

License: Apache 2.0

Modality: Text

Context window: 32,768 tokens

Total params: 1.54B

Active params: 1.54B (dense; all parameters are active per token)

Layers: 28

Hidden size: 1,536

Attention heads: 12

KV heads: 2

KV-bearing layers: 28

Research highlight

What improved

Small but more capable than 0.5B

The 1.5B checkpoint is the smallest Qwen2.5 size at which users can reasonably expect a useful general assistant, while still staying in a very lightweight VRAM class.

Coding and math uplift

Qwen positions the family as stronger at coding and mathematics than Qwen2, which matters more here because 1.5B is often deployed as a practical small local model.

Structured-output support

JSON and structured-data handling remain part of the product story rather than being reserved for the larger checkpoints.
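A minimal sketch of what that workflow can look like with Hugging Face transformers; the system prompt, extraction schema, and generation settings here are illustrative assumptions, not something the model card prescribes.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Hypothetical extraction task: ask for JSON only and define a tiny schema.
messages = [
    {"role": "system", "content": "Reply with valid JSON only."},
    {"role": "user", "content": 'Extract {"name": ..., "year": ...} from: '
                                "Qwen2.5 was announced in 2024."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=128)
# Slice off the prompt tokens and keep only the newly generated JSON text.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For stricter guarantees than prompting alone, constrained-decoding tooling can be layered on top, but the plain prompt pattern above is the baseline.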

Training and release context

How it was released

Family release

Qwen2.5 was released as a broad language-model line spanning base and instruction-tuned checkpoints from 0.5B to 72B parameters.

Model architecture

The 1.5B instruct model is a causal language model built as a dense transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings.
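Those architecture claims can be spot-checked from the published config; a quick sketch, with field names following the transformers Qwen2 config (the expected values in comments are from this page, not output I am asserting):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
print(cfg.num_hidden_layers)     # 28 layers
print(cfg.hidden_size)           # 1536
print(cfg.num_attention_heads)   # 12 query heads
print(cfg.num_key_value_heads)   # 2 KV heads -> grouped-query attention
print(cfg.tie_word_embeddings)   # True -> tied word embeddings
```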

1.5B model geometry

The checkpoint has 1.54B total parameters, 1.31B non-embedding parameters, 28 layers, 12 query heads, 2 KV heads, a 32,768-token context window, and up to 8,192 generated tokens.
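Assuming an fp16/bf16 KV cache with no quantization or paged-attention overhead, that geometry pins down the KV-cache cost per token; a back-of-the-envelope sketch:

```python
layers = 28              # KV-bearing layers
kv_heads = 2             # grouped-query attention KV heads
head_dim = 1536 // 12    # hidden size / query heads = 128
bytes_per_elem = 2       # fp16/bf16

# K and V each store (kv_heads * head_dim) values per layer per token.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
print(kv_bytes_per_token)                   # 28,672 bytes ~= 28 KiB/token
print(kv_bytes_per_token * 32_768 / 2**30)  # ~0.875 GiB at the full 32K window
```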

Training stage

Qwen describes the release as a full pretraining-plus-post-training effort rather than a small instruction-only adaptation of an older base.

Where it is strong

Small general assistant use

Useful when you want a lightweight assistant more capable than the 0.5B model, without paying 7B-class memory costs.

Structured outputs

A reasonable fit for lighter JSON, extraction, and formatting workflows on small hardware.

Small coding and math tasks

Good for modest technical and code-oriented tasks when a very small open model is required.

Memory behavior

What dominates VRAM

Resident weights are modest at this size (roughly 3 GiB at fp16), so long-context KV cache and runtime overhead account for a proportionally larger share of VRAM than on mid-size dense checkpoints.
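A minimal sizing sketch of that breakdown, assuming fp16 weights and KV cache and ignoring the activation and framework overhead that, as noted above, weighs proportionally more on a model this small:

```python
def vram_estimate_gib(params=1.54e9, context_tokens=32_768,
                      layers=28, kv_heads=2, head_dim=128, dtype_bytes=2):
    """Weights + KV cache only; geometry defaults match this page."""
    weights = params * dtype_bytes
    kv_cache = context_tokens * 2 * layers * kv_heads * head_dim * dtype_bytes
    return weights / 2**30, kv_cache / 2**30

w, kv = vram_estimate_gib()
print(f"weights ~{w:.2f} GiB, KV cache at 32K ~{kv:.2f} GiB")
# weights ~2.87 GiB, KV cache at 32K ~0.88 GiB
```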

Sources

Where this page is grounded