Model notes

GPT-OSS 20B

Smaller GPT-OSS reasoning checkpoint with a routed MoE stack, 128K context, and a relatively light active path.

21B total • 3.6B active • 128,000 context • 8 KV heads

Architecture

Model spec

Architecture: Mixture-of-experts transformer
Total params: 21B
Active params: 3.6B
Layers: 24
Hidden size: 2,880
Attention heads: 64
KV heads: 8
KV-bearing layers: 24
Context length: 128,000 tokens
Modality: Text
License: Apache 2.0
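The KV-heads, KV-bearing-layers, and context figures above are what drive KV-cache memory. A rough upper-bound sketch, assuming a head dimension of 64 and a BF16 cache (neither is listed in the spec table, so both are assumptions):

```python
# Rough full-attention KV-cache upper bound from the spec above.
# Assumed (not in the table): head dimension 64, KV cache stored in BF16.
KV_HEADS = 8
KV_LAYERS = 24
HEAD_DIM = 64          # assumption
BYTES_PER_ELEM = 2     # BF16
CONTEXT = 128_000

# K and V each store kv_heads * head_dim elements per token, per KV-bearing layer.
bytes_per_token = 2 * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * KV_LAYERS
full_context_gib = bytes_per_token * CONTEXT / 2**30

print(f"{bytes_per_token} bytes per token")      # 49152
print(f"{full_context_gib:.2f} GiB at 128K")     # ~5.86
```

Because the stack alternates full and sliding-window attention, the real cache at long contexts sits below this all-full-attention upper bound.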

Why it matters

Why memory behaves this way

Research highlight

Each MoE block has 32 experts with top-4 routing, and the stack alternates full and sliding-window attention to keep long-context reasoning practical.
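The 32-expert, top-4 routing step can be sketched as follows. Note this is a minimal illustration: normalizing the softmax over only the four selected experts is one common MoE convention, assumed here rather than taken from the GPT-OSS implementation.

```python
import math
import random

NUM_EXPERTS = 32
TOP_K = 4

def route(router_logits):
    """Select the top-4 experts by router logit and return (expert_id, weight)
    pairs, with weights softmax-normalized over the selected set only."""
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    m = max(router_logits[i] for i in top)          # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
for expert, weight in route(logits):
    print(expert, round(weight, 3))
```

Only the four selected experts' FFN weights participate in each token's forward pass, which is why the active-parameter count (3.6B) is so much smaller than the total (21B).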

Memory note

More than 90% of GPT-OSS 20B's parameters sit in MoE weights quantized to MXFP4, while the remaining shared weights stay in BF16.

Checkpoints

Official profiles

• Mixed MXFP4 + BF16 checkpoint (current)
• BF16 checkpoint

OpenAI's GPT-OSS model card lists a 12.8 GiB checkpoint for gpt-oss-20b. The estimator uses that published size for the resident mixed MXFP4 + BF16 checkpoint directly.
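The published figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes MXFP4's layout from the OCP Microscaling spec (4-bit elements plus one 8-bit shared scale per 32-element block, i.e. 4.25 bits/param) and treats the ">90% MoE" split as exactly 90%, so a few percent of drift from 12.8 GiB is expected:

```python
# Cross-check of the published 12.8 GiB checkpoint size.
# Assumed: exactly 90% of params in MXP4-quantized MoE weights (the note says
# "more than 90%"), and round totals of 21B params.
TOTAL_PARAMS = 21e9
MOE_FRACTION = 0.90
MXFP4_BITS = 4 + 8 / 32   # FP4 elements + one 8-bit shared scale per 32-block
BF16_BITS = 16

moe_gib = TOTAL_PARAMS * MOE_FRACTION * MXFP4_BITS / 8 / 2**30
rest_gib = TOTAL_PARAMS * (1 - MOE_FRACTION) * BF16_BITS / 8 / 2**30
print(f"{moe_gib + rest_gib:.1f} GiB")   # ~13.3, near the published 12.8 GiB
```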

vLLM • Transformers
