Model notes

GPT-OSS 20B

The smaller GPT-OSS reasoning checkpoint: a routed mixture-of-experts (MoE) stack with 128K context and a light active path (about 3.6B of its 21B parameters per token).

21B total • 3.6B active • 128,000 context • 8 KV heads


Architecture

Model spec

Architecture

Mixture-of-experts transformer

Total params

21B

Active params

3.6B

Layers

Hidden size

2,880

Attention heads

KV heads

8

KV-bearing layers

Context length

128,000

Modality

Text

License

Apache 2.0
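The KV heads and context length above are what drive per-sequence cache memory. The sketch below is a minimal KV-cache estimate, assuming BF16 cache entries and that half the layers use sliding-window attention (the alternating pattern described under Research highlight). The layer split, head_dim=64, and window=128 are illustrative assumptions not listed in this spec, so check them against the checkpoint's config.json. The function name kv_cache_bytes is hypothetical.

```python
def kv_cache_bytes(
    tokens: int,
    full_layers: int,
    sliding_layers: int,
    kv_heads: int,
    head_dim: int,
    window: int,
    bytes_per_elem: int = 2,  # assumed BF16/FP16 cache entries
) -> int:
    """Estimate KV-cache bytes for one sequence of `tokens` tokens."""
    # Each cached token stores one K and one V vector per KV head, per layer.
    per_token_per_layer = 2 * kv_heads * head_dim * bytes_per_elem
    # Full-attention layers keep every token in the cache.
    full = full_layers * tokens * per_token_per_layer
    # Sliding-window layers only keep the most recent `window` tokens.
    sliding = sliding_layers * min(tokens, window) * per_token_per_layer
    return full + sliding


# Illustrative inputs only: layer split, head_dim, and window are assumptions.
total = kv_cache_bytes(
    tokens=128_000, full_layers=12, sliding_layers=12,
    kv_heads=8, head_dim=64, window=128,
)
print(f"{total:,} bytes ~ {total / 2**30:.2f} GiB")
```

Under these assumptions nearly all of the cache comes from the full-attention layers; the sliding-window layers contribute a fixed amount once the sequence is longer than the window.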

Why it matters

Why memory behaves this way

Research highlight

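Each MoE block has 32 experts with top-4 routing, so every token passes through only 4 experts' weights. The stack alternates full-attention and sliding-window attention layers, which keeps long-context reasoning practical because the sliding-window layers cache only a bounded window of recent tokens.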
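Top-4-of-32 routing is why the active path is so much smaller than the total. Below is a minimal sketch of generic top-k MoE routing, assuming a softmax over the selected experts' logits; GPT-OSS's exact router (bias terms, normalization order) may differ. The function route_top_k and the example logits are hypothetical.

```python
import math


def route_top_k(logits: list[float], k: int = 4) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    # Rank expert indices by router logit, highest first, and keep k of them.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router output for one token over 32 experts.
logits = [0.0] * 32
logits[3], logits[17], logits[8], logits[30] = 2.0, 1.5, 1.0, 0.5
for expert, weight in route_top_k(logits, k=4):
    print(expert, round(weight, 3))
```

The token's MoE output would then be the weighted sum of just those four experts' outputs; the other 28 experts' weights are never read for that token.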

Memory note

More than 90% of GPT-OSS 20B's parameters sit in MoE weights quantized to MXFP4, while the remaining shared weights stay in BF16.
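A rough size check shows why that split matters. MXFP4 (per the OCP Microscaling spec) stores 4-bit elements plus one 8-bit scale shared by each block of 32, about 4.25 bits per weight, versus 16 bits for BF16. This sketch assumes exactly 90% of 21B parameters are MoE weights; the function checkpoint_gib is hypothetical and ignores file metadata.

```python
MXFP4_BITS = 4 + 8 / 32  # 4-bit element + one 8-bit scale shared by 32 elements
BF16_BITS = 16


def checkpoint_gib(total_params: float, moe_fraction: float) -> float:
    """Rough weight size when MoE weights are MXFP4 and the rest are BF16."""
    moe_bits = total_params * moe_fraction * MXFP4_BITS
    other_bits = total_params * (1 - moe_fraction) * BF16_BITS
    return (moe_bits + other_bits) / 8 / 2**30  # bits -> bytes -> GiB


# Assumption: exactly 90% of 21B params are MoE weights ("more than 90%").
print(f"mixed: {checkpoint_gib(21e9, 0.90):.1f} GiB")
print(f"all BF16: {checkpoint_gib(21e9, 0.0):.1f} GiB")
```

With a 90% MoE share this gives roughly 13.3 GiB, against about 39 GiB if every weight stayed in BF16; a larger MoE share moves the estimate down toward the published 12.8 GiB figure.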

Checkpoints

Official profiles

Mixed MXFP4 + BF16 checkpoint

BF16 checkpoint

Current

OpenAI's GPT-OSS model card lists a 12.8 GiB checkpoint for gpt-oss-20b. The estimator uses this published size of the mixed MXFP4 + BF16 checkpoint directly as the resident weight footprint.

vLLM • Transformers


Sources

Reference links

https://openai.com/open-models

https://huggingface.co/openai/gpt-oss-20b