Model notes

GPT-OSS 120B

Largest GPT-OSS checkpoint in the current registry, built for higher-capacity open reasoning with a much larger resident expert pool.

117B total • 5.1B active • 128,000 context • 8 KV heads


Architecture

Model spec

Architecture: Mixture-of-experts transformer
Total params: 117B
Active params: 5.1B
Layers:
Hidden size: 2,880
Attention heads:
KV heads: 8
KV-bearing layers:
Context length: 128,000
Modality: Text
License: Apache 2.0
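
To show how these spec fields feed a KV-cache estimate, here is a minimal sketch. Only the 8 KV heads and the 128,000-token context come from this page. The head dimension, the full/sliding layer split, the window size, and the 2-byte cache precision are placeholder assumptions for illustration, not published values, and `kv_cache_bytes` is a hypothetical helper rather than the estimator's actual code.

```python
def kv_cache_bytes(tokens, kv_heads, head_dim, full_layers,
                   sliding_layers=0, window=None, bytes_per_value=2):
    """Rough KV-cache size for one sequence, in bytes.

    Full-attention layers cache every token. Sliding-window layers cache
    at most `window` tokens (all tokens if `window` is None).
    """
    # One key vector and one value vector per KV head, per token, per layer.
    per_token_per_layer = 2 * kv_heads * head_dim * bytes_per_value
    full = full_layers * tokens * per_token_per_layer
    sliding_tokens = tokens if window is None else min(tokens, window)
    sliding = sliding_layers * sliding_tokens * per_token_per_layer
    return full + sliding


# From this page: 8 KV heads, 128,000-token context.
# Assumed placeholders: head_dim=64, 18 full + 18 sliding layers, window=128.
example = kv_cache_bytes(tokens=128_000, kv_heads=8, head_dim=64,
                         full_layers=18, sliding_layers=18, window=128)
print(f"{example / 1024**3:.2f} GiB")
```

Under these assumptions the full-attention layers dominate: the sliding-window layers contribute a fixed amount that does not grow with context length.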

Why it matters

Why memory behaves this way

Research highlight

Each MoE block has 128 experts and routes every token to the top 4 of them, which keeps the model near 5.1B active params per token despite 117B total. The larger model also keeps the GPT-OSS recipe of alternating full-attention and sliding-window attention layers.
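
The top-4-of-128 routing described above can be sketched in a few lines. This is a generic top-k router, assuming the gate weights are a softmax over the selected experts' logits. That normalization is one common choice and is not confirmed here as the model's exact method, and `route_top_k` is a hypothetical helper name.

```python
import math


def route_top_k(router_logits, k=4):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only (assumed variant).
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# 128 experts per MoE block, 4 selected per token (figures from this page).
NUM_EXPERTS, TOP_K = 128, 4
logits = [0.0] * NUM_EXPERTS
logits[5], logits[17], logits[99], logits[42] = 3.0, 2.0, 1.0, 0.5

routes = route_top_k(logits, k=TOP_K)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts touched per token
```

Only 4 of 128 experts (about 3% of the expert pool) run for a given token, which is why active parameters stay far below the total even though all experts must be resident in memory.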

Memory note

More than 90% of GPT-OSS 120B's parameters sit in MXFP4-quantized MoE weights, while the remaining shared weights stay in BF16.

Checkpoints

Official profiles

Mixed MXFP4 + BF16 checkpoint

BF16 checkpoint

Current

OpenAI's GPT-OSS model card lists a 60.8 GiB checkpoint for gpt-oss-120b. The estimator uses that published mixed MXFP4 + BF16 resident checkpoint size directly.
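
As a rough consistency check between the memory note and the published size, the sketch below relates the MXFP4 share of parameters to checkpoint size. It assumes MXFP4 costs 4.25 bits per parameter (4-bit elements plus one shared 8-bit scale per 32-element block, as in the OCP Microscaling format), BF16 costs 16 bits, the 117B total is exact, and file metadata is negligible. The function names are hypothetical and this is not the estimator's own code.

```python
GIB = 1024 ** 3

# Assumed storage cost per parameter, in bits.
MXFP4_BITS = 4 + 8 / 32   # 4-bit element + shared 8-bit scale per 32 elements
BF16_BITS = 16


def checkpoint_gib(total_params, mxfp4_fraction):
    """Checkpoint size in GiB if `mxfp4_fraction` of params are MXFP4."""
    mxfp4_bytes = total_params * mxfp4_fraction * MXFP4_BITS / 8
    bf16_bytes = total_params * (1 - mxfp4_fraction) * BF16_BITS / 8
    return (mxfp4_bytes + bf16_bytes) / GIB


def implied_mxfp4_fraction(total_params, size_gib):
    """Invert checkpoint_gib: which MXFP4 share yields the given size?"""
    bytes_per_param = size_gib * GIB / total_params
    bf16_bytes, mxfp4_bytes = BF16_BITS / 8, MXFP4_BITS / 8
    return (bf16_bytes - bytes_per_param) / (bf16_bytes - mxfp4_bytes)


TOTAL_PARAMS = 117e9      # from this page
PUBLISHED_GIB = 60.8      # from this page

share = implied_mxfp4_fraction(TOTAL_PARAMS, PUBLISHED_GIB)
at_ninety_percent = checkpoint_gib(TOTAL_PARAMS, 0.90)
```

Under these assumptions the published 60.8 GiB implies an MXFP4 share of roughly 98%, consistent with the "more than 90%" note. A share of exactly 90% would give roughly 74 GiB instead.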

vLLM • Transformers


Sources

Reference links

https://openai.com/open-models
https://huggingface.co/openai/gpt-oss-120b