Model notes
GPT-OSS 120B
The largest GPT-OSS checkpoint in the current registry, built for higher-capacity open reasoning with a much larger resident expert pool than GPT-OSS 20B (128 experts per MoE block vs. 32).
117B total • 5.1B active • 128,000 context • 8 KV heads
Architecture
Model spec
Architecture: Sparse mixture-of-experts (MoE) Transformer
Total params: 117B
Active params: 5.1B
Layers: 36
Hidden size: 2880
Attention heads: 64
KV heads: 8
KV-bearing layers: 36 (18 full attention, 18 sliding window)
Context length: 128,000 tokens
Modality: Text in, text out
License: Apache 2.0
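The spec rows above are exactly what a KV-cache estimate keys off. Below is a minimal sketch of that arithmetic, assuming BF16 cache entries, a head dimension of 64, and a 128-token sliding window on the alternating window layers; the head dimension and window size come from OpenAI's published gpt-oss architecture, not from this page.

```python
def kv_cache_gib(tokens, layers=36, kv_heads=8, head_dim=64,
                 bytes_per_elem=2, window=128):
    """KV-cache size with strictly alternating full / sliding-window layers.

    Full-attention layers cache K and V for every token; sliding-window
    layers only retain the most recent `window` tokens. BF16 assumed.
    """
    full_layers = layers // 2            # 18 of 36 under strict alternation
    window_layers = layers - full_layers
    per_token = 2 * kv_heads * head_dim * bytes_per_elem  # K + V, one layer
    total_bytes = per_token * (full_layers * tokens
                               + window_layers * min(tokens, window))
    return total_bytes / 2**30

print(f"{kv_cache_gib(128_000):.2f} GiB at the full 128,000-token context")
```

At the full context the window layers contribute almost nothing, so nearly all of the roughly 4.4 GiB cache in this sketch comes from the 18 full-attention layers.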
Why it matters
Research highlight
Each MoE block has 128 experts with top-4 routing, and the larger model keeps the alternating full and sliding-window attention recipe while staying near 5.1B active params per token.
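Here is a minimal sketch of the top-4-of-128 routing described above; the dimensions and random weights are illustrative only, not the model's actual router implementation.

```python
import numpy as np

def route_top_k(hidden, router_w, k=4):
    """Pick the top-k experts per token and softmax-normalize their gates.

    hidden:   (tokens, hidden_size) activations entering the MoE block
    router_w: (hidden_size, num_experts) router projection
    Returns (expert_ids, gates), each of shape (tokens, k).
    """
    logits = hidden @ router_w                        # (tokens, num_experts)
    expert_ids = np.argsort(logits, axis=-1)[:, -k:]  # top-k expert indices
    top_logits = np.take_along_axis(logits, expert_ids, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)        # softmax over the k picks
    return expert_ids, gates

# Illustrative only: 128 experts, top-4 routing, hidden size 2880.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((3, 2880)).astype(np.float32)
router_w = rng.standard_normal((2880, 128)).astype(np.float32)
ids, gates = route_top_k(hidden, router_w, k=4)
print(ids.shape, gates.shape)  # (3, 4) (3, 4)
```

Because only 4 of the 128 experts run per token, the active-parameter count stays near 5.1B even though the resident pool is 117B.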
Why memory behaves this way
Memory note
More than 90% of GPT-OSS 120B's parameters sit in MXFP4-quantized MoE weights, while the remaining shared weights stay in BF16.
Checkpoints
Official profiles
Mixed MXFP4 + BF16 checkpoint
BF16 checkpoint
OpenAI's GPT-OSS model card lists a 60.8 GiB checkpoint for gpt-oss-120b. The estimator uses that published mixed MXFP4 + BF16 resident checkpoint size directly.
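As a rough cross-check on that number: MXFP4 stores 4-bit elements plus one shared 8-bit scale per 32-element block (about 0.53 bytes per parameter) against 2 bytes per BF16 parameter, so sweeping the MXFP4 fraction shows where 60.8 GiB lands. A minimal sketch of the arithmetic:

```python
MXFP4_BYTES = 4.25 / 8  # 4-bit element + 8-bit scale shared by 32 elements
BF16_BYTES = 2.0

def checkpoint_gib(total_params, mxfp4_fraction):
    """Resident size of a mixed MXFP4 + BF16 checkpoint, in GiB."""
    bytes_per_param = (mxfp4_fraction * MXFP4_BYTES
                       + (1 - mxfp4_fraction) * BF16_BYTES)
    return total_params * bytes_per_param / 2**30

# 117B total parameters; the memory note only says "more than 90%" are MXFP4.
for frac in (0.90, 0.95, 0.98):
    print(f"{frac:.0%} MXFP4 -> {checkpoint_gib(117e9, frac):.1f} GiB")
```

The published 60.8 GiB sits at the top of that sweep, implying nearly all of the 117B parameters live in the MXFP4 MoE weights, consistent with the memory note above.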
Sources