OpenAI
GPT-OSS 20B
Smaller GPT-OSS release for general-purpose and reasoning use cases that need to stay within a much lighter single-card memory budget.
Overview and architecture
What it is
Company
Family
Release date
Architecture
License
Modality
Context window
Total params
Active params
Layers
Hidden size
Attention heads
KV heads
KV-bearing layers
Research highlight
What improved
Low-memory GPT-OSS entry point
The main release-level change is that GPT-OSS capability becomes practical in roughly 16 GB of memory rather than requiring an 80 GB class accelerator.
Configurable reasoning effort
Like the larger model, gpt-oss-20b supports low, medium, and high reasoning effort settings so latency and reasoning depth can be traded off per use case.
Native agent features
The smaller release still keeps the same first-class agent surface: function calling, web browsing, Python execution, and structured outputs.
Full chain-of-thought access
OpenAI exposes the reasoning trace for debugging and trust, even though it is not intended for direct end-user display.
Training and release context
How it was released
Harmony-only format
Both GPT-OSS models were trained on OpenAI's Harmony response format and are expected to be used with that format rather than a generic chat template.
Model geometry
gpt-oss-20b uses 24 layers, 21B total parameters, 3.6B active parameters per token, 32 total experts, 4 active experts per token, and a 128K context window.
Quantized MoE release
The MoE weights were post-trained in MXFP4, which is the release decision that makes the smaller checkpoint practical in roughly 16 GB of memory.
Training data and tokenizer
OpenAI describes the training mix as mostly English, text-only data with emphasis on STEM, coding, and general knowledge, tokenized with the open-sourced o200k_harmony tokenizer.
Where it is strong
Where it is strong
Smaller-memory deployment
Best fit when you want GPT-OSS reasoning and agent behavior without stepping into 80 GB class hardware first.
General-purpose assistant work
Designed as a broad open assistant and reasoning model rather than a narrow specialist checkpoint.
Fine-tuning and customization
OpenAI positions the model as fine-tunable, which makes it useful when a smaller open reasoning model needs to be adapted to a specific task.
Commercial deployment
The Apache 2.0 license keeps experimentation and product deployment straightforward for teams that want permissive usage terms.
Memory behavior
What dominates VRAM
More than 90% of GPT-OSS 20B's parameters sit in MoE weights quantized to MXFP4, while the remaining shared weights stay in BF16.
Sources