Models
Browse the model registry
Start with the newest or most useful models, browse by company, or jump to the full index below.
Start here
Models to explore first
OpenAI
GPT-OSS 120B
Production GPT-OSS release for general-purpose and higher-reasoning workloads that can fit on a single 80 GB class GPU.
117B total • 5.1B active • 131,072 context • 8 KV heads
Qwen
Qwen 3.5 9B
Largest practical Qwen 3.5 release in this batch, pairing a 9B language model with a resident multimodal stack while still targeting single-GPU text serving.
10B dense • 262,144 context • 4 KV heads
NVIDIA
OpenReasoning Nemotron 14B
Mid-sized dense Nemotron checkpoint for users who want stronger reasoning behavior than 7B without stepping straight into 32B deployment territory.
14.7B dense • 131,072 context • 8 KV heads
Meta Llama
Llama 3.1 70B
High-capacity dense Llama model that is common in serious long-context inference and fine-tuning work.
70.6B dense • 131,072 context • 8 KV heads
Qwen
Qwen 2.5 32B
Instruction-tuned 32B Qwen 2.5 model for higher-capacity long-context, coding, math, and structured-output workloads in a large dense deployment shape.
32.5B dense • 131,072 context • 8 KV heads
OpenAI
GPT-OSS 20B
Smaller GPT-OSS release for general-purpose and reasoning use cases that need to stay within a much lighter single-card memory budget.
21B total • 3.6B active • 131,072 context • 8 KV heads
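The spec lines above (total vs. active parameters, context length, KV heads) are what drive single-GPU fit. A minimal sketch of the arithmetic, assuming fp16/bf16 storage and hypothetical layer counts and head dimensions for the two worked examples (neither appears on this page):

```python
def weight_gib(total_params_b: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB for a parameter count in billions."""
    return total_params_b * 1e9 * bytes_per_param / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache memory in GiB at full context:
    2 tensors (K and V) x layers x kv_heads x head_dim x context."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 2**30

# GPT-OSS 120B: 117B total parameters at 2 bytes each is ~218 GiB, so the
# "fits on a single 80 GB class GPU" claim implies a lower-precision
# (roughly 4-bit) weight format rather than fp16.
print(f"GPT-OSS 120B fp16 weights: {weight_gib(117):.0f} GiB")

# Llama 3.1 70B KV cache at its listed 8 KV heads and 131,072 context,
# assuming 80 layers and head_dim 128 (typical for this model shape):
print(f"Llama 70B KV cache @ full context: "
      f"{kv_cache_gib(80, 8, 128, 131072):.0f} GiB")
```

The small KV-head counts (grouped-query attention) are why full-context KV caches stay in the tens, not hundreds, of GiB for these dense models.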
Companies
Browse by source
Qwen
Qwen dense, hybrid, and multimodal families with frequent open checkpoint releases.
27 models covered
DeepSeek
DeepSeek-family reasoning and efficiency-focused releases, especially where architecture changes affect memory behavior.
6 models covered
NVIDIA
Nemotron and related inference-oriented open models, often tied closely to deployment runtimes.
4 models covered
Gemma
Google Gemma family coverage, spanning long-context and compact open-model releases.
2 models covered
Meta Llama
Llama-family dense transformer models that set a practical baseline for open inference planning.
2 models covered
Mistral
Dense and MoE Mistral-family models commonly used in open inference stacks.
2 models covered
OpenAI
Open-weight and hosted model releases with strong inference interest and frequent runtime discussion.
2 models covered
Microsoft Phi
Compact Phi-family models with practical local inference interest.
1 model covered
Index