Models
Browse the model registry
Start with the newest or most useful models, browse by company, or jump to the full index below.
Start here
Models to explore first
OpenAI
GPT-OSS 120B
Production GPT-OSS release for general-purpose and higher-reasoning workloads that can fit on a single 80 GB class GPU.
117B total • 5.1B active • 131,072 context • 8 KV heads
Qwen
Qwen 3.5 9B
Largest practical Qwen 3.5 release in this batch, pairing a 9B language model with a resident multimodal stack while still targeting single-GPU text serving.
10B dense • 262,144 context • 4 KV heads
NVIDIA
OpenReasoning Nemotron 14B
Mid-sized dense Nemotron checkpoint for users who want stronger reasoning behavior than 7B without stepping straight into 32B deployment territory.
14.7B dense • 131,072 context • 8 KV heads
Meta Llama
Llama 3.1 70B
High-capacity dense Llama model that is common in serious long-context inference and fine-tuning work.
70.6B dense • 131,072 context • 8 KV heads
Qwen
Qwen 2.5 32B
Instruction-tuned 32B Qwen 2.5 model for higher-capacity long-context, coding, math, and structured-output workloads in a large dense deployment shape.
32.5B dense • 131,072 context • 8 KV heads
OpenAI
GPT-OSS 20B
Smaller GPT-OSS release for general-purpose and reasoning use cases that need to stay within a much lighter single-card memory budget.
21B total • 3.6B active • 131,072 context • 8 KV heads
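The spec lines above (total vs. active parameters, context length, KV heads) are what drive single-GPU fit. A minimal sketch of the arithmetic, assuming fp16/bf16 storage and hypothetical layer counts and head dimensions for the two worked examples (neither appears on this page):

```python
def weight_gib(total_params_b: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB for a parameter count in billions."""
    return total_params_b * 1e9 * bytes_per_param / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache memory in GiB at full context:
    2 tensors (K and V) x layers x kv_heads x head_dim x context."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 2**30

# GPT-OSS 120B: 117B total parameters at 2 bytes each is ~218 GiB, so the
# "fits on a single 80 GB class GPU" claim implies a lower-precision
# (roughly 4-bit) weight format rather than fp16.
print(f"GPT-OSS 120B fp16 weights: {weight_gib(117):.0f} GiB")

# Llama 3.1 70B KV cache at its listed 8 KV heads and 131,072 context,
# assuming 80 layers and head_dim 128 (typical for this model shape):
print(f"Llama 70B KV cache @ full context: "
      f"{kv_cache_gib(80, 8, 128, 131072):.0f} GiB")
```

The small KV-head counts (grouped-query attention) are why full-context KV caches stay in the tens, not hundreds, of GiB for these dense models.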
Companies
Browse by source
Qwen
Qwen dense, hybrid, and multimodal families with frequent open checkpoint releases.
27 models covered
DeepSeek
DeepSeek-family reasoning and efficiency-focused releases, especially where architecture changes affect memory behavior.
6 models covered
NVIDIA
Nemotron and related inference-oriented open models, often tied closely to deployment runtimes.
4 models covered
Gemma
Google Gemma family coverage, spanning long-context and compact open-model releases.
2 models covered
Meta Llama
Llama-family dense transformer models that set a practical baseline for open inference planning.
2 models covered
Mistral
Dense and MoE Mistral-family models commonly used in open inference stacks.
2 models covered
OpenAI
Open-weight and hosted model releases with strong inference interest and frequent runtime discussion.
2 models covered
Microsoft Phi
Compact Phi-family models with practical local inference interest.
1 model covered
Index