Company
DeepSeek
DeepSeek-family reasoning and efficiency-focused releases, especially where architecture changes affect memory behavior.
Series
DeepSeek R1 Distill Llama
DeepSeek R1 Distill Llama 70B
Largest dense DeepSeek R1 distill in this lineup, carrying R1-style reasoning into a 70B Llama serving target without the full DeepSeek-R1 MoE complexity.
70B dense • 131,072 context • 8 KV heads
DeepSeek R1 Distill Llama 8B
DeepSeek R1 distill on a Llama dense backbone, giving users a familiar 8B serving shape with stronger reasoning-oriented post-training.
8B dense • 131,072 context • 8 KV heads
Series
DeepSeek R1 Distill Qwen
DeepSeek R1 Distill Qwen 32B
Largest Qwen-backed DeepSeek R1 distill, designed for strong dense reasoning without the operational complexity of the full DeepSeek-R1 MoE model.
32B dense • 131,072 context • 8 KV heads
DeepSeek R1 Distill Qwen 14B
Higher-capacity Qwen-backed DeepSeek R1 distill for stronger reasoning and coding without leaving the standard dense serving path.
14B dense • 131,072 context • 8 KV heads
DeepSeek R1 Distill Qwen 7B
Mid-sized DeepSeek R1 distill on a Qwen dense backbone, aimed at practical local reasoning without the huge footprint of the full R1 model.
7B dense • 131,072 context • 4 KV heads
DeepSeek R1 Distill Qwen 1.5B
Smallest DeepSeek R1 distill, carrying R1-style reasoning into a compact Qwen dense backbone that is easy to run locally.
1.5B dense • 131,072 context • 2 KV heads
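The KV-head counts in the spec lines above matter because grouped-query attention shrinks the per-token KV cache. A minimal sketch of the arithmetic, assuming layer counts and a 128-dim head size taken from the upstream Llama/Qwen backbone configs (verify against each checkpoint's `config.json` before relying on these numbers):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem=2):
    """Per-sequence KV cache size: K and V tensors for every layer,
    fp16/bf16 (2 bytes per element) by default."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

# Assumed backbone shapes (layers, kv_heads, head_dim) -- not stated above,
# taken from the upstream Llama 3 / Qwen2.5 configs:
models = {
    "R1-Distill-Llama-70B": (80, 8, 128),
    "R1-Distill-Llama-8B":  (32, 8, 128),
    "R1-Distill-Qwen-32B":  (64, 8, 128),
    "R1-Distill-Qwen-1.5B": (28, 2, 128),
}

for name, (layers, kv_heads, head_dim) in models.items():
    gib = kv_cache_bytes(layers, kv_heads, head_dim, 131072) / 2**30
    print(f"{name}: {gib:.1f} GiB KV cache at the full 131,072-token context")
```

Under these assumptions the 70B distill needs roughly 40 GiB of KV cache per sequence at full context, while the 1.5B distill needs about 3.5 GiB, which is the practical gap between the serving targets described above.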