Company
DeepSeek
DeepSeek-family reasoning and efficiency-focused releases, especially where architecture changes affect memory behavior.
Series
DeepSeek R1 Distill Llama
DeepSeek R1 Distill Llama 70B
Largest dense DeepSeek R1 distill in this lineup, carrying R1-style reasoning into a 70B Llama serving target without the full DeepSeek-R1 MoE complexity.
70B dense • 131,072 context • 8 KV heads
DeepSeek R1 Distill Llama 8B
DeepSeek R1 distill on a Llama dense backbone, giving users a familiar 8B serving shape with stronger reasoning-oriented post-training.
8B dense • 131,072 context • 8 KV heads
Series
DeepSeek R1 Distill Qwen
DeepSeek R1 Distill Qwen 32B
Largest Qwen-backed DeepSeek R1 distill, designed for strong dense reasoning without the operational complexity of the full DeepSeek-R1 MoE model.
32B dense • 131,072 context • 8 KV heads
DeepSeek R1 Distill Qwen 14B
Higher-capacity Qwen-backed DeepSeek R1 distill for stronger reasoning and coding without leaving the standard dense serving path.
14B dense • 131,072 context • 8 KV heads
DeepSeek R1 Distill Qwen 7B
Mid-sized DeepSeek R1 distill on a Qwen dense backbone, aimed at practical local reasoning without the huge footprint of the full R1 model.
7B dense • 131,072 context • 4 KV heads
DeepSeek R1 Distill Qwen 1.5B
Smallest DeepSeek R1 distill, carrying R1-style reasoning into a compact Qwen dense backbone that is easy to run locally.
1.5B dense • 131,072 context • 2 KV heads
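The KV-head counts in the spec lines above matter because grouped-query attention shrinks the per-token KV cache. A minimal sketch of the arithmetic, assuming layer counts and a 128-dim head size taken from the upstream Llama/Qwen backbone configs (verify against each checkpoint's `config.json` before relying on these numbers):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem=2):
    """Per-sequence KV cache size: K and V tensors for every layer,
    fp16/bf16 (2 bytes per element) by default."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

# Assumed backbone shapes (layers, kv_heads, head_dim) -- not stated above,
# taken from the upstream Llama 3 / Qwen2.5 configs:
models = {
    "R1-Distill-Llama-70B": (80, 8, 128),
    "R1-Distill-Llama-8B":  (32, 8, 128),
    "R1-Distill-Qwen-32B":  (64, 8, 128),
    "R1-Distill-Qwen-1.5B": (28, 2, 128),
}

for name, (layers, kv_heads, head_dim) in models.items():
    gib = kv_cache_bytes(layers, kv_heads, head_dim, 131072) / 2**30
    print(f"{name}: {gib:.1f} GiB KV cache at the full 131,072-token context")
```

Under these assumptions the 70B distill needs roughly 40 GiB of KV cache per sequence at full context, while the 1.5B distill needs about 3.5 GiB, which is the practical gap between the serving targets described above.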