Mistral Large 123B locally: which rig, and the real costs in 2026
Mistral Large 123B open-weight at home: VRAM per quant, minimum rig (2× A6000 NVLink), ROI vs the Mistral API by monthly volume, and when Llama 3.3 70B is the better choice.

Translated article. This version is localized so that international pages do not display French article text. Technical data, prices, and recommendations are unchanged.
How much VRAM for Mistral Large 123B
The model is dense (not an MoE): all 123 billion parameters are active for every token. VRAM requirements depend on the chosen quantization.
| Quantization | VRAM needed | Hardware / quality notes |
|---|---|---|
| FP16 | ~246 GB | Reference quality, 4× A100 80 GB (datacenter) |
| Q8_0 | ~131 GB | Near-FP16, 2× A6000 + offload or 4× RTX 5090 |
| Q5_K_M | ~88 GB | Indistinguishable in chat, 2× A6000 NVLink (96 GB) |
| Q4_K_M | ~70 GB | Sweet spot, 2× A6000 NVLink or 3× RTX 5090 |
| Q3_K_M | ~52 GB | Degraded on reasoning, 2× RTX 5090 (64 GB) |
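The table's figures follow from a back-of-the-envelope rule: weight memory ≈ parameters × effective bits per weight / 8, plus KV cache and runtime overhead on top. A minimal sketch; the bits-per-weight constants below are rough effective averages chosen to reproduce the table above, not measured values:

```python
# Back-of-the-envelope VRAM estimate for a dense 123B model.
# Effective bits-per-weight values are approximations (assumption),
# picked to match the table above; real GGUF files vary slightly.
PARAMS_B = 123  # billions of parameters, all active (dense model)

QUANT_BITS = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.55,
    "Q3_K_M": 3.4,
}

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight memory in GB: params * bits / 8 (units of 1e9 bytes)."""
    return params_b * bits_per_weight / 8

for quant, bits in QUANT_BITS.items():
    print(f"{quant:>7}: ~{weight_gb(PARAMS_B, bits):.0f} GB weights "
          f"(+ KV cache and overhead)")
```

KV cache grows with context length and batch size, which is why the 96 GB rig is quoted with a 32k context rather than the full 128k.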
The recommended rig: Enterprise (Q4) or custom (Q8+)
The LocalIA Enterprise rig (2× RTX A6000 NVLink, 96 GB VRAM, from EUR 25,990 ex VAT) is the natural target for Mistral Large 123B in Q4_K_M. The 96 GB of NVLink-pooled VRAM is enough for the model plus a comfortable 32k context. Expect around 22 tok/s single-user, and ~45 tok/s aggregate with vLLM batching at 4 concurrent requests.
Real ROI vs Mistral API
Mistral Large via the official Mistral AI API costs ~EUR 8 per million tokens (blended rate, assuming 80% input / 20% output).
| Monthly volume | API cost | Break-even vs Enterprise rig (EUR 25,990) |
|---|---|---|
| 30M tok/month | ~EUR 2,880/year | ≈ 9 years |
| 100M tok/month | ~EUR 9,600/year | ≈ 32 months |
| 300M tok/month | ~EUR 28,800/year | ≈ 11 months |
| 1,000M tok/month | ~EUR 96,000/year | ≈ 3 months |
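A payback period can be estimated by dividing the rig's upfront price by the monthly API spend it replaces. A minimal sketch using the prices quoted in this article; electricity, maintenance, and depreciation are deliberately ignored, all of which lengthen the real payback:

```python
# Break-even estimate: months until cumulative API spend equals rig price.
# Prices taken from this article; operating costs are ignored (assumption).
RIG_PRICE_EUR = 25_990      # Enterprise rig, ex VAT
API_EUR_PER_MTOK = 8.0      # blended Mistral API rate, EUR per million tokens

def break_even_months(mtok_per_month: float) -> float:
    """Months of API usage whose cost equals the rig's upfront price."""
    monthly_api_cost = mtok_per_month * API_EUR_PER_MTOK
    return RIG_PRICE_EUR / monthly_api_cost

for volume in (30, 100, 300, 1000):
    print(f"{volume:>5}M tok/month -> break-even ~ {break_even_months(volume):.0f} months")
```

At 30M tok/month the division gives roughly nine years, so at low volumes the API remains the cheaper option; the rig only wins once volume climbs well past 100M tok/month.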
When Mistral Large is worth it vs Llama 3.3 70B
- Premium native multilingual quality (FR/ES/DE/IT at native level, vs Llama's English-centric training).
- Code in niche languages (R, COBOL, Fortran).
- French sovereignty: open-weight from an EU company under EU law.
- Native 128k context with no rolling-window degradation.
Mistral Large 123B locally is justified from 100M tok/month with a French-speaking target. Below that, Llama 3.3 70B on a Pro rig (2× RTX 5090, EUR 11,990) covers 90% of cases at half the entry price. Above 300M tok/month, the Enterprise rig pays for itself in under a year.
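The decision rule in this section can be condensed into a small helper. The thresholds (100M tok/month, French-speaking audience) come straight from the text; the function name and signature are illustrative, not part of any product:

```python
def pick_model(mtok_per_month: float, french_speaking_target: bool) -> str:
    """Toy decision rule condensing this article's recommendation.

    Thresholds are the article's own: Mistral Large locally is justified
    from 100M tok/month with a French-speaking target; otherwise
    Llama 3.3 70B on the cheaper Pro rig covers most cases.
    """
    if french_speaking_target and mtok_per_month >= 100:
        return "Mistral Large 123B local (Enterprise rig, 2x RTX A6000 NVLink)"
    return "Llama 3.3 70B (Pro rig, 2x RTX 5090)"

print(pick_model(300, True))    # high-volume French workload
print(pick_model(30, False))    # low volume, English-first workload
```

Edge cases (e.g. 80M tok/month of pure FR legal text, or COBOL-heavy code work at low volume) still deserve a manual call, which is what the quote request below is for.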
Open the calculator / request a quote with your target model, number of users, and constraints.