Mistral Large 123B locally: which rig, what real cost in 2026
Mistral Large 123B open-weight at home: VRAM per quant, minimum rig (2× A6000 NVLink), ROI vs the Mistral API by monthly volume, and when to prefer Llama 3.3 70B.

Translated article. This version is localized to avoid international pages with French text. Technical data, prices, and recommendations are unchanged.
How much VRAM does Mistral Large 123B need?
The model is dense (not an MoE): all 123 billion parameters are active for every token. VRAM requirements depend on the chosen quantization.
| Quantization | VRAM | Hardware / notes |
| --- | --- | --- |
| FP16 | ~246 GB | Reference quality, 4× A100 80 GB (datacenter) |
| Q8_0 | ~131 GB | Near-FP16, 2× A6000 + offload or 4× RTX 5090 |
| Q5_K_M | ~88 GB | Indistinguishable in chat, 2× A6000 NVLink (96 GB) |
| Q4_K_M | ~70 GB | Sweet spot, 2× A6000 NVLink or 3× RTX 5090 |
| Q3_K_M | ~52 GB | Degraded on reasoning, 2× RTX 5090 (64 GB) |
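As a sanity check, the weight-only footprint can be approximated from parameter count times effective bits per weight. The bpw values below are rough llama.cpp figures (an assumption; they vary slightly by release and by which tensors each quant keeps at higher precision, so results land within a few GB of the table) and exclude KV cache, activations, and runtime overhead:

```python
# Weight-only VRAM estimate: parameter count x effective bits per weight.
# The bpw figures are approximate llama.cpp values (assumption) and the
# result excludes KV cache, activations, and runtime overhead.
PARAMS = 123e9  # Mistral Large is dense: all parameters must be loaded

BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
}

def weight_vram_gb(quant: str, params: float = PARAMS) -> float:
    """GB needed just to hold the quantized weights."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q:7s} ~{weight_vram_gb(q):.0f} GB")
```

The FP16 and Q8_0 rows reproduce the table exactly; the K-quant rows come out a few GB higher because effective bpw depends on the exact tensor mix.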
The recommended rig: Enterprise (Q4) or custom (Q8+)
The LocalIA Enterprise rig (2× RTX A6000 NVLink, 96 GB VRAM, from EUR 25,990 ex VAT) is the natural target for Mistral Large 123B in Q4_K_M. The 96 GB of NVLink-pooled VRAM hold the model plus a comfortable 32k context. Expect roughly 22 tok/s single-user, and ~45 tok/s aggregate with vLLM batching at 4 concurrent requests.
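The "fits with a comfortable 32k context" claim can be checked with a KV-cache sizing sketch. The architecture figures below are assumptions to verify against the model's config.json (88 layers, 8 KV heads via GQA, head dimension 128, FP16 cache):

```python
# KV-cache sizing sketch for a 32k context. Architecture figures are
# assumptions (check the model's config.json): 88 layers, 8 KV heads
# (GQA), head_dim 128, FP16 cache entries.
LAYERS, KV_HEADS, HEAD_DIM, CACHE_BYTES = 88, 8, 128, 2

def kv_cache_gb(context_tokens: int) -> float:
    """FP16 KV cache for a single sequence at the given context length."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * CACHE_BYTES  # K and V
    return context_tokens * per_token / 1e9

weights_q4_gb = 70  # Q4_K_M row from the table above
print(f"KV cache @32k: {kv_cache_gb(32_768):.1f} GB")          # ~11.8 GB
print(f"Weights + cache: {weights_q4_gb + kv_cache_gb(32_768):.1f} GB")  # ~81.8 GB
```

Under these assumptions, ~70 GB of Q4_K_M weights plus ~12 GB of cache leave headroom inside 96 GB for activations and the serving runtime.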
Real ROI vs Mistral API
Mistral Large via the official Mistral AI API costs ~EUR 8 per million tokens (blended, 80% input / 20% output).
| Monthly volume | API cost | Break-even vs Enterprise rig (EUR 25,990) |
| --- | --- | --- |
| 30M tok/month | ~EUR 2,880/year | ≈ 9 years |
| 100M tok/month | ~EUR 9,600/year | ≈ 2.7 years |
| 300M tok/month | ~EUR 28,800/year | ≈ 11 months |
| 1,000M tok/month | ~EUR 96,000/year | ≈ 3 months |
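Break-even here is simply the rig's purchase price divided by the monthly API spend it displaces; a minimal sketch, ignoring electricity, hosting, and maintenance (which push real payback somewhat further out):

```python
# Break-even: rig purchase price / monthly API spend at the same volume.
# Electricity, hosting, and maintenance are ignored, so real payback is
# somewhat longer than these figures.
RIG_COST_EUR = 25_990       # Enterprise rig, ex VAT
API_EUR_PER_M_TOK = 8.0     # blended 80% input / 20% output

def breakeven_months(monthly_m_tokens: float) -> float:
    """Months until cumulative API spend equals the rig price."""
    return RIG_COST_EUR / (monthly_m_tokens * API_EUR_PER_M_TOK)

for volume in (30, 100, 300, 1000):
    print(f"{volume:5d}M tok/month -> break-even in "
          f"{breakeven_months(volume):.1f} months")
```

At 300M tok/month this gives ~10.8 months; at 1,000M tok/month, ~3.2 months.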
When Mistral Large is worth it vs Llama 3.3 70B
- Premium multilingual output: FR/ES/DE/IT at near-native quality, versus Llama's English-centric training.
- Code in niche languages (R, COBOL, Fortran).
- French sovereignty: open weights from an EU company under EU law.
- Native 128k context with no rolling-window degradation.
Mistral Large 123B locally is justified from 100M tok/month with a French-speaking target. Below that, Llama 3.3 70B on a Pro rig (2× RTX 5090, EUR 11,990) covers 90% of cases at half the ticket. Above 300M tok/month, the Enterprise rig pays for itself in well under a year.
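The decision rule above can be sketched as an illustrative helper; the 100M-token threshold and rig names come from this article, not from any general standard:

```python
def recommend(monthly_m_tokens: float, multilingual_premium: bool = False) -> str:
    """Illustrative rule of thumb distilled from the comparison above."""
    if monthly_m_tokens < 100:
        # Below ~100M tok/month the cheaper Pro rig covers most cases
        return "Llama 3.3 70B on Pro rig (2x RTX 5090)"
    if multilingual_premium:
        return "Mistral Large 123B on Enterprise rig (2x RTX A6000 NVLink)"
    # High volume but English-centric workload
    return "Llama 3.3 70B, or revisit if multilingual needs grow"

print(recommend(30))
print(recommend(300, multilingual_premium=True))
```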
Open the calculator / request a quote with your target model, user count, and constraints.