Mistral · 9 min read

Mistral Large 123B locally: which rig, and what does it really cost in 2026

Damien · LocalIA

Published 2026-05-12

Mistral Large 123B open-weight at home: VRAM per quant, the minimal rig (2x A6000 NVLink), ROI vs the Mistral API by monthly volume, and when Llama 3.3 70B is the better choice.

Translated article. This version is localized so that international pages do not show French article text. Technical data, prices, and advice remain the same.

How much VRAM for Mistral Large 123B

The model is dense (not a MoE): 123 billion parameters all active per token. VRAM depends on the quantization chosen.

Quantization	Approx. VRAM	Quality and typical hardware
FP16	~246 GB	Reference quality, 4× A100 80 GB (datacenter)
Q8_0	~131 GB	Near-FP16, 2× A6000 or 4× RTX 5090 (128 GB), both with offload
Q5_K_M	~88 GB	Indistinguishable in chat, 2× A6000 NVLink (96 GB)
Q4_K_M	~70 GB	Sweet spot, 2× A6000 NVLink or 3× RTX 5090
Q3_K_M	~52 GB	Degraded on reasoning, 2× RTX 5090 (64 GB)
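As a rough cross-check of the table above, the sketch below converts each listed size into the average number of bits stored per weight and tests whether the weights alone fit on two 48 GB or two 32 GB cards. The sizes are copied from the table; the function names are illustrative, and context memory and runtime overhead are deliberately left out.

```python
# Approximate weight sizes in GB, copied from the table above.
PARAMS_BILLION = 123
TABLE_GB = {"FP16": 246, "Q8_0": 131, "Q5_K_M": 88, "Q4_K_M": 70, "Q3_K_M": 52}


def effective_bits_per_weight(size_gb: float, params_billion: float = PARAMS_BILLION) -> float:
    """Average bits stored per parameter implied by a given weight size."""
    return size_gb * 8 / params_billion


def weights_gb(bits_per_weight: float, params_billion: float = PARAMS_BILLION) -> float:
    """Weight size in GB (1 GB = 1e9 bytes) for a given average bit width."""
    return params_billion * bits_per_weight / 8


def fits(size_gb: float, gpu_count: int, gb_per_gpu: int) -> bool:
    """True if the weights alone fit in total VRAM; context and overhead are extra."""
    return size_gb <= gpu_count * gb_per_gpu


if __name__ == "__main__":
    for quant, size in TABLE_GB.items():
        bpw = effective_bits_per_weight(size)
        print(f"{quant:7s} {size:4d} GB  ~{bpw:.2f} bits/weight  "
              f"2x48GB: {fits(size, 2, 48)}  2x32GB: {fits(size, 2, 32)}")
```

The same `fits` helper can be called with other card counts and sizes to compare candidate builds before pricing them.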

The recommended rig: Enterprise (Q4) or custom (Q8+)

The LocalIA Enterprise rig (2× RTX A6000 NVLink, 96 GB VRAM, from EUR 25,990 ex VAT) is the natural target for Mistral Large 123B in Q4_K_M. The 96 GB of VRAM across the two NVLink-connected cards is enough for the model plus a comfortable 32k context. Throughput is around 22 tok/s single-user and ~45 tok/s with vLLM batching at 4 concurrent requests.
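To see why a 32k context fits alongside the Q4_K_M weights, the KV cache can be estimated from the model's attention layout. The sketch below is illustrative only: the layer count, KV head count, and head dimension are assumptions that this article does not state, so check them against the `config.json` of the checkpoint you deploy. It also assumes an FP16 cache and ignores runtime overhead.

```python
# Assumed architecture values (NOT stated in this article); verify them against
# the config.json of the checkpoint you actually deploy.
N_LAYERS = 88        # assumption
N_KV_HEADS = 8       # assumption (grouped-query attention)
HEAD_DIM = 128       # assumption
BYTES_PER_VALUE = 2  # FP16 cache; a quantized KV cache would be smaller


def kv_cache_bytes(context_tokens: int, sequences: int = 1) -> int:
    """Bytes needed to hold keys and values for every layer and token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return per_token * context_tokens * sequences


def total_gb(weights_gb: float, context_tokens: int, sequences: int = 1) -> float:
    """Weights plus KV cache in GB (1 GB = 1e9 bytes), excluding runtime overhead."""
    return weights_gb + kv_cache_bytes(context_tokens, sequences) / 1e9


if __name__ == "__main__":
    vram_gb = 96  # 2x RTX A6000, as quoted in the article
    for quant, weights in (("Q4_K_M", 70), ("Q5_K_M", 88)):
        need = total_gb(weights, 32_768)
        print(f"{quant}: {need:.1f} GB needed of {vram_gb} GB -> fits: {need <= vram_gb}")
```

Under these assumptions a single 32k sequence adds roughly 12 GB on top of the weights. The `sequences` argument gives a first-order view of how concurrent long-context requests eat into the remaining headroom.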

Real ROI vs Mistral API

Mistral Large via the official Mistral AI API costs ~EUR 8 per million tokens (blended rate at 80% input / 20% output tokens).

Monthly volume	API cost per year	Break-even for the Enterprise rig (EUR 25,990, hardware only)
30M tok/month	~EUR 2,880/year API	Break-even ≈ 9 years
100M tok/month	~EUR 9,600/year API	Break-even ≈ 32 months
300M tok/month	~EUR 28,800/year API	Break-even ≈ 11 months
1,000M tok/month	~EUR 96,000/year API	Break-even ≈ 3 months
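The break-even arithmetic is simply the rig price divided by the monthly API spend it replaces. The sketch below uses the EUR 8 per million tokens and EUR 25,990 figures from this article; the `monthly_running_eur` parameter (electricity, hosting, maintenance) is a hypothetical addition that defaults to zero because the article does not quantify it.

```python
def monthly_api_cost(tokens_millions: float, eur_per_million: float = 8.0) -> float:
    """API spend per month in EUR for a given monthly token volume (in millions)."""
    return tokens_millions * eur_per_million


def break_even_months(
    rig_eur: float,
    tokens_millions: float,
    eur_per_million: float = 8.0,
    monthly_running_eur: float = 0.0,  # hypothetical: power, hosting, maintenance
) -> float:
    """Months until cumulative API savings cover the rig price."""
    saving = monthly_api_cost(tokens_millions, eur_per_million) - monthly_running_eur
    if saving <= 0:
        return float("inf")  # the rig never pays back at this volume
    return rig_eur / saving


if __name__ == "__main__":
    rig = 25_990  # Enterprise rig price quoted in the article, ex VAT
    for volume in (30, 100, 300, 1_000):
        yearly = monthly_api_cost(volume) * 12
        months = break_even_months(rig, volume)
        print(f"{volume:5d}M tok/month: EUR {yearly:,.0f}/year API, "
              f"break-even in {months:.1f} months")
```

Passing a non-zero `monthly_running_eur` shows how quickly running costs lengthen the payback period at low volumes.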

When Mistral Large is worth it vs Llama 3.3 70B

Stronger multilingual output: FR/ES/DE/IT at native level, where Llama is English-centric.
Code in niche languages (R, COBOL, Fortran).
French sovereignty: open-weight from an EU company under EU law.
Native 128k context with no rolling-window degradation.

Mistral Large 123B locally is justified from 100M tok/month with a French-speaking target. Below that, Llama 3.3 70B on a Pro rig (2× RTX 5090, EUR 11,990) covers 90% of cases at about half the ticket. At 300M tok/month and above, the Enterprise rig pays back in under a year at the API price quoted above.

Open the calculator or ask us for advice, stating your target model, number of users, and constraints.

Frequently asked questions

How much does a rig to run Mistral Large 123B locally cost?

From ~EUR 25,990 as an indicative cost for an Enterprise reference build (2x RTX A6000 NVLink, 96 GB VRAM). It runs Mistral Large 123B in Q4_K_M (~70 GB) with a comfortable 32k context; Q5_K_M (~88 GB) also loads but leaves only about 8 GB for context. For Q8 (131 GB) you need a custom multi-node setup.

Mistral Large 123B locally vs the Mistral API, is it worth it?

Yes from about 100M tokens/month. The Mistral API is ~EUR 8/M tokens blended, so 100M tokens/month costs EUR 9,600/year. Against that spend, an Enterprise build at EUR 25,990 pays back in roughly 32 months on hardware cost alone. At 30M tokens/month (EUR 2,880/year) the break-even stretches to about 9 years.

How much VRAM for Mistral Large 123B in Q4?

Around 70 GB in Q4_K_M. Possible setups: 2x RTX A6000 NVLink (96 GB), 2x RTX 6000 Ada (96 GB, faster), or 3x RTX 5090 (96 GB via tensor parallelism). 2x RTX 5090 (64 GB) falls short of the ~70 GB needed and only works with offload.

Mistral Large 123B or Llama 3.3 70B, which to choose?

Llama 3.3 70B (a Pro build, ~EUR 11,990) is enough for 90% of cases. Mistral Large 123B (an Enterprise build, ~EUR 25,990) is justified for high-quality European multilingual output (native-level FR/ES/DE/IT), code in rare languages, or strict sovereignty requirements (open weights from a French company).

What are Mistral Large's sovereignty advantages?

Open weights from a French company under European law. With local deployment there is no data transfer outside the EU and no exposure to US export controls, which helps with GDPR and AI Act compliance. A strong argument for public-sector, legal and medical markets.

