Mistral Large 123B locally: which rig, what real cost in 2026

Damien · LocalIA
Published 2026-05-12

Mistral Large 123B open-weight at home: VRAM by quant, minimum rig (2x A6000 NVLink), ROI vs Mistral API by monthly volume, and when to prefer Llama 3.3 70B instead.

Mistral Large 123B is the French open-weight flagship of 2025: 128k context, native multilingual support, and GPT-4-class quality on code and reasoning. Running it locally, without sending a single token to Saint-Denis, is possible but not trivial. Here is what it really costs, and which rig is the right fit.

How much VRAM for Mistral Large 123B

The model is dense (not a MoE): 123 billion parameters all active per token. VRAM depends on the quantization chosen.

FP16 · ~246 GB · Reference quality; 4× A100 80 GB (datacenter)
Q8_0 · ~131 GB · Near-FP16; 2× A6000 + offload, or 4× RTX 5090
Q5_K_M · ~88 GB · Indistinguishable in chat; 2× A6000 NVLink (96 GB)
Q4_K_M · ~70 GB · Sweet spot; 2× A6000 NVLink, or 3× RTX 5090
Q3_K_M · ~52 GB · Degraded on reasoning; 2× RTX 5090 (64 GB)
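The figures above follow from simple arithmetic: a dense model needs roughly parameters × bits-per-weight ÷ 8 bytes for its weights, plus headroom for KV cache and activations. A minimal sketch, using approximate GGUF bits-per-weight averages (an assumption; the exact averages vary by release, so real file sizes may differ by a few GB):

```python
# Rough weight-memory estimate for a dense 123B model.
# bpw values are approximate GGUF averages (assumption); real VRAM use adds
# KV cache and activation buffers on top of the weights.
PARAMS = 123e9

BPW = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
}

def weights_gb(quant: str, params: float = PARAMS) -> float:
    """Bytes for the weights alone, in GB (1e9 bytes)."""
    return params * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"{q:7s} ~{weights_gb(q):.0f} GB")
```

Because the model is dense rather than MoE, all 123B parameters must sit in VRAM regardless of quant; only the bits-per-weight change.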

The recommended rig: Enterprise (Q4) or custom (Q8+)

The LocalIA Enterprise rig (2× RTX A6000 NVLink, 96 GB VRAM, from EUR 25,990 ex VAT) is the natural target for Mistral Large 123B in Q4_K_M. The NVLink-pooled 96 GB of VRAM is enough for the model weights plus a comfortable 32k context. Expect throughput around 22 tok/s single-user, and ~45 tok/s aggregate with vLLM batching at 4 concurrent requests.

Real ROI vs Mistral API

Mistral Large via the official Mistral AI API costs ~EUR 8 per million tokens (blended, 80% input / 20% output).

30M tok/month · ~EUR 2,880/year API · Break-even Enterprise rig ≈ 9 years
100M tok/month · ~EUR 9,600/year API · Break-even ≈ 2.7 years
300M tok/month · ~EUR 28,800/year API · Break-even ≈ 11 months
1,000M tok/month · ~EUR 96,000/year API · Break-even ≈ 3 months
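The break-even is simply the rig's purchase price divided by the monthly API bill. A quick sketch (it ignores electricity, hosting and maintenance, which lengthen the payback somewhat):

```python
RIG_COST_EUR = 25_990      # Enterprise rig, ex VAT
API_EUR_PER_MTOK = 8.0     # blended Mistral API price used in this article

def breakeven_months(mtok_per_month: float) -> float:
    """Months of API spend needed to equal the rig's purchase price."""
    return RIG_COST_EUR / (mtok_per_month * API_EUR_PER_MTOK)

for volume in (30, 100, 300, 1_000):
    print(f"{volume}M tok/month -> break-even ~ {breakeven_months(volume):.1f} months")
```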

When Mistral Large is worth it vs Llama 3.3 70B

  • Premium native multilingual (FR/ES/DE/IT at native level, vs Llama's English-centric training).
  • Code in niche languages (R, COBOL, Fortran).
  • French sovereignty: open-weight from an EU company under EU law.
  • Native 128k context with no rolling-window degradation.
Mistral Large 123B locally is justified from roughly 100M tok/month with a French-speaking audience. Below that, Llama 3.3 70B on a Pro rig (2× RTX 5090, EUR 11,990) covers 90% of cases at half the price. Above 300M tok/month, the Enterprise rig pays for itself in under a year.
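That rule of thumb can be written down as a tiny decision helper. The thresholds and rig names below come from this article, not from any official sizing tool, so treat it as an illustration:

```python
def recommend_rig(mtok_per_month: float, multilingual_first: bool) -> str:
    """Toy chooser encoding this article's rule of thumb (assumption:
    100M tok/month threshold, rigs as named in the article)."""
    if mtok_per_month >= 100 and multilingual_first:
        return "Mistral Large 123B on Enterprise (2x RTX A6000 NVLink)"
    return "Llama 3.3 70B on Pro (2x RTX 5090)"

print(recommend_rig(300, multilingual_first=True))
print(recommend_rig(50, multilingual_first=True))
```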

Open the calculator / request a quote with your target model, users and constraints.

Mistral · Sovereignty · Rig