Mistral Large 123B locally: which rig, what real cost in 2026
Mistral Large 123B open-weight at home: VRAM per quant, minimum rig (2× A6000 NVLink), ROI vs the Mistral API by monthly volume, and when to prefer Llama 3.3 70B.

Translated article. This version is localized to avoid international pages with French text. Technical data, prices, and recommendations are unchanged.
How much VRAM does Mistral Large 123B need?
The model is dense (not an MoE): all 123 billion parameters are active for every token. VRAM requirements depend on the chosen quantization.
| Quantization | VRAM | Hardware / notes |
| --- | --- | --- |
| FP16 | ~246 GB | Reference quality, 4× A100 80 GB (datacenter) |
| Q8_0 | ~131 GB | Near-FP16, 2× A6000 + offload or 4× RTX 5090 |
| Q5_K_M | ~88 GB | Indistinguishable in chat, 2× A6000 NVLink (96 GB) |
| Q4_K_M | ~70 GB | Sweet spot, 2× A6000 NVLink or 3× RTX 5090 |
| Q3_K_M | ~52 GB | Degraded on reasoning, 2× RTX 5090 (64 GB) |
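As a sanity check, the weight-only footprint can be approximated from parameter count times effective bits per weight. The bpw values below are rough llama.cpp figures (an assumption; they vary slightly by release and by which tensors each quant keeps at higher precision, so results land within a few GB of the table) and exclude KV cache, activations, and runtime overhead:

```python
# Weight-only VRAM estimate: parameter count x effective bits per weight.
# The bpw figures are approximate llama.cpp values (assumption) and the
# result excludes KV cache, activations, and runtime overhead.
PARAMS = 123e9  # Mistral Large is dense: all parameters must be loaded

BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
}

def weight_vram_gb(quant: str, params: float = PARAMS) -> float:
    """GB needed just to hold the quantized weights."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q:7s} ~{weight_vram_gb(q):.0f} GB")
```

The FP16 and Q8_0 rows reproduce the table exactly; the K-quant rows come out a few GB higher because effective bpw depends on the exact tensor mix.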
The recommended rig: Enterprise (Q4) or custom (Q8+)
The LocalIA Enterprise rig (2× RTX A6000 NVLink, 96 GB VRAM, from EUR 25,990 ex VAT) is the natural target for Mistral Large 123B in Q4_K_M. The 96 GB of NVLink-pooled VRAM hold the model plus a comfortable 32k context. Expect roughly 22 tok/s single-user, and ~45 tok/s aggregate with vLLM batching at 4 concurrent requests.
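The "fits with a comfortable 32k context" claim can be checked with a KV-cache sizing sketch. The architecture figures below are assumptions to verify against the model's config.json (88 layers, 8 KV heads via GQA, head dimension 128, FP16 cache):

```python
# KV-cache sizing sketch for a 32k context. Architecture figures are
# assumptions (check the model's config.json): 88 layers, 8 KV heads
# (GQA), head_dim 128, FP16 cache entries.
LAYERS, KV_HEADS, HEAD_DIM, CACHE_BYTES = 88, 8, 128, 2

def kv_cache_gb(context_tokens: int) -> float:
    """FP16 KV cache for a single sequence at the given context length."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * CACHE_BYTES  # K and V
    return context_tokens * per_token / 1e9

weights_q4_gb = 70  # Q4_K_M row from the table above
print(f"KV cache @32k: {kv_cache_gb(32_768):.1f} GB")          # ~11.8 GB
print(f"Weights + cache: {weights_q4_gb + kv_cache_gb(32_768):.1f} GB")  # ~81.8 GB
```

Under these assumptions, ~70 GB of Q4_K_M weights plus ~12 GB of cache leave headroom inside 96 GB for activations and the serving runtime.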
Real ROI vs Mistral API
Mistral Large via the official Mistral AI API costs ~EUR 8 per million tokens (blended, 80% input / 20% output).
| Monthly volume | API cost | Break-even vs Enterprise rig (EUR 25,990) |
| --- | --- | --- |
| 30M tok/month | ~EUR 2,880/year | ≈ 9 years |
| 100M tok/month | ~EUR 9,600/year | ≈ 2.7 years |
| 300M tok/month | ~EUR 28,800/year | ≈ 11 months |
| 1,000M tok/month | ~EUR 96,000/year | ≈ 3 months |
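Break-even here is simply the rig's purchase price divided by the monthly API spend it displaces; a minimal sketch, ignoring electricity, hosting, and maintenance (which push real payback somewhat further out):

```python
# Break-even: rig purchase price / monthly API spend at the same volume.
# Electricity, hosting, and maintenance are ignored, so real payback is
# somewhat longer than these figures.
RIG_COST_EUR = 25_990       # Enterprise rig, ex VAT
API_EUR_PER_M_TOK = 8.0     # blended 80% input / 20% output

def breakeven_months(monthly_m_tokens: float) -> float:
    """Months until cumulative API spend equals the rig price."""
    return RIG_COST_EUR / (monthly_m_tokens * API_EUR_PER_M_TOK)

for volume in (30, 100, 300, 1000):
    print(f"{volume:5d}M tok/month -> break-even in "
          f"{breakeven_months(volume):.1f} months")
```

At 300M tok/month this gives ~10.8 months; at 1,000M tok/month, ~3.2 months.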
When Mistral Large is worth it vs Llama 3.3 70B
- Premium multilingual output: FR/ES/DE/IT at near-native quality, versus Llama's English-centric training.
- Code in niche languages (R, COBOL, Fortran).
- French sovereignty: open weights from an EU company under EU law.
- Native 128k context with no rolling-window degradation.
Mistral Large 123B locally is justified from 100M tok/month with a French-speaking target. Below that, Llama 3.3 70B on a Pro rig (2× RTX 5090, EUR 11,990) covers 90% of cases at half the ticket. Above 300M tok/month, the Enterprise rig pays for itself in well under a year.
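The decision rule above can be sketched as an illustrative helper; the 100M-token threshold and rig names come from this article, not from any general standard:

```python
def recommend(monthly_m_tokens: float, multilingual_premium: bool = False) -> str:
    """Illustrative rule of thumb distilled from the comparison above."""
    if monthly_m_tokens < 100:
        # Below ~100M tok/month the cheaper Pro rig covers most cases
        return "Llama 3.3 70B on Pro rig (2x RTX 5090)"
    if multilingual_premium:
        return "Mistral Large 123B on Enterprise rig (2x RTX A6000 NVLink)"
    # High volume but English-centric workload
    return "Llama 3.3 70B, or revisit if multilingual needs grow"

print(recommend(30))
print(recommend(300, multilingual_premium=True))
```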
Open the calculator / request a quote with your target model, user count, and constraints.