GPU · 7 min read

RTX 5090 vs RTX 4090 for local AI: which one to choose in 2026?

Damien · LocalIA

Published 2026-06-05

A sober comparison from the angle that matters: which models run, at what throughput, and at what price. VRAM (32 vs 24 GB), GDDR7 bandwidth, a model table, and a verdict by profile.

Translated article. This version is localized so that international pages do not display French article text. Technical data, prices, and recommendations are unchanged.

Key numbers

Spec	RTX 4090	RTX 5090
VRAM	24 GB GDDR6X	32 GB GDDR7
Memory bandwidth	~1,008 GB/s	~1,792 GB/s
TDP	450 W	575 W
Indicative new price 2026	~EUR 1,600-1,900	~EUR 2,200-2,700

The number that changes everything

The RTX 5090 has about 78% more memory bandwidth thanks to GDDR7. LLM inference is memory-bound: tokens per second depend far more on bandwidth than on raw compute. That, plus the 8 extra gigabytes, is where the 5090 pulls ahead.
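The 78% figure follows directly from the bandwidth numbers in the table. The sketch below recomputes it and adds a rough single-stream ceiling on decode speed. It rests on a simplifying assumption that is not from the article: each generated token reads all model weights from VRAM once. The 18 GB model size is a hypothetical example value.

```python
# Approximate memory bandwidth from the "Key numbers" table, in GB/s.
BANDWIDTH_GBPS = {"RTX 4090": 1008, "RTX 5090": 1792}


def bandwidth_gain(new: float, old: float) -> float:
    """Relative bandwidth increase as a fraction (0.78 means +78%)."""
    return new / old - 1.0


def decode_ceiling_tps(bandwidth_gbps: float, weights_gb: float) -> float:
    """Upper bound on single-stream tokens/s.

    Assumption: every generated token requires reading all weights
    from VRAM exactly once, and nothing else limits throughput.
    """
    return bandwidth_gbps / weights_gb


gain = bandwidth_gain(BANDWIDTH_GBPS["RTX 5090"], BANDWIDTH_GBPS["RTX 4090"])
print(f"Bandwidth gain: {gain:.0%}")

# Hypothetical example: a model whose quantized weights occupy 18 GB.
for card, bandwidth in BANDWIDTH_GBPS.items():
    ceiling = decode_ceiling_tps(bandwidth, 18.0)
    print(f"{card}: at most ~{ceiling:.0f} tokens/s")
```

Real throughput lands below this ceiling because compute, KV-cache reads, and runtime overhead also cost time, so measured gains can be smaller than the raw bandwidth ratio.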

Which models run on each

Model	RTX 4090 (24 GB)	RTX 5090 (32 GB)
Mistral 7B / Llama 8B	FP16 yes	FP16 yes
Qwen 32B / Gemma 31B	Q4 yes, Q5 tight	Q5 yes, Q6 yes
Llama 3.3 70B / Qwen 72B	Q3 very tight	Q3 yes (Q4 out of reach)
Llama 4 Scout 109B (MoE)	Too big	Needs two cards
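A first-pass way to reproduce rows like the 32B one is to estimate weight size from parameter count and bits per weight, then compare against VRAM. This is a minimal sketch: the bits-per-weight values and the fixed 1.5 GB overhead are illustrative assumptions, not measurements. 3-bit variants are left out because their effective bits per weight vary too widely for a single constant.

```python
# Assumed effective bits per weight for each quantization level.
# Illustrative round numbers; real quantized files differ by a few tenths.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q6": 6.5, "Q5": 5.5, "Q4": 4.5}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8.0


def headroom_gb(params_billion: float, quant: str, vram_gb: float,
                overhead_gb: float = 1.5) -> float:
    """VRAM left after weights plus an assumed fixed overhead.

    overhead_gb is a placeholder for KV cache, activations, and runtime
    buffers; a negative result means the model does not fit.
    """
    return vram_gb - (weights_gb(params_billion, quant) + overhead_gb)


# The 32B row of the table, on 24 GB and 32 GB cards.
for quant in ("Q4", "Q5", "Q6"):
    for card, vram in (("RTX 4090", 24.0), ("RTX 5090", 32.0)):
        left = headroom_gb(32.0, quant, vram)
        verdict = "fits" if left >= 0 else "does not fit"
        print(f"32B {quant} on {card}: {verdict} ({left:+.1f} GB headroom)")
```

Under these assumptions, 32B at Q5 leaves only about half a gigabyte on 24 GB, which matches the "tight" entry, while Q6 fits only on the 32 GB card. Borderline cases depend on the exact quantization variant and context length.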

Throughput and value

On a model that fits both cards, the RTX 5090 delivers roughly 40 to 70% more tokens per second. The gap is most visible on multi-request use and long contexts.

The used RTX 4090 (around EUR 1,100-1,400) stays the best value if your models fit in 24 GB, you run solo or in a small team, and you do not need the 70B tier. The 5090 is worth it for 70B on a single card, long context, or serving several users.
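To put the value argument into numbers, the sketch below divides the midpoint of each price range quoted in this article by VRAM and by bandwidth. Using the midpoint is an assumption made for illustration, and the `Offer` class is a hypothetical helper, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str
    price_range_eur: tuple[float, float]  # indicative range from the article
    vram_gb: float
    bandwidth_gbps: float

    @property
    def mid_price(self) -> float:
        """Midpoint of the price range (an illustrative simplification)."""
        low, high = self.price_range_eur
        return (low + high) / 2

    @property
    def eur_per_gb_vram(self) -> float:
        return self.mid_price / self.vram_gb

    @property
    def eur_per_gbps(self) -> float:
        return self.mid_price / self.bandwidth_gbps


OFFERS = [
    Offer("RTX 4090 (used)", (1100, 1400), 24, 1008),
    Offer("RTX 4090 (new)", (1600, 1900), 24, 1008),
    Offer("RTX 5090 (new)", (2200, 2700), 32, 1792),
]

# Cheapest VRAM first.
ranked = sorted(OFFERS, key=lambda offer: offer.eur_per_gb_vram)
for offer in ranked:
    print(f"{offer.name}: ~EUR {offer.eur_per_gb_vram:.0f} per GB of VRAM, "
          f"~EUR {offer.eur_per_gbps:.2f} per GB/s")
```

By these midpoints the used RTX 4090 is cheapest on both measures, while the RTX 5090 costs less per GB/s of bandwidth than a new RTX 4090.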

Solo dev, models up to 32B: used RTX 4090, unbeatable value.
You want Llama 70B on one card: RTX 5090, since the 4090 cannot keep up.
Multi-user or long context: RTX 5090, where the GDDR7 bandwidth pays off.
100B+ models: multi-GPU (NVLink where the cards support it; the RTX 4090 and 5090 do not), since neither card is enough alone.

Enter your card and target model in the LocalIA GPU to LLM calculator: it tells you whether it fits, at which quantization, with estimated VRAM and throughput, across 200+ GPUs. Free, no signup. LocalIA is an independent resource and sells nothing.

Open the calculator, or ask us for advice with your target model, number of users, and constraints.

Frequently asked questions

RTX 5090 or RTX 4090 for local AI?

The RTX 5090 (32 GB) runs models that the 4090 (24 GB) cannot hold, such as Llama 70B at Q3, and delivers roughly 40-70% more tokens per second thanks to GDDR7. If your models already fit in 24 GB, a used RTX 4090 remains the best value.

How big is the VRAM difference between the RTX 4090 and the RTX 5090?

24 GB (GDDR6X) on the 4090 versus 32 GB (GDDR7) on the 5090. Those 8 extra GB, plus about 78% more memory bandwidth, are the 5090's real advantages for LLM inference, which is memory-bound.

Can you run Llama 70B on an RTX 4090?

Just barely, at Q3 and with little context: 24 GB is tight. The RTX 5090 (32 GB) holds Llama 3.3 70B at Q3 more comfortably. Q4/Q5 requires multi-GPU.
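Context length matters here because the KV cache grows linearly with the number of tokens and competes with the weights for VRAM. The sketch below estimates that cost. The default architecture values (80 layers, 8 key/value heads, head dimension 128) are assumed from the published Llama 3 70B configuration and should be checked against the model you actually run. The FP16 cache (2 bytes per value) is also an assumption.

```python
def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Estimated KV-cache size in GB (1 GB = 1e9 bytes) for one sequence.

    Each token stores one key and one value vector per layer, each of
    size kv_heads * head_dim. Defaults are assumptions, see the note above.
    """
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * bytes_per_token / 1e9


for context in (2048, 8192, 32768):
    print(f"{context:>6} tokens: ~{kv_cache_gb(context):.2f} GB of KV cache")
```

With these defaults each token costs 327,680 bytes, so a few thousand tokens of context already take a meaningful share of whatever VRAM is left once the weights are loaded.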

Are two RTX 4090s better than one RTX 5090?

Two RTX 4090s give 48 GB combined, more than the 5090, but without NVLink the cards communicate over PCIe, which is slower for very large models. For a 70B model, a single RTX 5090 is often smoother than a pair of 4090s over PCIe.

Is the RTX 5090 worth its price for local AI?

Yes, if you want Llama 70B on a single card, long context, or more throughput for several users. No, if your models fit in 24 GB: in that case a used RTX 4090 remains unbeatable on value.

GPU · Comparison · VRAM
