Apple · 7 min read

Can you run an LLM locally on a Mac Mini M4? (2026)

Damien · LocalIA

Published 2026-06-06

The Mac Mini M4 runs local LLMs surprisingly well thanks to unified memory (up to 64 GB usable as VRAM). Which models fit each configuration, real-world tok/s, and Mac vs dedicated GPU.

Translated article. This version is localized so that international pages do not display French article text. Technical data, prices, and recommendations are unchanged.

Unified memory = your VRAM

On a Mac, the CPU and GPU share the same memory (unified architecture). In principle, all the RAM can serve as VRAM, but macOS reserves part of it, so count on ~70% actually being available for the model. That is Apple's big advantage over a dedicated graphics card capped at 24 or 32 GB.

Model	RAM options	Memory bandwidth	Price
Mac Mini M4 (base)	16 / 24 / 32 GB	~120 GB/s	~EUR 700-1,100
Mac Mini M4 Pro	24 / 48 / 64 GB	~273 GB/s	~EUR 1,500-2,400

The catch: memory bandwidth

The M4 Pro has 2.3x the memory bandwidth of the base M4 (273 vs 120 GB/s). Because LLM inference is memory-bound, that figure, more than the amount of RAM, decides your tokens per second.
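A rough way to see why bandwidth dominates: during decoding, each generated token requires reading approximately all model weights from memory once, so bandwidth divided by model size gives an upper bound on tokens per second. The sketch below assumes a hypothetical 5 GB model file (roughly an 8B model at 4-bit quantization; this size is an assumption, not a figure from this article) and ignores KV cache reads and compute overhead, so real-world numbers land below the bound.

```python
def decode_upper_bound_tok_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed, assuming every token reads all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Bandwidth figures quoted in the article; the model size is an assumption.
MODEL_SIZE_GB = 5.0
BANDWIDTH_GB_S = {"Mac Mini M4": 120.0, "Mac Mini M4 Pro": 273.0}

bounds = {
    name: decode_upper_bound_tok_s(bw, MODEL_SIZE_GB)
    for name, bw in BANDWIDTH_GB_S.items()
}

for name, bound in bounds.items():
    print(f"{name}: at most ~{bound:.1f} tok/s for a {MODEL_SIZE_GB:.0f} GB model")
```

The ratio between the two bounds is the bandwidth ratio itself, whatever model size is plugged in, which is why the M4 Pro's advantage carries over to every model that fits.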

Which LLMs fit by RAM

RAM	Comfortable fit	Stretch
16 GB	7-8B in Q4/Q5 (Llama 8B, Mistral 7B)	14B in Q4
24 GB	14B in Q5, 32B in Q3	32B in Q4 (tight)
48 GB	32B in Q5/Q6 (Qwen 2.5 32B)	70B in Q3
64 GB	70B in Q4 (Llama 3.3 70B ~40 GB)	70B in Q5 (barely)
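The sizing logic behind these rows can be approximated as follows: the weights take about parameters × bits per weight / 8 bytes, and that has to fit within the ~70% of RAM left for the model. The bits-per-weight values below are rough assumptions for GGUF-style quantization, not figures from this article, and the sketch ignores the KV cache and context window, so treat borderline results as "tight".

```python
# Approximate bits per weight per quantization level.
# Assumed values for illustration; real file sizes vary by model and quant variant.
BITS_PER_WEIGHT = {"Q3": 3.5, "Q4": 4.5, "Q5": 5.5, "Q6": 6.5}

USABLE_FRACTION = 0.70  # share of unified memory assumed available to the model


def weights_size_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(ram_gb: float, params_billion: float, quant: str) -> bool:
    """True if the weights alone fit in the usable share of RAM."""
    return weights_size_gb(params_billion, quant) <= ram_gb * USABLE_FRACTION


for ram_gb, params_b, quant in [(16, 8, "Q4"), (48, 32, "Q5"), (64, 70, "Q4"), (48, 70, "Q4")]:
    size = weights_size_gb(params_b, quant)
    budget = ram_gb * USABLE_FRACTION
    verdict = "fits" if fits(ram_gb, params_b, quant) else "does not fit"
    print(f"{params_b}B {quant} (~{size:.1f} GB) on {ram_gb} GB RAM (budget ~{budget:.1f} GB): {verdict}")
```

Under these assumptions a 70B model in Q4 comes out near 39 GB, in line with the ~40 GB quoted for Llama 3.3 70B, and it fits the 64 GB configuration but not the 48 GB one.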

Real throughput (tokens/second)

Fine for solo chat or a background coding assistant. Not for serving several users at once: the Mac does not do efficient batching (Metal is less mature than CUDA here).

Machine (bandwidth)	8B Q4	32B Q5
Mac Mini M4 (120 GB/s)	~18-24 tok/s	too slow / won't fit
Mac Mini M4 Pro (273 GB/s)	~40-50 tok/s	~10-14 tok/s
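To translate these throughput figures into waiting time, divide the length of the answer by the tokens per second. The sketch below uses the ranges listed above with a hypothetical 500-token answer (the answer length is an assumption for illustration) and ignores prompt processing time.

```python
# Throughput ranges in tok/s, taken from the figures above.
THROUGHPUT_TOK_S = {
    ("Mac Mini M4", "8B Q4"): (18, 24),
    ("Mac Mini M4 Pro", "8B Q4"): (40, 50),
    ("Mac Mini M4 Pro", "32B Q5"): (10, 14),
}

RESPONSE_TOKENS = 500  # hypothetical answer length


def wait_range_seconds(tokens: int, low_tok_s: float, high_tok_s: float) -> tuple[float, float]:
    """Return (best case, worst case) generation time in seconds."""
    return tokens / high_tok_s, tokens / low_tok_s


for (machine, model), (low, high) in THROUGHPUT_TOK_S.items():
    best, worst = wait_range_seconds(RESPONSE_TOKENS, low, high)
    print(f"{machine}, {model}: ~{best:.0f}-{worst:.0f} s for {RESPONSE_TOKENS} tokens")
```

Since tokens stream as they are generated, a single user starts reading well before the full answer is complete, which is why these speeds remain workable for solo chat.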

Mac Mini M4 or a dedicated GPU?

Max VRAM: 64 GB unified (Mac) vs 24-32 GB (RTX 4090/5090).
Throughput: dedicated GPU is clearly faster (GDDR6X on the RTX 4090, GDDR7 on the RTX 5090).
Multi-user: dedicated GPU wins (vLLM batching); Mac is weak.
Power/noise: Mac ~30-50 W and silent; RTX 350-575 W and loud.
Entry price: ~EUR 700 (Mac base) vs ~EUR 1,100 (used 4090).

The Mac Mini M4 is excellent for solo, silent, low-power use with models up to 32B (M4 Pro, 48 GB), and even 70B in Q4 (M4 Pro, 64 GB). For high throughput or several users, a dedicated GPU still wins. Check your exact setup in the LocalIA GPU to LLM calculator: it tells you what fits and at which quantization. It is a free, independent resource with no signup, and we sell nothing.


Frequently asked questions

Can you run a local LLM on a Mac Mini M4?

Yes. Thanks to unified memory serving as VRAM (up to 64 GB on the M4 Pro), the Mac Mini M4 loads models from 7B to 70B depending on the configuration. The real limit is memory bandwidth, which caps tokens per second.

How much RAM does a Mac Mini M4 need for an LLM?

16 GB is enough for 7-8B in Q4/Q5, 24 GB for 14B, 48 GB for a true 32B (Qwen 2.5 32B) in Q5, and 64 GB for a Llama 70B in Q4 (~40 GB). Count on ~70% of the RAM actually being available for the model.

Which Mac Mini M4 for Qwen 2.5 32B?

The Mac Mini M4 Pro with 48 GB is the sweet spot: it holds Qwen 2.5 32B in Q5_K_M with headroom for context, at ~10-14 tokens/second thanks to ~273 GB/s of bandwidth.

Is the Mac Mini M4 slower than a dedicated GPU for AI?

Yes, in raw throughput: an RTX 4090 (GDDR6X) or RTX 5090 (GDDR7) delivers significantly more tokens per second and handles multi-user serving (vLLM batching). The Mac wins on maximum VRAM (64 GB unified), silence, and power draw (~30-50 W).

Mac Mini M4 or M4 Pro for LLMs?

The M4 Pro, if the budget allows: it has 2.3x the memory bandwidth of the base M4 (273 vs 120 GB/s) and scales to 64 GB. Because inference is memory-bound, that determines the speed.
