Can you run an LLM locally on a Mac Mini M4? (2026)
The Mac Mini M4 runs local LLMs surprisingly well thanks to unified memory (up to 64 GB usable as VRAM). Which models fit each configuration, real-world tok/s, and how the Mac compares with a dedicated GPU.

Unified memory = your VRAM
On a Mac, the CPU and GPU share the same memory (unified architecture). In practice, all of the RAM can serve as VRAM, but macOS reserves part of it, so count on ~70% being available for the model. That is Apple's big advantage over a dedicated graphics card capped at 24 or 32 GB.
| Model | Unified memory | Memory bandwidth | Price |
|---|---|---|---|
| Mac Mini M4 (base) | 16 / 24 / 32 GB | ~120 GB/s | ~EUR 700-1,100 |
| Mac Mini M4 Pro | 24 / 48 / 64 GB | ~273 GB/s | ~EUR 1,500-2,400 |
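To make the ~70% rule of thumb concrete, here is a minimal Python sketch that turns each configuration from the table into an approximate memory budget for the model. The names `USABLE_FRACTION`, `CONFIGS_GB` and `usable_memory_gb` are illustrative, and the 0.70 factor is the article's own approximation, not a value reported by macOS.

```python
# Assumption: ~70% of unified memory is usable for the model,
# as stated in the text above. This is a rule of thumb, not a measured value.
USABLE_FRACTION = 0.70

# Memory configurations (GB) taken from the table above.
CONFIGS_GB = {
    "Mac Mini M4": [16, 24, 32],
    "Mac Mini M4 Pro": [24, 48, 64],
}


def usable_memory_gb(total_gb: float, fraction: float = USABLE_FRACTION) -> float:
    """Rough memory budget (GB) left for model weights plus context."""
    return round(total_gb * fraction, 1)


# Budget per machine and per RAM configuration.
budgets = {
    name: {total: usable_memory_gb(total) for total in sizes}
    for name, sizes in CONFIGS_GB.items()
}

for name, by_size in budgets.items():
    for total, usable in by_size.items():
        print(f"{name} {total} GB -> ~{usable} GB for the model")
```

The budget has to cover the weights and the context (KV cache) together, so the largest model that loads is somewhat smaller than the figure printed here.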
The catch: memory bandwidth
The M4 Pro has about 2.3x the memory bandwidth of the base M4 (273 vs 120 GB/s). Because token generation is memory-bound (each new token requires reading the model weights from memory), bandwidth, more than RAM capacity, decides your tokens per second.
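The memory-bound argument can be sketched as a back-of-the-envelope ceiling: if every generated token reads all the weights once, tokens per second cannot exceed bandwidth divided by model size. The function names below are illustrative, and the ~4.5 effective bits per weight for a Q4 quantization is an assumption; the true figure depends on the exact quantization format.

```python
def weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / model_gb


# Assumption: ~4.5 effective bits per weight for a Q4 quantization.
model_gb = weights_size_gb(8, 4.5)  # 8B model

# Bandwidth figures (GB/s) from the text above.
for name, bandwidth in {"M4": 120, "M4 Pro": 273}.items():
    ceiling = decode_ceiling_tok_s(bandwidth, model_gb)
    print(f"{name}: at most ~{ceiling:.0f} tok/s for an 8B Q4 model")
```

Real throughput lands below this ceiling because of compute time, KV cache reads and runtime overhead, but the ratio between the two machines follows the bandwidth ratio closely.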
Which LLMs fit by RAM
| RAM | Fits comfortably | At the limit |
|---|---|---|
| 16 GB | 7-8B in Q4/Q5 (Llama 8B, Mistral 7B) | 14B in Q4 |
| 24 GB | 14B in Q5, 32B in Q3 | 32B in Q4 (tight) |
| 48 GB | 32B in Q5/Q6 (Qwen 2.5 32B) | 70B in Q3 |
| 64 GB | 70B in Q4 (Llama 3.3 70B ~40 GB) | 70B in Q5 (just) |
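A rough way to sanity-check a row of this table is to estimate the weight size from the parameter count and quantization, then compare it with the ~70% memory budget. This is a sketch under stated assumptions: `check_fit` and `FitResult` are hypothetical names, the ~4.5 bits per weight for Q4 and the 2 GB context allowance are illustrative guesses, and real usage depends on the runtime and context length.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    needed_gb: float  # estimated weights + context allowance
    budget_gb: float  # share of unified memory assumed usable

    @property
    def fits(self) -> bool:
        return self.needed_gb <= self.budget_gb


def check_fit(
    params_billion: float,
    bits_per_weight: float,
    ram_gb: float,
    usable_fraction: float = 0.70,  # assumption: rule of thumb from the article
    context_gb: float = 2.0,        # assumption: allowance for the KV cache
) -> FitResult:
    """Compare an estimated model footprint with the usable memory budget."""
    weights_gb = params_billion * bits_per_weight / 8
    return FitResult(
        needed_gb=weights_gb + context_gb,
        budget_gb=ram_gb * usable_fraction,
    )


# 70B in Q4 (~4.5 bits/weight assumed) on 64 GB vs 48 GB.
for ram in (64, 48):
    result = check_fit(70, 4.5, ram)
    print(f"70B Q4 on {ram} GB: needs ~{result.needed_gb:.1f} GB, "
          f"budget ~{result.budget_gb:.1f} GB, fits={result.fits}")
```

With these assumptions the 70B Q4 weights come to about 39 GB, in line with the ~40 GB quoted in the table. The entries in the "at the limit" column sit close to the edge of this budget, so treat the check as a first filter and not a guarantee.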
Real throughput (tokens/second)
Fine for solo chat or a background coding assistant. Not for serving several users at once: the Mac does not do efficient batching (Metal is less mature than CUDA here).
| Machine | 8B Q4 | 32B Q5 |
|---|---|---|
| Mac Mini M4 (120 GB/s) | ~18-24 tok/s | too slow / won't fit |
| Mac Mini M4 Pro (273 GB/s) | ~40-50 tok/s | ~10-14 tok/s |
Mac Mini M4 or a dedicated GPU?
- Max VRAM: 64 GB unified (Mac) vs 24-32 GB (RTX 4090/5090).
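- Throughput: dedicated GPU is clearly faster (GDDR6X on the RTX 4090, GDDR7 on the RTX 5090, both at roughly 1 TB/s or more versus 273 GB/s on the M4 Pro).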
- Multi-user: dedicated GPU wins (vLLM batching); Mac is weak.
- Power/noise: Mac ~30-50 W and silent; RTX 350-575 W and loud.
- Entry price: ~EUR 700 (Mac base) vs ~EUR 1,100 (used 4090).
The Mac Mini M4 is excellent for solo, silent, low-power use with models up to 32B (M4 Pro, 48 GB), and even 70B in Q4 (M4 Pro, 64 GB). For high throughput or several users, a dedicated GPU still wins. Check your exact setup in the LocalIA GPU to LLM calculator: it tells you what fits and at which quantization. It is a free, independent resource with no signup; we sell nothing.
Open the calculator, or write to us for advice based on your target model, users and constraints.