Bench notes from the lab.
Real benchmarks, tested configurations and field notes. No listicles, no SEO bait.

Which LLMs run on a budget GPU (RTX 3060, 3070) in 2026?
You don't need an RTX 5090 to start with local AI. The best GPUs by budget (RTX 3060 12 GB, 4060 Ti 16 GB, 3090 24 GB), what fits by VRAM, and the king of VRAM per euro on the used market.
Can you run a local LLM on a Mac Mini M4? (2026)
The Mac Mini M4 runs local LLMs surprisingly well thanks to unified memory (up to 64 GB acting as VRAM). Which models fit per configuration, real tokens/second, and Mac vs a dedicated GPU.
RTX 5090 vs RTX 4090 for local AI: which to choose in 2026?
A level-headed comparison from the angle that matters: which models run, at what throughput, and at what price. VRAM (32 vs 24 GB), GDDR7 bandwidth, a models table, and the verdict by profile.
How much VRAM to run a local LLM? (formula + 2026 table)
The method to estimate an LLM's VRAM: parameter count x bytes per parameter for the weights, plus KV cache and overhead. A ready-to-use table (7B to 123B x Q4/Q5/Q8) and the minimum card per model.
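As a rough illustration of that estimate, here is a minimal Python sketch. It is not the article's own table or code: the function name, the 1 GB default overhead, the 4.5 bits per weight used as a stand-in for a Q4-style quant, and the model shape in the example (80 layers, 8 KV heads, head dimension 128, 8192-token context) are all assumptions chosen for illustration.

```python
def estimate_vram_gb(params_billion, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, context_tokens, kv_bytes=2, overhead_gb=1.0):
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes). Hypothetical helper."""
    # Weights: parameter count x bytes per parameter.
    weights_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1e9
    # KV cache: keys + values (factor 2), per layer, per KV head, per token.
    kv_cache_gb = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * kv_bytes) / 1e9
    # Overhead (runtime buffers, activations) is a flat assumed figure.
    return weights_gb + kv_cache_gb + overhead_gb


# Illustrative 70B-class model at ~4.5 bits/weight with an 8192-token context.
total = estimate_vram_gb(70, 4.5, n_layers=80, n_kv_heads=8,
                         head_dim=128, context_tokens=8192)
print(f"{total:.1f} GB")
```

Real runtimes differ in how much overhead they reserve and whether the KV cache is itself quantized, so treat the result as a lower bound for sizing, not a measurement.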
Building a local RAG in 2026: the Ollama + Qdrant + LlamaIndex stack
Four-component architecture, tech choices (vLLM, Qdrant, LlamaIndex, Open WebUI), GPU sizing by concurrent users, and 24-month TCO vs GPT-4o.
Q4 vs Q5 vs Q8: which quantization for Llama 70B in 2026?
VRAM table per quant (Q3 to FP16), measured quality loss (Δ perplexity), GPU recommendations and estimated tok/s per setup. No bullshit.
Llama 4 locally in 2026: VRAM, GPUs and realistic alternatives
Llama 4 Scout, Maverick, Behemoth: what really fits at home in 2026. VRAM per version, minimum GPUs, and 5 alternatives (70-123B) that compete.
Mistral Large 123B locally: which rig, what real cost in 2026
Mistral Large 123B open-weight at home: VRAM by quant, minimum rig (2x A6000 NVLink), ROI vs Mistral API by monthly volume, and when to prefer Llama 3.3 70B instead.
vLLM vs Ollama in production: the 2026 benchmark (single user, batching, multi-user)
Real benchmark of the two inference runtimes on a single RTX 5090 and on a dual RTX 5090 setup. Single user, 4 concurrent users, 10 users under load: who wins when, and why continuous batching changes everything.
Sovereign RAG with Qwen 3 30B MoE: the complete 2026 stack
Why Qwen 3 30B-A3B (MoE, 3B active params/token) is the 2026 sweet spot for a sovereign team RAG. Stack: vLLM + Qdrant + nomic-embed + LlamaIndex, on a Pro rig (EUR 11,990). All open-weight, fully self-hosted.
How much does an AI server for an SME cost in 2026?
A clear breakdown of the real cost of a local AI rig in 2026: hardware, software, electricity and support, with three priced tiers and a cloud API comparison.
Cloud vs on-prem AI: break-even can arrive in 9 months
An honest comparison between OpenAI / Anthropic APIs and a local AI rig, with three concrete TCO scenarios over 24 months.
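The break-even idea behind such a comparison can be sketched in a few lines of Python. This is a simplified model, not the article's TCO method: the function name is hypothetical, and the figures in the example (EUR 12,000 of hardware, EUR 200 per month to run it, EUR 1,200 per month of API spend) are made-up round numbers, not results from the scenarios.

```python
import math


def break_even_months(hardware_cost, monthly_running_cost, monthly_api_cost):
    """Months until a local rig has paid for itself versus API spend.

    Returns None when the rig costs as much or more to run per month
    than the API, in which case it never breaks even.
    """
    monthly_savings = monthly_api_cost - monthly_running_cost
    if monthly_savings <= 0:
        return None
    # Round up: break-even is reached during the last partial month.
    return math.ceil(hardware_cost / monthly_savings)


# Hypothetical inputs, for illustration only.
print(break_even_months(12_000, 200, 1_200))
```

A fuller model would also account for hardware resale value, support time and changes in API pricing over the 24 months.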
RTX 5090 vs Mac Studio M3 Ultra for local LLMs
Two philosophies and two winners depending on the use case: dedicated VRAM vs unified memory, throughput, multi-user serving and EUR per GB.
Which GPU do you need to run Llama 3.3 70B locally in 2026?
VRAM by quantization, compatible GPUs, RTX 5090 vs A6000 vs H100, and the cost/performance trade-off against OpenAI APIs.