Bench notes from the lab.
Real benchmarks, tested configurations and field notes. No listicles, no SEO bait.

Building a local RAG in 2026: the Ollama + Qdrant + LlamaIndex stack
Four-brick architecture, tech choices (Ollama, Qdrant, LlamaIndex, Open WebUI), GPU sizing by concurrent users, and 24-month TCO vs GPT-4o.
Q4 vs Q5 vs Q8: which quantization for Llama 70B in 2026?
VRAM table per quant (Q3 to FP16), measured quality loss (Δ perplexity), GPU recommendations and estimated tok/s per setup. No bullshit.
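For flavor, the sizing rule behind that table: weight memory ≈ parameters × bits per weight / 8, plus KV cache and runtime overhead. A minimal sketch follows; the effective bits-per-weight values and the 1.2 overhead factor are assumptions for illustration, not the measured numbers from the article.

```python
# Back-of-the-envelope VRAM estimate for a dense model: weights take
# roughly params x bits/8 bytes, plus KV cache and runtime overhead.
# The bits-per-weight values and 1.2 overhead factor are assumptions.

def vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weights_gb = params_b * bits_per_weight / 8  # 1B params ~ 1 GB at 8 bits
    return weights_gb * overhead

for quant, bits in [("Q3", 3.5), ("Q4", 4.5), ("Q5", 5.5), ("Q8", 8.5), ("FP16", 16.0)]:
    print(f"Llama 70B {quant:>4}: ~{vram_gb(70, bits):.0f} GB")
```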
Llama 4 locally in 2026: VRAM, GPUs and realistic alternatives
Llama 4 Scout, Maverick, Behemoth: what really fits at home in 2026. VRAM per version, minimum GPUs, and 5 alternatives (70-123B) that compete.
Mistral Large 123B locally: which rig, what real cost in 2026
Mistral Large 123B open-weight at home: VRAM by quant, minimum rig (2x A6000 NVLink), ROI vs Mistral API by monthly volume, and when to prefer Llama 3.3 70B instead.
vLLM vs Ollama in production: the 2026 benchmark (single user, batching, multi-user)
Real benchmark of the two inference runtimes on a single RTX 5090 and a dual RTX 5090 rig (the 5090 has no NVLink, so the pair talks over PCIe). Single user, 4 concurrent users, 10 users under load: who wins when, and why continuous batching changes everything.
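The "10 users under load" scenario boils down to firing concurrent requests at the runtime's OpenAI-compatible endpoint (both vLLM and Ollama expose one). A minimal load sketch, assuming a server on localhost:8000 and a placeholder model id; the benchmark itself uses a fuller harness:

```python
# Minimal concurrent-load sketch against an OpenAI-compatible endpoint.
# Base URL, model id, prompt and concurrency are placeholder assumptions.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

async def one_request(i: int) -> float:
    t0 = time.perf_counter()
    await client.chat.completions.create(
        model="llama-3.3-70b",  # placeholder model id
        messages=[{"role": "user", "content": f"Request {i}: summarize RAG in one line."}],
        max_tokens=128,
    )
    return time.perf_counter() - t0

async def main(concurrency: int = 10) -> None:
    latencies = await asyncio.gather(*(one_request(i) for i in range(concurrency)))
    print(f"{concurrency} concurrent users: avg {sum(latencies) / len(latencies):.1f}s per request")

asyncio.run(main())
```

Continuous batching is why vLLM pulls ahead here: it folds new requests into the running batch token by token instead of queueing them behind finished sequences.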
Sovereign RAG with Qwen 3 30B MoE: the complete 2026 stack
Why Qwen 3 30B-A3B (MoE, 3B active params/token) is the 2026 sweet spot for a sovereign team RAG. Stack: vLLM + Qdrant + nomic-embed + LlamaIndex, on a Pro rig (EUR 11,990). All open-weight, fully self-hosted.
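How the bricks wire together, as a minimal LlamaIndex sketch (each integration ships as its own pip package): vLLM serves the LLM behind its OpenAI-compatible API, Qdrant holds the vectors, nomic-embed runs locally. Endpoint URLs, model ids, the collection name and the ./docs path are assumptions for illustration.

```python
# Minimal LlamaIndex + Qdrant wiring for a self-hosted RAG (sketch).
# Endpoints, model ids, collection name and paths are assumptions.
import qdrant_client
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai_like import OpenAILike
from llama_index.vector_stores.qdrant import QdrantVectorStore

# LLM served locally by vLLM behind its OpenAI-compatible API
Settings.llm = OpenAILike(
    model="qwen3-30b-a3b",  # placeholder model id
    api_base="http://localhost:8000/v1",
    api_key="unused",
    is_chat_model=True,
)
# Local embeddings (nomic-embed requires trust_remote_code)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True
)

vector_store = QdrantVectorStore(
    client=qdrant_client.QdrantClient(url="http://localhost:6333"),
    collection_name="team_rag",  # placeholder collection name
)
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./docs").load_data(),
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
)
print(index.as_query_engine().query("What changed in the Q3 report?"))
```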
How much does an AI server for an SME cost in 2026?
A clear breakdown of the real cost of a local AI rig in 2026: hardware, software, electricity and support, with three priced tiers and a cloud API comparison.
Cloud vs on-prem AI: break-even can arrive in 9 months
An honest comparison between OpenAI / Anthropic APIs and a local AI rig, with three concrete TCO scenarios over 24 months.
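The break-even arithmetic is one division. Every euro figure below is a placeholder assumption (the article works through three priced scenarios), but with these numbers the break-even lands just under 9 months, matching the headline:

```python
# Break-even month for on-prem vs API. All figures are placeholder
# assumptions, not the article's scenarios.
capex = 12_000.0      # rig purchase, EUR (assumed)
opex_month = 150.0    # electricity + support, EUR/month (assumed)
api_month = 1_500.0   # current OpenAI/Anthropic bill, EUR/month (assumed)

months = capex / (api_month - opex_month)
print(f"Break-even after ~{months:.1f} months")  # ~8.9 with these numbers
```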
RTX 5090 vs Mac Studio M3 Ultra for local LLMs
Two philosophies and two winners depending on the use case: dedicated VRAM vs unified memory, throughput, multi-user serving, and EUR per GB of memory.
Which GPU do you need to run Llama 3.3 70B locally in 2026?
VRAM by quantization, compatible GPUs, RTX 5090 vs A6000 vs H100, and the cost/performance trade-off against OpenAI APIs.