GPU · 7 min read

Which LLMs run on a budget GPU (RTX 3060, 3070) in 2026?

Damien · LocalIA
Published 2026-06-07

You do not need an RTX 5090 to get started with local AI. This guide covers the best budget GPUs (RTX 3060 12 GB, 4060 Ti 16 GB, 3090 24 GB), what fits by VRAM, and the king of VRAM per euro on the used market.



The sweet spot: RTX 3060 12 GB

At ~EUR 250-300 used, the RTX 3060 12 GB is the best entry point. Its 12 GB of VRAM (more than the 3070!) hold models up to 14B in Q4: Mistral 7B and Llama 8B in Q5/Q8 are comfortable, Qwen 2.5 14B fits in Q4, and Qwen 2.5-Coder 7B (a coding assistant) runs great at ~25-35 tok/s.

What fits by VRAM

8 GB · RTX 3070 / 2070 / GTX 1070 · 7-8B in Q4 (short context)
12 GB · RTX 3060 12 GB · 7-8B comfortable, 14B in Q4
16 GB · RTX 4060 Ti 16 GB · 14B in Q5, 22B in Q4
24 GB · RTX 3090 (used) · 32B in Q4, 70B in Q3 (tight)
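
The figures in this table follow a simple rule of thumb: weight memory is roughly the parameter count times the bits per weight, divided by eight. Here is a minimal Python sketch. The `BITS_PER_WEIGHT` values and the function name are illustrative assumptions, not figures from this article, and the estimate ignores the KV cache and runtime overhead.

```python
# Rough weight-memory estimate for a quantized model.
# Assumption: effective bits per weight are approximations; real model
# files vary by quantization variant and architecture.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5}


def weights_gb(params_billion: float, quant: str) -> float:
    """Return the approximate size of the weights in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # params_billion * 1e9 weights * bits / 8 bytes, expressed in GB
    return params_billion * bits / 8


if __name__ == "__main__":
    for params, quant, vram in [(8, "Q4", 8), (14, "Q4", 12), (32, "Q4", 24)]:
        size = weights_gb(params, quant)
        print(f"{params}B {quant}: ~{size:.1f} GB of weights on a {vram} GB card")
```

Whatever VRAM is left after the weights has to hold the context and runtime buffers, which is why each row stops a few GB short of the card's nominal capacity.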

The 8 GB cards (RTX 3070, 2070, GTX 1070)

8 GB is enough for the 7-8B segment, which already covers a lot: chat, summarization, and above all 7B coding assistants like Qwen 2.5-Coder 7B. Watch the context: on 8 GB, keep a reasonable window (4-8k tokens) so you do not saturate.
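
To see why the context window matters on 8 GB, here is a hedged estimate of KV-cache size. The architecture numbers (32 layers, 8 KV heads, head dimension 128, 16-bit cache entries) are assumptions typical of an 8B-class model with grouped-query attention, not values from this article; check the configuration of the model you actually run.

```python
def kv_cache_gib(context_tokens: int,
                 n_layers: int = 32,       # assumed layer count
                 n_kv_heads: int = 8,      # assumed KV heads (grouped-query attention)
                 head_dim: int = 128,      # assumed dimension per head
                 bytes_per_value: int = 2  # assumed 16-bit cache entries
                 ) -> float:
    """Approximate KV-cache size in GiB for a single sequence."""
    # Keys and values are both cached, hence the factor of 2.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token_bytes / 1024**3


if __name__ == "__main__":
    for ctx in (4096, 8192, 32768):
        print(f"{ctx} tokens: ~{kv_cache_gib(ctx):.2f} GiB of KV cache")
```

Under these assumptions an 8k window costs about 1 GiB on top of the weights, while a 32k window would cost about 4 GiB, which leaves little or no room next to a 7-8B model in Q4 on an 8 GB card.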

A GTX 1070/1080 (8 GB) runs a 7B in Q4, but without Tensor Cores or Flash Attention: count on ~10-15 tok/s, 2-3x slower than a recent RTX. Fine for testing, frustrating for daily use.

The budget king for bigger models: RTX 3090 24 GB

To go beyond 14B without blowing the budget, the used RTX 3090 (~EUR 600-700) is unbeatable: 24 GB of VRAM, as much as a new RTX 4090 for half the price. It swallows Qwen 2.5 32B in Q4 and gets close to Llama 70B in Q3. That is ~EUR 27 per GB of VRAM, the best ratio on the used market for a 24 GB card.

Two RTX 3090s give you 48 GB (and they support NVLink, unlike the 4090). It is the favourite home setup for running 70B in Q4 without moving to pro hardware, at around EUR 1,300 for a used pair.
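
A back-of-the-envelope check of the two-card setup, assuming roughly 4.5 effective bits per weight for Q4 and an even split of the weights across both cards. Both are simplifying assumptions: real runtimes typically split by layer, and each card also needs room for context.

```python
def per_gpu_weights_gb(params_billion: float,
                       bits_per_weight: float,
                       n_gpus: int) -> float:
    """Approximate weight memory per GPU in GB, assuming an even split."""
    total_gb = params_billion * bits_per_weight / 8
    return total_gb / n_gpus


if __name__ == "__main__":
    share = per_gpu_weights_gb(70, 4.5, 2)  # 70B in Q4 on two cards
    print(f"~{share:.1f} GB of weights per 24 GB card")
```

That works out to roughly 20 GB of weights per card, leaving a few GB each for context, whereas the same estimate on a single 24 GB card does not fit.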

The verdict by budget

~EUR 250 · RTX 3060 12 GB · Start out, 7-14B, 7B coding
~EUR 450 · RTX 4060 Ti 16 GB · 16 GB new, up to 22B
~EUR 650 · RTX 3090 24 GB · 32B, best EUR per GB
~EUR 1,300 · 2x RTX 3090 NVLink · 70B in Q4 at home

Before buying a used card, check that your target model fits: the RTX 3060 12 GB page (and the other GPUs) lists compatible LLMs, and the LocalIA GPU to LLM calculator tells you the required VRAM and the quantization that fits, across 200+ cards. Free, no signup, independent resource, we sell nothing.
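
As a rough pre-purchase check, the budget table can be turned into a small lookup. This sketch uses the prices and VRAM sizes listed above; the 4.5 bits-per-weight figure for Q4, the 2 GB reserve for context and runtime buffers, and the function name are assumptions made for illustration. It is a simplified stand-in and does not describe how the LocalIA calculator works.

```python
# (price in EUR, label, total VRAM in GB), taken from the budget table.
OPTIONS = [
    (250, "RTX 3060 12 GB", 12),
    (450, "RTX 4060 Ti 16 GB", 16),
    (650, "RTX 3090 24 GB", 24),
    (1300, "2x RTX 3090 NVLink", 48),
]


def cheapest_fit(params_billion, bits_per_weight=4.5, reserve_gb=2.0):
    """Return (label, price) of the cheapest option that holds the model.

    The requirement is estimated as weights plus a fixed reserve; both
    defaults are assumptions, not measured values.
    """
    needed_gb = params_billion * bits_per_weight / 8 + reserve_gb
    for price, label, vram_gb in sorted(OPTIONS):
        if vram_gb >= needed_gb:
            return label, price
    return None  # nothing in the table is large enough


if __name__ == "__main__":
    for size in (8, 14, 22, 32, 70):
        print(f"{size}B in Q4 -> {cheapest_fit(size)}")
```

With these defaults the lookup reproduces the table's tiers: 7-14B lands on the RTX 3060 12 GB, 22B on the RTX 4060 Ti 16 GB, 32B on the RTX 3090, and 70B on the pair of 3090s.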

Open the calculator, or ask us for advice with your target model, number of users, and constraints.

Frequently asked questions

What is the cheapest GPU to run a local LLM?
A used RTX 3060 12 GB (~EUR 250) is the best entry point: its 12 GB hold 7-14B models in Q4. To reach 32B, a used RTX 3090 24 GB (~EUR 650) is unbeatable on VRAM per euro.
Is an RTX 3060 12 GB enough for an LLM?
Yes for 7-14B models. It runs Mistral 7B and Llama 8B in Q5/Q8 comfortably, Qwen 2.5 14B in Q4, and Qwen 2.5-Coder 7B at ~25-35 tok/s. Its 12 GB is actually more than the 8 GB of an RTX 3070.
Which used GPU has the best VRAM-per-euro ratio?
The RTX 3090 24 GB (~EUR 600-700): as much VRAM as a new RTX 4090 for half the price, about EUR 27 per GB. It holds Qwen 2.5 32B in Q4. Two 3090s in NVLink (48 GB, ~EUR 1,300) even run 70B in Q4.
Can you run an LLM on an RTX 3070 8 GB?
Yes, in the 7-8B segment in Q4 (chat, summarization, 7B coding assistant). Keep a reasonable context (4-8k tokens) so you do not saturate the 8 GB. For larger models you need 12 GB (3060) or 24 GB (3090).
Which budget GPU for a local coding assistant?
An RTX 3060 12 GB is enough for Qwen 2.5-Coder 7B (~25-35 tok/s), the best 7B coding model. An RTX 3070 8 GB also works with a shorter context. For Qwen 2.5-Coder 32B, aim for an RTX 3090 24 GB.