
RTX A4500 for local AI

The RTX A4500 provides 20 GB of VRAM for local AI. Of the 242 models in the LocalIA catalog, 179 run comfortably on a single card.

VRAM: 20 GB
Category: Workstation
Series: RTX A (Ampere)
Vendor: NVIDIA
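
The catalog doesn't state how its per-model figures are computed, but they line up well with the usual weights-only rule of thumb: VRAM ≈ parameters × bits per weight / 8, plus a small overhead. Below is a minimal Python sketch of that estimate; the bits-per-weight averages, the 5% overhead factor, and the fit thresholds are assumptions that happen to reproduce most of the figures in the tables that follow, not the catalog's published method.

```python
# Weights-only VRAM estimate: params (B) * bits/8 * overhead, in decimal GB.
# Bits-per-weight averages and the 5% overhead are assumed values.
BITS_PER_WEIGHT = {"Q4": 4.8, "Q5": 5.85, "Q8": 8.5}

def estimate_vram_gb(params_b: float, quant: str, overhead: float = 1.05) -> float:
    return round(params_b * BITS_PER_WEIGHT[quant] / 8 * overhead, 1)

def fit_label(est_gb: float, budget_gb: float = 20.0) -> str:
    # Threshold guesses: <=85% of the budget reads as "comfortable" in the
    # catalog, anything still under ~98% as "tight".
    if est_gb <= budget_gb * 0.85:
        return "comfortable"
    return "tight" if est_gb <= budget_gb * 0.98 else "too large"

for name, params, quant in [("Gemma 2 27B", 27, "Q4"),
                            ("Qwen 3 14B", 14, "Q8"),
                            ("Qwen 3 30B A3B", 30, "Q4")]:
    est = estimate_vram_gb(params, quant)
    print(f"{name}: ~{est} GB -> {fit_label(est)}")
```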

Models that run comfortably

These models fit in 20 GB with room left over for context and stable inference; the context-headroom math is sketched after the list.

Model · Family · Quant · Est. VRAM / Budget · Fit
Gemma 2 27B · gemma · Q4 · 17.0 / 20 GB · comfortable
Gemma 3 27B · gemma · Q4 · 17.0 / 20 GB · comfortable
Gemma 4 26B A4B · gemma · Q4 · 16.3 / 20 GB · comfortable
Mistral Small 3 24B · mistral · Q4 · 15.1 / 20 GB · comfortable
Mistral Small 3.1 24B · mistral · Q4 · 15.1 / 20 GB · comfortable
Mistral Small 3.2 24B · mistral · Q4 · 15.1 / 20 GB · comfortable
Devstral Small 2 24B · devstral · Q4 · 15.1 / 20 GB · comfortable
Mistral Small 22B · mistral · Q5 · 16.9 / 20 GB · comfortable
Codestral 22B · codestral · Q5 · 16.9 / 20 GB · comfortable
Reka Flash 3 21B · reka · Q5 · 16.1 / 20 GB · comfortable
InternLM 2.5 20B · internlm · Q5 · 15.4 / 20 GB · comfortable
DeepSeek V2 Lite · deepseek · Q5 · 12.3 / 20 GB · comfortable
DeepSeek Coder V2 Lite · deepseek · Q5 · 12.3 / 20 GB · comfortable
StarCoder 2 15B · starcoder · Q8 · 16.8 / 20 GB · comfortable
Phi-4 Reasoning Vision 15B · phi · Q8 · 16.8 / 20 GB · comfortable
Qwen 2.5 14B · qwen · Q8 · 15.6 / 20 GB · comfortable
Qwen 2.5 Coder 14B · qwen · Q8 · 15.6 / 20 GB · comfortable
Qwen 3 14B · qwen · Q8 · 15.6 / 20 GB · comfortable
DeepSeek R1 Distill 14B · deepseek · Q8 · 15.6 / 20 GB · comfortable
Phi-3 Medium 14B · phi · Q8 · 15.6 / 20 GB · comfortable
Phi-4 14B · phi · Q8 · 15.6 / 20 GB · comfortable
GLM-4.5 Air · glm · Q8 · 15.6 / 20 GB · comfortable
Qwen2.5 14B Instruct · qwen · Q8 · 15.6 / 20 GB · comfortable
Qwen3 14B · qwen · Q8 · 15.6 / 20 GB · comfortable
Qwen2.5 Coder 14B Instruct · qwen · Q8 · 15.6 / 20 GB · comfortable
DeepSeek R1 Distill Qwen 14B · qwen · Q8 · 15.6 / 20 GB · comfortable
Llama 2 13B · llama · Q8 · 14.5 / 20 GB · comfortable
CodeLlama 13B · codellama · Q8 · 14.5 / 20 GB · comfortable
OLMo 2 13B · olmo · Q8 · 14.5 / 20 GB · comfortable
Vicuna 13B · vicuna · Q8 · 14.5 / 20 GB · comfortable
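
"Room for context" means that whatever VRAM the weights leave free must hold the KV cache, which grows linearly with context length. A rough sketch using Qwen 2.5 14B as the example; the architecture values (48 layers, 8 KV heads, head dim 128) and the fp16 cache are assumptions here, and the result is an upper bound since compute buffers are not counted.

```python
# Per-token KV cache = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx_tokens: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1e9

# Qwen 2.5 14B at Q8 uses 15.6 GB of the 20 GB budget (from the table above),
# leaving ~4.4 GB. Config values assumed: 48 layers, 8 KV heads, head dim 128.
free_gb = 20.0 - 15.6
cache_8k = kv_cache_gb(48, 8, 128, 8192)            # ~1.6 GB for an 8K context
max_ctx = int(free_gb / kv_cache_gb(48, 8, 128, 1)) # before compute buffers
print(f"8K context needs ~{cache_8k:.1f} GB; headroom allows ~{max_ctx:,} tokens")
```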

Tight models

These models barely fit: they run, but context length and speed will be limited. One mitigation is sketched after the list.

Model · Family · Quant · Est. VRAM / Budget · Fit
Gemma 4 31B · gemma · Q4 · 19.5 / 20 GB · tight
Qwen 3 30B A3B · qwen · Q4 · 18.9 / 20 GB · tight
MPT 30B · mpt · Q4 · 18.9 / 20 GB · tight
Qwen3 Coder 30B A3B Instruct · qwen · Q4 · 18.9 / 20 GB · tight
Qwen3 30B A3B · qwen · Q4 · 18.9 / 20 GB · tight
Qwen3 30B A3B Instruct 2507 · qwen · Q4 · 18.9 / 20 GB · tight
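
One practical lever for these tight fits is to quantize the KV cache as well as the weights, for example with llama.cpp's q8_0 cache type. A minimal arithmetic sketch; the Qwen 3 30B A3B config values (48 layers, 4 KV heads, head dim 128) are assumptions, and the token counts are illustrative.

```python
# With ~1.1 GB left after Q4 weights (20 - 18.9), cache precision decides
# how much context actually fits. Config values assumed: 48 layers,
# 4 KV heads, head dim 128.
free_gb = 20.0 - 18.9
per_token = lambda bytes_per_elem: 2 * 48 * 4 * 128 * bytes_per_elem / 1e9
print(f"fp16 KV cache:  ~{free_gb / per_token(2):,.0f} tokens")
print(f"8-bit KV cache: ~{free_gb / per_token(1):,.0f} tokens")
```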

Unlocked in a 2x rig

With two cards in parallel (40 GB total), larger models become reachable; a tensor-parallel launch is sketched after the list.

Model · Family · Quant · Est. VRAM / Budget · Fit
Mixtral 8x7B · mistral · Q4 · 29.5 / 40 GB · comfortable
Falcon 40B · falcon · Q5 · 30.7 / 40 GB · comfortable
Command R 35B · command · Q5 · 26.9 / 40 GB · comfortable
Aya 23 35B · aya · Q5 · 26.9 / 40 GB · comfortable
CodeLlama 34B · codellama · Q5 · 26.1 / 40 GB · comfortable
Yi 1.5 34B · yi · Q5 · 26.1 / 40 GB · comfortable
Dolphin 2.9.1 Yi 1.5 34B · yi · Q5 · 26.1 / 40 GB · comfortable
Qwen 2.5 32B · qwen · Q5 · 24.6 / 40 GB · comfortable
Qwen 2.5 Coder 32B · qwen · Q5 · 24.6 / 40 GB · comfortable
Qwen 3 32B · qwen · Q5 · 24.6 / 40 GB · comfortable
QwQ 32B · qwq · Q5 · 24.6 / 40 GB · comfortable
DeepSeek R1 Distill 32B · deepseek · Q5 · 24.6 / 40 GB · comfortable
Qwen 2.5 VL 32B · qwen · Q5 · 24.6 / 40 GB · comfortable
Granite 4 H-Small 32B-A9B · granite · Q5 · 24.6 / 40 GB · comfortable
GLM-4.6 · glm · Q5 · 24.6 / 40 GB · comfortable
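
How a model actually lands on two cards is up to the runtime: tensor parallelism shards each layer's weights across GPUs. A minimal vLLM sketch under stated assumptions: the model ID is an illustrative ~4-bit AWQ build (vLLM's GGUF support is still experimental, so the Q4/Q5 figures above map only loosely), and the memory settings are starting points rather than tuned values.

```python
# Minimal tensor-parallel launch across two 20 GB cards with vLLM.
# Model ID and settings below are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-AWQ",  # ~4-bit AWQ build (assumed choice)
    tensor_parallel_size=2,                 # shard each layer across both GPUs
    gpu_memory_utilization=0.90,            # leave headroom for activations
    max_model_len=8192,                     # cap context so the KV cache fits
)
outputs = llm.generate(["Summarize tensor parallelism in one sentence."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

For GGUF builds, llama.cpp's --tensor-split flag provides the equivalent two-way split.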

Unlocked in a 4x rig

A server-style configuration (80 GB total) reaches the largest open-weight models; a rig-sizing sketch follows the list.

Model · Family · Quant · Est. VRAM / Budget · Fit
Mistral Large 123B · mistral · Q4 · 77.3 / 80 GB · tight
Llama 4 Scout 17Bx16 · llama · Q4 · 68.5 / 80 GB · tight
Command R+ 104B · command · Q4 · 65.4 / 80 GB · comfortable
Qwen3 Next 80B A3B Instruct · qwen · Q5 · 61.5 / 80 GB · comfortable
Qwen 2.5 72B · qwen · Q5 · 55.3 / 80 GB · comfortable
Qwen 2.5 VL 72B · qwen · Q5 · 55.3 / 80 GB · comfortable
Qwen2.5 72B Instruct · qwen · Q5 · 55.3 / 80 GB · comfortable
Llama 2 70B · llama · Q5 · 53.8 / 80 GB · comfortable
Llama 3 70B · llama · Q5 · 53.8 / 80 GB · comfortable
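
The weights-only estimator sketched earlier extends naturally to rig sizing: compare the estimate against each aggregate budget and pick the smallest rig that covers it. A self-contained sketch; the bit widths, overhead, and 98% cutoff are the same assumptions as before.

```python
# Smallest rig whose budget covers the weights-only estimate (assumed values).
BITS = {"Q4": 4.8, "Q5": 5.85, "Q8": 8.5}
def est(params_b: float, quant: str) -> float:
    return round(params_b * BITS[quant] / 8 * 1.05, 1)

for name, p, q in [("Qwen 3 32B", 32, "Q5"),
                   ("Llama 3 70B", 70, "Q5"),
                   ("Mistral Large 123B", 123, "Q4")]:
    gb = est(p, q)
    rig = next((r for r, cap in [("1x", 20), ("2x", 40), ("4x", 80)]
                if gb <= cap * 0.98), "beyond 4x")
    print(f"{name}: ~{gb} GB -> {rig}")
```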

VRAM estimates updated 2026-05-12.