
RTX A4000 for local AI

The RTX A4000 provides 16 GB of VRAM for local AI. Of the 242 models in the LocalIA catalog, 170 run comfortably on a single card.

VRAM: 16 GB
Category: Workstation
Series: RTX A (Ampere)
Vendor: NVIDIA

Models that run comfortably

These models fit in 16 GB with room for context and stable inference.

Reka Flash 3 21B (reka) · Q4 · 13.2 GB / 16 GB · comfortable
InternLM 2.5 20B (internlm) · Q4 · 12.6 GB / 16 GB · comfortable
DeepSeek V2 Lite (deepseek) · Q5 · 12.3 GB / 16 GB · comfortable
DeepSeek Coder V2 Lite (deepseek) · Q5 · 12.3 GB / 16 GB · comfortable
StarCoder 2 15B (starcoder) · Q5 · 11.5 GB / 16 GB · comfortable
Phi-4 Reasoning Vision 15B (phi) · Q5 · 11.5 GB / 16 GB · comfortable
Qwen 2.5 14B (qwen) · Q5 · 10.8 GB / 16 GB · comfortable
Qwen 2.5 Coder 14B (qwen) · Q5 · 10.8 GB / 16 GB · comfortable
Qwen 3 14B (qwen) · Q5 · 10.8 GB / 16 GB · comfortable
DeepSeek R1 Distill 14B (deepseek) · Q5 · 10.8 GB / 16 GB · comfortable
Phi-3 Medium 14B (phi) · Q5 · 10.8 GB / 16 GB · comfortable
Phi-4 14B (phi) · Q5 · 10.8 GB / 16 GB · comfortable
GLM-4.5 Air (glm) · Q5 · 10.8 GB / 16 GB · comfortable
Qwen2.5 14B Instruct (qwen) · Q5 · 10.8 GB / 16 GB · comfortable
Qwen3 14B (qwen) · Q5 · 10.8 GB / 16 GB · comfortable
Qwen2.5 Coder 14B Instruct (qwen) · Q5 · 10.8 GB / 16 GB · comfortable
DeepSeek R1 Distill Qwen 14B (qwen) · Q5 · 10.8 GB / 16 GB · comfortable
Llama 2 13B (llama) · Q5 · 10.0 GB / 16 GB · comfortable
CodeLlama 13B (codellama) · Q5 · 10.0 GB / 16 GB · comfortable
OLMo 2 13B (olmo) · Q5 · 10.0 GB / 16 GB · comfortable
Vicuna 13B (vicuna) · Q5 · 10.0 GB / 16 GB · comfortable
Mistral Nemo 12B (mistral) · Q8 · 13.4 GB / 16 GB · comfortable
Gemma 3 12B (gemma) · Q8 · 13.4 GB / 16 GB · comfortable
StableLM 2 12B (stable) · Q8 · 13.4 GB / 16 GB · comfortable
Solar 10.7B (solar) · Q8 · 12.0 GB / 16 GB · comfortable
Falcon 3 10B (falcon) · Q8 · 11.2 GB / 16 GB · comfortable
Gemma 2 9B (gemma) · Q8 · 10.1 GB / 16 GB · comfortable
Yi 1.5 9B (yi) · Q8 · 10.1 GB / 16 GB · comfortable
Qwen 3.5 9B (qwen) · Q8 · 10.1 GB / 16 GB · comfortable
GLM-4 9B (glm) · Q8 · 10.1 GB / 16 GB · comfortable
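The sizes above follow a simple pattern: weight footprint scales with parameter count and quantization bit-width, plus some runtime overhead. A minimal sketch of such an estimator; the bits-per-weight values and the 5% overhead factor are assumptions calibrated against the figures in this list, not LocalIA's published formula:

```python
# Rough VRAM estimator for quantized LLM weights.
# The bits-per-weight values are assumptions (K-quant formats store
# somewhat more than their nominal width), and the 1.05 overhead
# factor is a guess that lands close to the catalog figures above.
BITS_PER_WEIGHT = {"Q4": 4.8, "Q5": 5.9, "Q8": 8.5}
OVERHEAD = 1.05  # dequant scales, runtime buffers, etc.

def estimate_vram_gb(params_billion: float, quant: str) -> float:
    """Approximate GB of VRAM needed for the model weights alone."""
    bpw = BITS_PER_WEIGHT[quant]
    return round(params_billion * bpw / 8 * OVERHEAD, 1)
```

For example, `estimate_vram_gb(14, "Q5")` comes out near the 10.8 GB listed for the 14B models; KV cache for long contexts comes on top of this.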

Tight models

These models barely fit. They can run, but context and speed will be limited.

Mistral Small 3 24B (mistral) · Q4 · 15.1 GB / 16 GB · tight
Mistral Small 3.1 24B (mistral) · Q4 · 15.1 GB / 16 GB · tight
Mistral Small 3.2 24B (mistral) · Q4 · 15.1 GB / 16 GB · tight
Devstral Small 2 24B (devstral) · Q4 · 15.1 GB / 16 GB · tight
Mistral Small 22B (mistral) · Q4 · 13.8 GB / 16 GB · tight
Codestral 22B (codestral) · Q4 · 13.8 GB / 16 GB · tight
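The comfortable/tight split can be reproduced with simple headroom thresholds. The 85% and 95% cutoffs below are assumptions inferred from the listed entries, not values the catalog publishes:

```python
def classify_fit(model_gb: float, vram_gb: float) -> str:
    """Bucket a model by how much of the card's VRAM its weights use.

    Thresholds are assumed: up to ~85% leaves room for KV cache and
    context ("comfortable"); up to ~95% still loads but squeezes
    context and speed ("tight"); beyond that it does not fit.
    """
    ratio = model_gb / vram_gb
    if ratio <= 0.85:
        return "comfortable"
    if ratio <= 0.95:
        return "tight"
    return "does not fit"
```

Mistral Small 3 24B at 15.1 GB uses about 94% of a 16 GB card, which is why it lands in this tight bucket.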

Unlocked in a 2x rig

With two cards in parallel (32 GB total), larger models become reachable.

Mixtral 8x7B (mistral) · Q4 · 29.5 GB / 32 GB · tight
Falcon 40B (falcon) · Q4 · 25.1 GB / 32 GB · comfortable
Command R 35B (command) · Q5 · 26.9 GB / 32 GB · comfortable
Aya 23 35B (aya) · Q5 · 26.9 GB / 32 GB · comfortable
CodeLlama 34B (codellama) · Q5 · 26.1 GB / 32 GB · comfortable
Yi 1.5 34B (yi) · Q5 · 26.1 GB / 32 GB · comfortable
dolphin 2.9.1 yi 1.5 34b (yi) · Q5 · 26.1 GB / 32 GB · comfortable
Qwen 2.5 32B (qwen) · Q5 · 24.6 GB / 32 GB · comfortable
Qwen 2.5 Coder 32B (qwen) · Q5 · 24.6 GB / 32 GB · comfortable
Qwen 3 32B (qwen) · Q5 · 24.6 GB / 32 GB · comfortable
QwQ 32B (qwq) · Q5 · 24.6 GB / 32 GB · comfortable
DeepSeek R1 Distill 32B (deepseek) · Q5 · 24.6 GB / 32 GB · comfortable
Qwen 2.5 VL 32B (qwen) · Q5 · 24.6 GB / 32 GB · comfortable
Granite 4 H-Small 32B-A9B (granite) · Q5 · 24.6 GB / 32 GB · comfortable
GLM-4.6 (glm) · Q5 · 24.6 GB / 32 GB · comfortable
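Pooling two cards mainly raises the weight budget: the "unlocked" set is just the models whose footprint clears the doubled total while missing the single-card one. A sketch using a few entries from the tables above (16 GB per card comes from the spec sheet; the 95% load cutoff is an assumption):

```python
CARD_GB = 16  # one RTX A4000

# (model, estimated GB) pairs taken from the tables above
CATALOG = [
    ("Qwen 2.5 14B", 10.8),
    ("Mistral Small 3 24B", 15.1),
    ("Mixtral 8x7B", 29.5),
    ("Qwen 3 32B", 24.6),
    ("Llama 3.3 70B", 53.8),
]

def unlocked_by(n_cards: int) -> list[str]:
    """Models that load on n cards but not on n-1 (assumed 95% cutoff)."""
    fits = lambda gb, cards: gb <= 0.95 * CARD_GB * cards
    return [name for name, gb in CATALOG
            if fits(gb, n_cards) and not fits(gb, n_cards - 1)]
```

With this toy catalog, `unlocked_by(2)` returns Mixtral 8x7B and Qwen 3 32B, while the 70B class only clears the bar with four cards.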

Unlocked in a 4x rig

A server-style configuration with four cards (64 GB total) puts the largest open-weight models in the catalog within reach.

Qwen3 Next 80B A3B Instruct (qwen) · Q4 · 50.3 GB / 64 GB · comfortable
Qwen 2.5 72B (qwen) · Q4 · 45.3 GB / 64 GB · comfortable
Qwen 2.5 VL 72B (qwen) · Q4 · 45.3 GB / 64 GB · comfortable
Qwen2.5 72B Instruct (qwen) · Q4 · 45.3 GB / 64 GB · comfortable
Llama 2 70B (llama) · Q5 · 53.8 GB / 64 GB · comfortable
Llama 3 70B (llama) · Q5 · 53.8 GB / 64 GB · comfortable
Llama 3.1 70B (llama) · Q5 · 53.8 GB / 64 GB · comfortable
Llama 3.3 70B (llama) · Q5 · 53.8 GB / 64 GB · comfortable
CodeLlama 70B (codellama) · Q5 · 53.8 GB / 64 GB · comfortable
DeepSeek R1 Distill 70B (deepseek) · Q5 · 53.8 GB / 64 GB · comfortable
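When weights are split across the rig, each card holds roughly its share plus a little replicated state. A rough per-card check for the 4x figures above; the even split and the 10% replication allowance are assumptions, and KV cache grows with context on top of them:

```python
def per_card_gb(model_gb: float, n_cards: int,
                replicated_frac: float = 0.10) -> float:
    """Approximate VRAM per card for an even tensor split.

    replicated_frac is an assumed allowance for state every card
    keeps a copy of (embeddings, activations, scratch buffers).
    """
    return model_gb / n_cards * (1 + replicated_frac)
```

Under these assumptions, a 70B model at Q5 (53.8 GB) lands near 14.8 GB per card, inside each A4000's 16 GB, while the same model split over only two cards would not fit.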

VRAM estimates updated 2026-05-12.