Getting Started

Hardware Requirements

VRAM, RAM, and storage requirements for all Gemma 4 model variants across runtimes.

Gemma 4 runs on a wide range of hardware. Use this page to find the right model and quantization for your setup.

Quick reference

| Model   | Full precision | Q8     | Q4      | CPU-only        |
|---------|----------------|--------|---------|-----------------|
| E2B     | 4 GB           | 2 GB   | 1.4 GB  | ✓ (slow)        |
| E4B     | 8 GB           | 4.5 GB | 3.2 GB  | ✓               |
| 26B A4B | 52 GB          | 28 GB  | 16.4 GB | Not recommended |
| 31B     | 62 GB          | 32 GB  | 24 GB   | Not recommended |

GPU recommendations

Apple Silicon (M1–M4)

Unified memory. E4B runs well at 8 GB. 26B A4B needs 24 GB+ (M3 Pro/M4 Max). Use MLX for best performance.

NVIDIA RTX 3000/4000

RTX 3060 (12 GB) handles E4B comfortably. RTX 4090 (24 GB) can run 31B at Q4. Use CUDA backend.

NVIDIA RTX 3000/4000 with 8 GB VRAM

Use E4B at Q4 or E2B at full precision, and keep the context window to ~8K for stable inference.

CPU only

E2B and E4B are usable on modern CPUs via llama.cpp. Expect 2–8 tokens/sec on 8-core machines.
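To turn a tokens-per-second figure into a wall-clock expectation, a quick sketch (the rates below are just the endpoints of the 2–8 tokens/sec range quoted above; prompt-processing time is ignored):

```python
# Rough wall-clock estimate for CPU-only generation.
def generation_seconds(reply_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate a reply, ignoring prompt processing."""
    return reply_tokens / tokens_per_sec

# A 300-token reply across the quoted 2-8 tok/s range:
for rate in (2.0, 8.0):
    print(f"{rate} tok/s -> {generation_seconds(300, rate):.0f} s")
```

In practice, prompt processing adds further latency on long inputs, so treat these as lower bounds.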

Storage

Each model requires disk space for weights:

  • E2B: ~1.5 GB (Q4) — ~4 GB (FP16)
  • E4B: ~3 GB (Q4) — ~8 GB (FP16)
  • 26B A4B: ~16 GB (Q4) — ~52 GB (FP16)
  • 31B: ~24 GB (Q4) — ~62 GB (FP16)

The MoE model (26B A4B) activates only ~4B parameters per token during inference, but the full weight set must still fit in storage. Disk space requirements therefore follow the full parameter count, not the active count.
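The sizes above follow directly from parameter count × bits per weight. A minimal sketch of that arithmetic; the bits-per-weight values are approximate averages for common quantization formats (real files add metadata and mixed-precision layers, so treat the results as estimates):

```python
# Rough on-disk size estimate from parameter count and quantization.
# Bits-per-weight values are approximate averages, not exact file formats.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q4": 4.8}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Approximate weight-file size in GB for a given quantization."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

# The 26B MoE model stores all 26B weights on disk, even though
# only ~4B are active per token:
print(round(approx_size_gb(26, "Q4"), 1))   # ~15.6, close to the ~16 GB above
print(round(approx_size_gb(26, "FP16"), 1)) # 52.0, matching the FP16 figure
```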

Context window and VRAM

Longer context windows consume more VRAM during inference because the key/value cache grows with sequence length. The minimum VRAM figures above assume short contexts (≤2K tokens). For the full context window, budget additional VRAM:

| Model          | Max context    | Additional VRAM |
|----------------|----------------|-----------------|
| E4B (32K)      | 32,768 tokens  | +2–4 GB         |
| 26B A4B (128K) | 131,072 tokens | +8–12 GB        |
| 31B (256K)     | 262,144 tokens | +16–24 GB       |
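The KV cache that drives these numbers can be estimated from the model's architecture. A back-of-envelope sketch; the layer count, KV-head count, and head dimension below are illustrative guesses, not published Gemma 4 specs, so substitute real values from the model card:

```python
# Back-of-envelope KV-cache size estimate for an FP16 cache.
# n_layers, n_kv_heads, and head_dim are ILLUSTRATIVE GUESSES,
# not published Gemma 4 architecture specs.
def kv_cache_gb(context_tokens: int, n_layers: int = 32,
                n_kv_heads: int = 4, head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    # Two tensors (K and V) per layer, each [context, n_kv_heads, head_dim].
    elems = 2 * n_layers * context_tokens * n_kv_heads * head_dim
    return elems * bytes_per_elem / 1e9

print(round(kv_cache_gb(131_072), 1))  # ~8.6 GB at full 128K context
```

With these guessed specs, a full 128K context costs roughly 8.6 GB, which lands inside the +8–12 GB range in the table; quantized or grouped-query caches reduce this.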

Platform notes

macOS: Apple Unified Memory is shared between CPU and GPU. A 16 GB M-series Mac can run E4B comfortably and 26B A4B in a pinch.

Windows: Use WSL2 for Ollama and llama.cpp. Native Windows support is available for LM Studio and llama.cpp builds.

Linux: Best performance across all runtimes. NVIDIA CUDA recommended for models above E4B.
