
Gemma 4 Memory Requirements — VRAM Guide for All Models

Exact VRAM requirements for every Gemma 4 model at BF16 full precision and Q4 quantization, with hardware recommendations for common GPUs and Apple Silicon Macs.

Overview

Before downloading a Gemma 4 model to run locally, you should know how much memory it needs. All Gemma 4 models support 4-bit and 8-bit quantization, which dramatically reduces VRAM usage at a modest quality cost. This guide lists exact memory requirements for each Gemma 4 model at full precision (BF16) and at the most popular quantization level (Q4).

Gemma 4 memory requirements vary significantly between models. The smallest Gemma 4 variant (E2B) runs in just 2GB of VRAM with Q4 quantization — fitting on integrated graphics and older mobile GPUs. The largest Gemma 4 model (31B) requires 64GB for full precision, but drops to a more accessible 18GB with Q4, opening it up to RTX 3090 owners.

BF16 Full Precision Memory Requirements

BF16 (Brain Float 16) is the native precision for Gemma 4 models. Use BF16 when you need maximum output quality and have sufficient VRAM available.

Model | Parameters | Min VRAM (BF16)
Gemma 4 E2B | 2.1B | 5 GB
Gemma 4 E4B | 4.4B | 10 GB
Gemma 4 26B A4B | 26.1B (MoE) | 28 GB
Gemma 4 31B | 31B | 64 GB

Q4 Quantized Memory Requirements

Q4 quantization (4-bit weights) is the recommended option for most local Gemma 4 deployments. It cuts VRAM usage by approximately 75% compared to BF16, with only a 3–5% reduction in benchmark quality.

Model | Min VRAM (Q4) | Fits On
Gemma 4 E2B Q4 | 2 GB | Any modern GPU
Gemma 4 E4B Q4 | 4 GB | GTX 1660, RTX 3060
Gemma 4 26B A4B Q4 | 14 GB | RTX 3090, RTX 4090
Gemma 4 31B Q4 | 18 GB | RTX 3090 24GB+
Tip: Sweet spot for local development

For most local development workflows, Gemma 4 E4B Q4 at 4GB VRAM is the sweet spot. It fits on any modern GPU and retains 95%+ of the full-precision quality. If you have an RTX 3060 or newer, start here.

Recommended Hardware by Gemma 4 Model

Not sure which Gemma 4 model fits your machine? Use this table to find the best match for your hardware. All recommendations use Q4 quantization unless otherwise noted.

Hardware | Recommended Gemma 4 Model
Apple MacBook Air M2 (8GB) | Gemma 4 E2B Q4
Apple MacBook Pro M3 (16GB) | Gemma 4 E4B Q4
Apple MacBook Pro M3 Max (48GB) | Gemma 4 26B A4B Q4
NVIDIA RTX 3070 (8GB) | Gemma 4 E4B Q4
NVIDIA RTX 3090 (24GB) | Gemma 4 26B A4B Q4
NVIDIA A100 (80GB) | Gemma 4 31B BF16

What is Quantization?

Quantization reduces the bit-width of model weights — from 16-bit (BF16) to 8-bit (Q8) or 4-bit (Q4). For Gemma 4 models, Q4 quantization typically reduces quality by 3–5% on standard benchmarks while cutting memory usage by approximately 75%. Q8 quantization offers a middle ground: roughly 50% memory reduction with less than 1% quality loss.
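As a back-of-the-envelope check, weight memory is roughly parameter count times bytes per weight; the published minimums on this page run a little higher because they include runtime overhead (KV cache, activations). A minimal sketch of the weight-only estimate:

```python
def est_weight_gb(params_b: float, bits_per_weight: int) -> float:
    """Weight-only memory estimate: billions of params x bytes per weight."""
    return params_b * bits_per_weight / 8  # GB (decimal)

# Gemma 4 31B: ~62 GB at BF16 and ~15.5 GB at Q4 - close to the table's
# 64 GB / 18 GB minimums once runtime overhead is added.
bf16 = est_weight_gb(31, 16)  # 62.0
q4 = est_weight_gb(31, 4)     # 15.5
```

The same estimate at 8 bits gives ~31 GB for Gemma 4 31B, consistent with the FP8 figure quoted below.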

When you run a Gemma 4 model through Ollama, it automatically selects Q4_K_M quantization — a variant of Q4 that applies different quantization levels to different weight layers, prioritizing accuracy for the most sensitive parts of the Gemma 4 architecture. This is the recommended quantization format for all Gemma 4 models running on consumer hardware.

For Gemma 4 models running in production on server-grade hardware, use BF16 or FP8 precision (where supported) to maximize output quality. FP8 is supported on H100 GPUs and cuts memory roughly in half compared to BF16, making Gemma 4 31B fit in 32GB of VRAM at near-full quality.

Warning: MoE memory requirements explained

The Gemma 4 26B A4B model uses a Mixture-of-Experts (MoE) architecture. While it has 26.1B total parameters, only approximately 4B are active per token during inference. However, all weights must still be loaded into VRAM — the active parameter count does not reduce memory requirements. Plan for 28GB BF16 or 14GB Q4 when deploying Gemma 4 26B A4B.
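The total-versus-active distinction is easy to get wrong when budgeting. A sketch using this page's Q4 rate of roughly half a byte per weight (the helper is illustrative):

```python
def weights_gb(params_b: float, bits: int = 4) -> float:
    """Q4 weight footprint in GB: billions of params x bytes per weight."""
    return params_b * bits / 8

# Wrong mental model: sizing by ACTIVE parameters (~4B) suggests ~2 GB...
naive_gb = weights_gb(4.0)     # 2.0 GB - underestimates by ~11 GB
# ...but every expert stays resident, so size by TOTAL parameters.
actual_gb = weights_gb(26.1)   # ~13 GB, in line with the 14 GB Q4 minimum
```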

Memory Requirements for Apple Silicon

Apple Silicon Macs use unified memory — the same memory pool serves both CPU and GPU workloads. This is distinct from discrete GPU VRAM. When running Gemma 4 on Apple Silicon, count the full system memory, not just the GPU allocation. A MacBook Pro with 16GB unified memory has roughly 12–13GB available for model weights after macOS overhead, making Gemma 4 E4B Q4 (4GB) comfortable and Gemma 4 E4B BF16 (10GB) feasible.
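The unified-memory budgeting above can be sketched as a quick fit check. The macOS overhead value is an assumption drawn from this page's rough 12–13GB-of-16GB estimate, not a fixed number:

```python
def usable_unified_gb(total_gb: float, os_overhead_gb: float = 3.5) -> float:
    """Memory left for model weights after a rough macOS overhead (assumed)."""
    return total_gb - os_overhead_gb

def fits(model_gb: float, total_gb: float) -> bool:
    """True if a model's footprint fits in the usable unified memory."""
    return model_gb <= usable_unified_gb(total_gb)

# 16 GB MacBook Pro: E4B Q4 (4 GB) is comfortable, E4B BF16 (10 GB) feasible.
```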

Apple Silicon benefits from the Metal Performance Shaders (MPS) backend in llama.cpp and Ollama. Gemma 4 inference on M-series chips is significantly faster than CPU-only inference on x86 machines with the same memory, making Apple Silicon an excellent platform for running Gemma 4 E4B and Gemma 4 E2B locally.

Related Pages

  • Gemma 4 E4B: the daily driver — 8GB VRAM minimum
  • Gemma 4 E2B: ultra-light — runs in 2GB with Q4
  • Run Gemma 4 with Ollama: automatic quantization selection
  • Gemma 4 Models Comparison: full feature comparison of all Gemma 4 variants