Gemma 4 Memory Requirements — VRAM Guide for All Models
Exact VRAM requirements for every Gemma 4 model at BF16 full precision and Q4 quantization, with hardware recommendations for common GPUs and Apple Silicon Macs.
Overview
Running Gemma 4 locally starts with knowing how much memory each model needs before you download it. All Gemma 4 models support 4-bit and 8-bit quantization, which dramatically reduces VRAM usage at a modest quality cost. This guide lists exact memory requirements for each Gemma 4 model at full precision (BF16) and at the most popular quantization level (Q4).
Gemma 4 memory requirements vary significantly between models. The smallest Gemma 4 variant (E2B) runs in just 2GB of VRAM with Q4 quantization — fitting on integrated graphics and older mobile GPUs. The largest Gemma 4 model (31B) requires 64GB for full precision, but drops to a more accessible 18GB with Q4, opening it up to RTX 3090 owners.
BF16 Full Precision Memory Requirements
BF16 (Brain Float 16) is the native precision for Gemma 4 models. Use BF16 when you need maximum output quality and have sufficient VRAM available.
| Model | Parameters | Min VRAM (BF16) |
|---|---|---|
| Gemma 4 E2B | 2.1B | 5 GB |
| Gemma 4 E4B | 4.4B | 10 GB |
| Gemma 4 26B A4B | 26.1B (MoE) | 28 GB |
| Gemma 4 31B | 31B | 64 GB |
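The dense models in the table roughly follow a simple rule of thumb: two bytes per parameter, plus headroom for activations and the KV cache. A minimal sketch (the 20% overhead factor is this guide's illustrative assumption, not an official Gemma 4 figure):

```python
def bf16_vram_gb(params_billion: float, overhead: float = 0.2) -> float:
    """Estimate VRAM for BF16 inference: 2 bytes per parameter plus overhead.

    The overhead factor (activations, KV cache, runtime buffers) is an
    illustrative assumption, not an official Gemma 4 figure.
    """
    weight_gb = params_billion * 2  # 1e9 params x 2 bytes = 2 GB per billion
    return weight_gb * (1 + overhead)

print(round(bf16_vram_gb(2.1), 1))  # Gemma 4 E2B -> 5.0
print(round(bf16_vram_gb(4.4), 1))  # Gemma 4 E4B -> 10.6
```

Note that the MoE 26B A4B model does not track this estimate, since its architecture changes the relationship between parameter count and resident memory.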
Q4 Quantized Memory Requirements
Q4 quantization (4-bit weights) is the recommended option for most local Gemma 4 deployments. It cuts VRAM usage by approximately 75% compared to BF16, with only a 3–5% reduction in benchmark quality.
| Model | Min VRAM (Q4) | Fits On |
|---|---|---|
| Gemma 4 E2B Q4 | 2 GB | Any modern GPU |
| Gemma 4 E4B Q4 | 4 GB | GTX 1660, RTX 3060 |
| Gemma 4 26B A4B Q4 | 14 GB | RTX 3090, RTX 4090 |
| Gemma 4 31B Q4 | 18 GB | RTX 3090 24GB+ |
Q4 quantization is the sweet spot for local development: most of the quality at roughly a quarter of the memory.
Recommended Hardware by Gemma 4 Model
Not sure which Gemma 4 model fits your machine? Use this table to find the best match for your hardware. All recommendations use Q4 quantization unless otherwise noted.
| Hardware | Recommended Gemma 4 Model |
|---|---|
| Apple MacBook Air M2 (8GB) | Gemma 4 E2B Q4 |
| Apple MacBook Pro M3 (16GB) | Gemma 4 E4B Q4 |
| Apple MacBook Pro M3 Max (48GB) | Gemma 4 26B A4B Q4 |
| NVIDIA RTX 3070 (8GB) | Gemma 4 E4B Q4 |
| NVIDIA RTX 3090 (24GB) | Gemma 4 26B A4B Q4 |
| NVIDIA A100 (80GB) | Gemma 4 31B BF16 |
What is Quantization?
Quantization reduces the bit-width of model weights — from 16-bit (BF16) to 8-bit (Q8) or 4-bit (Q4). For Gemma 4 models, Q4 quantization typically reduces quality by 3–5% on standard benchmarks while cutting memory usage by approximately 75%. Q8 quantization offers a middle ground: roughly 50% memory reduction with less than 1% quality loss.
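The percentage savings quoted above follow directly from the bit-widths. A minimal weight-only estimate (real Q4_K_M files run slightly above 4 bits per weight because of per-block scale factors, so treat these as lower bounds):

```python
BITS_PER_WEIGHT = {"bf16": 16, "q8": 8, "q4": 4}

def weight_gb(params_billion: float, precision: str) -> float:
    """Weight-only memory: params x bits / 8 bytes. Ignores runtime overhead
    and the small scale-factor overhead of real K-quant formats."""
    return params_billion * BITS_PER_WEIGHT[precision] / 8

savings = 1 - weight_gb(31, "q4") / weight_gb(31, "bf16")
print(f"Q4 saves {savings:.0%} vs BF16")  # Q4 saves 75% vs BF16
```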
When you run a Gemma 4 model through Ollama, it automatically selects Q4_K_M quantization — a variant of Q4 that applies different quantization levels to different weight layers, prioritizing accuracy for the most sensitive parts of the Gemma 4 architecture. This is the recommended quantization format for all Gemma 4 models running on consumer hardware.
For Gemma 4 models running in production on server-grade hardware, use BF16 or FP8 precision (where supported) to maximize output quality. FP8 is supported on H100 GPUs and cuts memory roughly in half compared to BF16, making Gemma 4 31B fit in 32GB of VRAM at near-full quality.
MoE Memory Requirements Explained
The Gemma 4 26B A4B is a mixture-of-experts (MoE) model: only around 4B of its 26.1B parameters are active for any given token, which is why it runs at roughly the speed of a much smaller dense model. Memory is a different story: inactive expert weights still need to be resident or quickly swappable, so the 26B A4B requires far more memory than a dense 4B model, as the tables above show.
Memory Requirements for Apple Silicon
Apple Silicon Macs use unified memory — the same memory pool serves both CPU and GPU workloads. This is distinct from discrete GPU VRAM. When running Gemma 4 on Apple Silicon, count the full system memory, not just the GPU allocation. A MacBook Pro with 16GB unified memory has roughly 12–13GB available for model weights after macOS overhead, making Gemma 4 E4B Q4 (4GB) comfortable and Gemma 4 E4B BF16 (10GB) feasible.
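The 12–13GB figure above can be reproduced with a back-of-the-envelope calculation. The macOS overhead allowance here is this guide's working assumption, not an Apple-published number:

```python
def usable_unified_memory_gb(total_gb: float, macos_overhead_gb: float = 3.0) -> float:
    """Rough memory available for model weights on Apple Silicon.

    Unified memory is shared by CPU and GPU, so subtract an allowance for
    macOS and other apps. The 3 GB default is an illustrative assumption.
    """
    return max(total_gb - macos_overhead_gb, 0.0)

def fits(model_gb: float, total_gb: float) -> bool:
    """Check whether a model's memory footprint fits in usable memory."""
    return model_gb <= usable_unified_memory_gb(total_gb)

print(fits(4, 16))   # E4B Q4 on a 16GB MacBook Pro -> True
print(fits(10, 16))  # E4B BF16 -> True, but tight
```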
Apple Silicon benefits from the Metal Performance Shaders (MPS) backend in llama.cpp and Ollama. Gemma 4 inference on M-series chips is significantly faster than CPU-only inference on x86 machines with the same memory, making Apple Silicon an excellent platform for running Gemma 4 E4B and Gemma 4 E2B locally.