Getting Started

Hardware Requirements

VRAM, RAM, and storage requirements for all Gemma 4 model variants across runtimes.

Gemma 4 runs on a wide range of hardware. Use this page to find the right model and quantization for your setup.

Quick reference

| Model   | Full precision | Q8     | Q4      | CPU-only        |
|---------|----------------|--------|---------|-----------------|
| E2B     | 4 GB           | 2 GB   | 1.4 GB  | ✓ (slow)        |
| E4B     | 8 GB           | 4.5 GB | 3.2 GB  | ✓               |
| 26B A4B | 52 GB          | 28 GB  | 16.4 GB | Not recommended |
| 31B     | 62 GB          | 32 GB  | 24 GB   | Not recommended |

GPU recommendations

Apple Silicon (M1–M4)

Unified memory. E4B runs well at 8 GB. 26B A4B needs 24 GB+ (M3 Pro/M4 Max). Use MLX for best performance.

NVIDIA RTX 3000/4000

RTX 3060 (12 GB) handles E4B comfortably. RTX 4090 (24 GB) can run 31B at Q4. Use CUDA backend.

NVIDIA RTX 3000/4000 with 8 GB VRAM

Use E4B at Q4 or E2B at full precision, and keep the context window to ~8K for stable inference.

CPU only

E2B and E4B are usable on modern CPUs via llama.cpp. Expect 2–8 tokens/sec on 8-core machines.
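To turn a tokens-per-second figure into a wall-clock expectation, a quick sketch (the rates below are just the endpoints of the 2–8 tokens/sec range quoted above; prompt-processing time is ignored):

```python
# Rough wall-clock estimate for CPU-only generation.
def generation_seconds(reply_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate a reply, ignoring prompt processing."""
    return reply_tokens / tokens_per_sec

# A 300-token reply across the quoted 2-8 tok/s range:
for rate in (2.0, 8.0):
    print(f"{rate} tok/s -> {generation_seconds(300, rate):.0f} s")
```

In practice, prompt processing adds further latency on long inputs, so treat these as lower bounds.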

Storage

Each model requires disk space for weights:

  • E2B: ~1.5 GB (Q4) — ~4 GB (FP16)
  • E4B: ~3 GB (Q4) — ~8 GB (FP16)
  • 26B A4B: ~16 GB (Q4) — ~52 GB (FP16)
  • 31B: ~24 GB (Q4) — ~62 GB (FP16)

The MoE model (26B A4B) activates only ~4B parameters per token during inference, but the full weight set must still fit in storage. Disk space requirements therefore follow the full parameter count, not the active count.
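The sizes above follow directly from parameter count × bits per weight. A minimal sketch of that arithmetic; the bits-per-weight values are approximate averages for common quantization formats (real files add metadata and mixed-precision layers, so treat the results as estimates):

```python
# Rough on-disk size estimate from parameter count and quantization.
# Bits-per-weight values are approximate averages, not exact file formats.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q4": 4.8}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Approximate weight-file size in GB for a given quantization."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

# The 26B MoE model stores all 26B weights on disk, even though
# only ~4B are active per token:
print(round(approx_size_gb(26, "Q4"), 1))   # ~15.6, close to the ~16 GB above
print(round(approx_size_gb(26, "FP16"), 1)) # 52.0, matching the FP16 figure
```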

Context window and VRAM

Longer context windows consume more VRAM during inference because the key/value cache grows with sequence length. The minimum VRAM figures above assume short contexts (≤2K tokens). For the full context window, budget additional VRAM:

| Model          | Max context    | Additional VRAM |
|----------------|----------------|-----------------|
| E4B (32K)      | 32,768 tokens  | +2–4 GB         |
| 26B A4B (128K) | 131,072 tokens | +8–12 GB        |
| 31B (256K)     | 262,144 tokens | +16–24 GB       |
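The KV cache that drives these numbers can be estimated from the model's architecture. A back-of-envelope sketch; the layer count, KV-head count, and head dimension below are illustrative guesses, not published Gemma 4 specs, so substitute real values from the model card:

```python
# Back-of-envelope KV-cache size estimate for an FP16 cache.
# n_layers, n_kv_heads, and head_dim are ILLUSTRATIVE GUESSES,
# not published Gemma 4 architecture specs.
def kv_cache_gb(context_tokens: int, n_layers: int = 32,
                n_kv_heads: int = 4, head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    # Two tensors (K and V) per layer, each [context, n_kv_heads, head_dim].
    elems = 2 * n_layers * context_tokens * n_kv_heads * head_dim
    return elems * bytes_per_elem / 1e9

print(round(kv_cache_gb(131_072), 1))  # ~8.6 GB at full 128K context
```

With these guessed specs, a full 128K context costs roughly 8.6 GB, which lands inside the +8–12 GB range in the table; quantized or grouped-query caches reduce this.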

Platform notes

macOS: Apple Unified Memory is shared between CPU and GPU. A 16 GB M-series Mac can run E4B comfortably and 26B A4B in a pinch.

Windows: Use WSL2 for Ollama and llama.cpp. Native Windows support is available for LM Studio and llama.cpp builds.

Linux: Best performance across all runtimes. NVIDIA CUDA recommended for models above E4B.
