# Model Reference
All four Gemma 4 variants — architecture, specs, and recommended use cases.
Gemma 4 ships in four weight configurations. Three are dense models (E2B, E4B, 31B) and one is a mixture-of-experts architecture (26B A4B).
## Comparison table
| Model | Architecture | Parameters | Active Params | Context | VRAM (Q4) | Best for |
|---|---|---|---|---|---|---|
| E2B | Dense | 2B | 2B | 8K | 1.4 GB | Mobile, embedded |
| E4B | Dense | 4B | 4B | 32K | 3.2 GB | Coding, chat |
| 26B A4B | MoE | 26B | 4B | 128K | 16.4 GB | Long context, writing |
| 31B | Dense | 31B | 31B | 256K | 24 GB | Research, agentic |
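As a rough sanity check on the VRAM column: 4-bit quantization stores about 0.5 bytes per parameter, so the bare weights set a floor on memory. The table's figures run higher because of unquantized layers, KV cache, and runtime overhead. A minimal sketch (the flat 0.5 bytes/param figure is an approximation, not a property of any specific Q4 format):

```python
def q4_weight_gb(params_billion: float) -> float:
    """Lower bound on weight memory at ~4 bits (0.5 bytes) per parameter."""
    return params_billion * 0.5  # billions of params * 0.5 bytes/param -> GB

# Floor vs. the table's practical figures (the MoE model still needs
# all 26B weights in memory, even though only ~4B are active per token):
for name, params in [("E2B", 2), ("E4B", 4), ("26B A4B", 26), ("31B", 31)]:
    print(f"{name}: >= {q4_weight_gb(params):.1f} GB")
```

For E2B this floor (1.0 GB) sits just under the table's 1.4 GB; the gap is the overhead described above.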
## Choosing a model

### Gemma 4 E2B
2B dense. Runs on any hardware. Edge devices, mobile, CPU-only inference.

### Gemma 4 E4B
4B dense. The most popular choice. Great for code, chat, and everyday tasks.

### Gemma 4 26B A4B
26B MoE, 4B active. 128K context. Technical writing, RAG, long documents.

### Gemma 4 31B
31B dense. 256K context. Full reasoning capability. Production agentic systems.
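The guidance above largely reduces to two constraints: context length and memory budget. A hypothetical helper, with thresholds taken from the comparison table (the `pick_model` function itself is illustrative, not part of any official tooling):

```python
def pick_model(context_tokens: int, vram_budget_gb: float) -> str:
    """Return the smallest Gemma 4 variant whose context window and
    Q4 VRAM requirement (per the comparison table) fit both constraints."""
    variants = [  # (name, context limit in tokens, Q4 VRAM in GB), smallest first
        ("E2B", 8_192, 1.4),
        ("E4B", 32_768, 3.2),
        ("26B A4B", 131_072, 16.4),
        ("31B", 262_144, 24.0),
    ]
    for name, ctx, vram in variants:
        if context_tokens <= ctx and vram <= vram_budget_gb:
            return name
    raise ValueError("no variant satisfies both constraints")
```

For example, `pick_model(100_000, 24.0)` lands on the 26B A4B: the dense models lack the context window, and its 16.4 GB fits the budget.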
## MoE vs Dense
The 26B A4B is a Mixture of Experts model. It has 26B total parameters but routes each token through only ~4B active parameters per forward pass. This means:
- Inference cost similar to a 4B model
- Capability closer to a 13B+ dense model
- Storage requires the full 26B weights on disk
The 31B model is dense — all 31B parameters are active on every forward pass, giving it the highest raw capability at the cost of VRAM.
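The top-k routing described above can be sketched in miniature: a toy router scores all experts but runs only the k best, so per-token compute scales with k while storage scales with the full expert count. All shapes and sizes here are illustrative stand-ins, not Gemma 4's real architecture:

```python
import math
import random

random.seed(0)
n_experts, top_k, d = 8, 2, 16  # toy sizes, not Gemma 4's real shapes

# Each "expert" is a diagonal weight vector standing in for a full FFN block.
experts = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
router = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]

def moe_forward(x):
    # Scoring every expert is cheap; running one is the expensive part.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router]
    chosen = sorted(range(n_experts), key=scores.__getitem__)[-top_k:]
    z = max(scores[i] for i in chosen)
    gates = [math.exp(scores[i] - z) for i in chosen]  # softmax over winners
    total = sum(gates)
    # Only top_k of the n_experts weight sets are touched for this token.
    out = [0.0] * d
    for g, i in zip(gates, chosen):
        for j in range(d):
            out[j] += (g / total) * experts[i][j] * x[j]
    return out

y = moe_forward([random.gauss(0, 1) for _ in range(d)])
```

Here all 8 expert weight sets must be stored, but each token's forward pass multiplies through only 2 of them; that is the storage-vs-compute trade the 26B A4B makes at scale.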