Local Runtime v2.4.0
Bring Intelligence to Source
All six Gemma 4 runtimes in one place. Install locally; no cloud required.
Recommended
Ollama
The easiest way to get up and running on macOS, Linux, and Windows. Deploy with a single CLI command.
ollama run gemma4:e4b
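Once the model is pulled, Ollama also serves a local HTTP API on port 11434. A minimal Python sketch, assuming the gemma4:e4b tag from the command above is already available locally:

# Query a locally running Ollama server (default port 11434).
# Assumes `ollama run gemma4:e4b` has already pulled the model.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma4:e4b",
        "prompt": "Explain quantisation in one sentence.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
)
print(resp.json()["response"])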
Hugging Face
Full Python control with the Transformers library. Best for fine-tuning and ML pipelines.
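A minimal Transformers sketch; the Hub repo id below is an assumption (the real Gemma 4 id may differ), but the pipeline API itself is standard:

# Text generation with the Transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-4-e4b",  # hypothetical repo id, swap in the real one
    device_map="auto",           # place weights on GPU/CPU automatically
)
out = generator("Explain quantisation in one sentence.", max_new_tokens=64)
print(out[0]["generated_text"])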
LM Studio
GUI-based local AI tool. Download and run Gemma 4 with no command line required.
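The GUI needs no command line, but LM Studio can also expose an OpenAI-compatible server for other apps. A sketch assuming that server is enabled on its default port 1234; the model name is whatever LM Studio shows for the loaded model:

# Talk to LM Studio's local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally
resp = client.chat.completions.create(
    model="gemma4-e4b",  # assumed name of the model loaded in the GUI
    messages=[{"role": "user", "content": "Explain quantisation in one sentence."}],
)
print(resp.choices[0].message.content)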
MLX
Apple Silicon-optimised framework for maximum efficiency on M-series chips.
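A minimal sketch using the mlx-lm package; the converted checkpoint name is an assumption (community MLX conversions are usually published under mlx-community):

# Run an MLX-converted checkpoint on Apple Silicon.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma4-e4b-4bit")  # hypothetical converted checkpoint
text = generate(model, tokenizer, prompt="Explain quantisation in one sentence.", max_tokens=64)
print(text)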
GGUF
Quantised model files designed for split CPU/GPU execution across a wide range of hardware; see the sketch under llama.cpp below.
llama.cpp
High-performance C++ backend with full quantisation control and CUDA/Metal support.
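A minimal sketch covering both cards above, via the llama-cpp-python bindings over llama.cpp. The GGUF file name is a placeholder for whichever Q4 quant you download; n_gpu_layers sets the CPU/GPU split, and builds compiled with CUDA or Metal offload those layers to the GPU:

# Load a Q4 GGUF quant with partial GPU offload.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma4-e4b-Q4_K_M.gguf",  # hypothetical local quant file
    n_gpu_layers=20,  # offload 20 layers to GPU (CUDA/Metal); the rest run on CPU
    n_ctx=4096,
)
out = llm("Explain quantisation in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])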
Diagnostic Hub
Not sure which runtime?
Low VRAM?
Use a GGUF Q4 quant, or Ollama's default 4-bit builds (sizing sketch below)
Prefer GUI?
LM Studio runs Gemma 4 with no CLI
Apple Silicon?
MLX gives the best tokens/sec on M-series chips
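A rough way to sanity-check the low-VRAM advice: a Q4 quant stores roughly 4 to 5 bits per weight once quantisation scales are included, so weight memory can be estimated straight from the parameter count. A back-of-envelope sketch; the 4.5 bits/weight figure and the 4B parameter count are assumptions, not measured values:

# Back-of-envelope weight-memory estimate for a Q4-quantised model.
# Real GGUF quants vary by scheme (Q4_0, Q4_K_M, ...), and this
# ignores KV-cache and runtime overhead.
def q4_weight_gib(params_billion: float, bits_per_weight: float = 4.5) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

print(f"{q4_weight_gib(4):.1f} GiB")  # a 4B-parameter model: ~2.1 GiB of weights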