Local Runtime v2.4.0
Bring Intelligence to Source
All six Gemma 4 runtimes in one place. Install locally; no cloud required.
Recommended
Ollama
The easiest way to get up and running on macOS, Linux, and Windows. Deploy with a single CLI command.
ollama run gemma4:e4b
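Once the model is pulled, Ollama also serves a local HTTP API on port 11434. A minimal Python sketch, assuming the gemma4:e4b tag from the command above is already available locally:

# Query a locally running Ollama server (default port 11434).
# Assumes `ollama run gemma4:e4b` has already pulled the model.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma4:e4b",
        "prompt": "Explain quantisation in one sentence.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
)
print(resp.json()["response"])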
Hugging Face
Full Python control with the Transformers library. Best for fine-tuning and ML pipelines.
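A minimal Transformers sketch; the Hub repo id below is an assumption (the real Gemma 4 id may differ), but the pipeline API itself is standard:

# Text generation with the Transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-4-e4b",  # hypothetical repo id, swap in the real one
    device_map="auto",           # place weights on GPU/CPU automatically
)
out = generator("Explain quantisation in one sentence.", max_new_tokens=64)
print(out[0]["generated_text"])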
LM Studio
GUI-based local AI tool. Download and run Gemma 4 with no command line required.
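The GUI needs no command line, but LM Studio can also expose an OpenAI-compatible server for other apps. A sketch assuming that server is enabled on its default port 1234; the model name is whatever LM Studio shows for the loaded model:

# Talk to LM Studio's local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally
resp = client.chat.completions.create(
    model="gemma4-e4b",  # assumed name of the model loaded in the GUI
    messages=[{"role": "user", "content": "Explain quantisation in one sentence."}],
)
print(resp.choices[0].message.content)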
MLX
Apple Silicon-optimised framework for maximum efficiency on M-series chips.
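A minimal sketch using the mlx-lm package; the converted checkpoint name is an assumption (community MLX conversions are usually published under mlx-community):

# Run an MLX-converted checkpoint on Apple Silicon.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma4-e4b-4bit")  # hypothetical converted checkpoint
text = generate(model, tokenizer, prompt="Explain quantisation in one sentence.", max_tokens=64)
print(text)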
GGUF
Quantised model files designed for split CPU/GPU execution across a wide range of hardware; see the sketch under llama.cpp below.
llama.cpp
High-performance C++ backend with full quantisation control and CUDA/Metal support.
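A minimal sketch covering both cards above, via the llama-cpp-python bindings over llama.cpp. The GGUF file name is a placeholder for whichever Q4 quant you download; n_gpu_layers sets the CPU/GPU split, and builds compiled with CUDA or Metal offload those layers to the GPU:

# Load a Q4 GGUF quant with partial GPU offload.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma4-e4b-Q4_K_M.gguf",  # hypothetical local quant file
    n_gpu_layers=20,  # offload 20 layers to GPU (CUDA/Metal); the rest run on CPU
    n_ctx=4096,
)
out = llm("Explain quantisation in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])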
Diagnostic Hub
Not sure which runtime?
Low VRAM?
Use a GGUF Q4 quant, or Ollama's default 4-bit builds (sizing sketch below)
Prefer GUI?
LM Studio runs Gemma 4 with no CLI
Apple Silicon?
MLX gives the best tokens/sec on M-series chips
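A rough way to sanity-check the low-VRAM advice: a Q4 quant stores roughly 4 to 5 bits per weight once quantisation scales are included, so weight memory can be estimated straight from the parameter count. A back-of-envelope sketch; the 4.5 bits/weight figure and the 4B parameter count are assumptions, not measured values:

# Back-of-envelope weight-memory estimate for a Q4-quantised model.
# Real GGUF quants vary by scheme (Q4_0, Q4_K_M, ...), and this
# ignores KV-cache and runtime overhead.
def q4_weight_gib(params_billion: float, bits_per_weight: float = 4.5) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

print(f"{q4_weight_gib(4):.1f} GiB")  # a 4B-parameter model: ~2.1 GiB of weights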