gemma4.dev

Run Gemma 4 with GGUF (llama.cpp / LM Studio)

Download and run Gemma 4 GGUF quantized models for use with llama.cpp, LM Studio, and any GGUF-compatible runtime. Includes quantization guide.


GGUF is the file format used by llama.cpp, LM Studio, Jan, and other local inference tools. GGUF Gemma 4 models are pre-quantized, making them easy to download and run without GPU-specific setup.

Where to Download Gemma 4 GGUF Files

The best source for Gemma 4 GGUF models is Hugging Face. Search for gemma-4 with the GGUF filter, or use the Bartowski quantizations, which are well tested:

# Using huggingface-cli
pip install huggingface_hub
huggingface-cli download bartowski/gemma-4-4b-it-GGUF --include "*Q4_K_M*"
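The `--include "*Q4_K_M*"` flag is a glob filter: only repo files whose names match the pattern are downloaded. A small sketch of how that selection behaves (the filenames below are illustrative examples, not a real repo listing):

```python
# Demonstrates how an --include glob like "*Q4_K_M*" selects files
# from a repo listing. Filenames here are hypothetical examples.
import fnmatch

repo_files = [
    "gemma-4-4b-it-Q2_K.gguf",
    "gemma-4-4b-it-Q4_K_M.gguf",
    "gemma-4-4b-it-Q8_0.gguf",
]
selected = [f for f in repo_files if fnmatch.fnmatch(f, "*Q4_K_M*")]
print(selected)  # only the Q4_K_M file matches
```

This is why a single download command can pull exactly one quantization out of a repo that hosts many.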

Quantization Levels for Gemma 4

Quant    Quality  Size (E4B)  Recommended For
Q2_K     Low      1.8 GB      RAM-constrained laptops
Q4_K_M   Good     3.0 GB      Most users
Q6_K     High     3.9 GB      Accuracy-sensitive tasks
Q8_0     Best     4.7 GB      8 GB+ VRAM, max quality
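When picking a quant, budget more than the file size: the weights are loaded into memory alongside the KV cache and runtime overhead. A rough rule-of-thumb check (the cache and overhead constants are assumptions for illustration, not measured values):

```python
# Rough fit check for a GGUF quant: weights (file size) + KV cache
# + runtime overhead must fit in available memory. The default
# constants are illustrative assumptions, not measurements.

def fits_in_ram(gguf_size_gb, ram_gb, kv_cache_gb=0.5, overhead_gb=1.0):
    """Return True if the quant should fit with the given budget."""
    return gguf_size_gb + kv_cache_gb + overhead_gb <= ram_gb

# Q4_K_M at 3.0 GB comfortably fits an 8 GB machine:
print(fits_in_ram(3.0, 8))  # True
# Q8_0 at 4.7 GB gets tight once a long context and the OS take their share:
print(fits_in_ram(4.7, 8, kv_cache_gb=1.0, overhead_gb=2.5))  # False
```

Longer context windows grow the KV cache, so drop a quant level if you plan to run near the model's context limit.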

Run Gemma 4 with llama.cpp

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j$(nproc)

# Run Gemma 4 E4B
./llama-cli -m gemma-4-4b-it-Q4_K_M.gguf \
  -p "You are a helpful AI assistant." \
  --chat-template gemma \
  -n 512

Run Gemma 4 with LM Studio

  1. Open LM Studio and search for gemma4
  2. Select the GGUF model size appropriate for your hardware
  3. Download and load the model
  4. Use the built-in chat interface or the local server for API access

LM Studio automatically handles the Gemma 4 chat template and system prompt format.
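LM Studio's local server exposes an OpenAI-compatible API, by default at http://localhost:1234/v1. A minimal client sketch using only the standard library; the model identifier is a placeholder, so use whatever name LM Studio shows for your loaded model:

```python
# Minimal client for LM Studio's OpenAI-compatible local server.
# The model name below is a hypothetical placeholder.
import json
import urllib.request

payload = {
    "model": "gemma-4-4b-it",  # replace with your loaded model's identifier
    "messages": [
        {"role": "user", "content": "Summarize GGUF in one sentence."}
    ],
    "max_tokens": 128,
}

def chat(payload, base_url="http://localhost:1234/v1"):
    """POST a chat completion request and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Requires LM Studio's server to be running:
# print(chat(payload))
```

The same client works against llama-server (change the base URL to http://localhost:8080/v1), since both speak the OpenAI chat-completions format.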

Gemma 4 Chat Template in GGUF

When using llama.cpp directly, always specify the Gemma chat template:

./llama-server -m gemma-4-4b-it-Q4_K_M.gguf \
  --chat-template gemma \
  --port 8080

Without the --chat-template gemma flag, the model may produce malformed responses due to incorrect token formatting.
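For reference, here is a sketch of what `--chat-template gemma` produces. It follows the turn format of earlier Gemma releases (Gemma has historically had no dedicated system role; a system prompt is prepended to the first user turn), so treat it as illustrative — the exact Gemma 4 template may differ:

```python
# Sketch of the Gemma turn format applied by --chat-template gemma,
# based on earlier Gemma releases; Gemma 4's template may differ.

def format_gemma_turns(messages):
    """Render a list of {role, content} dicts into Gemma's turn format."""
    out = []
    for msg in messages:
        # Gemma uses "model" where OpenAI-style APIs use "assistant".
        role = "model" if msg["role"] == "assistant" else "user"
        out.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    out.append("<start_of_turn>model\n")  # cue the model's reply
    return "".join(out)

prompt = format_gemma_turns([{"role": "user", "content": "Hello!"}])
print(prompt)
```

If the runtime instead wraps turns in a different template (or none), the model sees token sequences it was never trained on, which is why omitting the flag degrades output.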

© 2026 gemma4.dev All Rights Reserved.