
Run Gemma 4 with LM Studio

Download and run Gemma 4 models using LM Studio's GUI. No command line needed — includes model selection, chat interface, and local API server setup.

LM Studio is a GUI application for running Gemma 4 locally with no terminal required. Download models, chat interactively, and expose a local API server — all from a visual interface. LM Studio is available for macOS, Windows, and Linux.

Step 1: Download LM Studio

Download from lmstudio.ai. The app is free and installs like any desktop application. No command line setup is needed before running Gemma 4.

Step 2: Find Gemma 4 in LM Studio

  1. Open LM Studio
  2. Click the search icon in the left sidebar
  3. Search for gemma4 or gemma-4
  4. Look for models from bartowski or google — bartowski quantizations are recommended for Gemma 4

Step 3: Choose a Gemma 4 Model

Select a Gemma 4 variant based on your available RAM or VRAM:

| Hardware | Recommended Model | Download Size |
|---|---|---|
| 8 GB RAM / VRAM | gemma-4-2b-it-Q4_K_M | ~2 GB |
| 16 GB RAM / VRAM | gemma-4-4b-it-Q4_K_M | ~3 GB |
| 24 GB VRAM | gemma-4-4b-it-Q8_0 or 26b-Q3 | ~5–14 GB |
| 32 GB+ unified (Mac M3 Max) | gemma-4-26b-a4b-Q4_K_M | ~14 GB |

If you are unsure, start with gemma-4-4b-it-Q4_K_M. It runs on most modern machines and gives a good sense of Gemma 4 quality.
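As a quick reference, the table above can be expressed as a small helper. The thresholds and model names simply mirror the table rows; they are a rough sketch, not official hardware requirements:

```python
def recommend_model(mem_gb: int) -> str:
    """Pick a Gemma 4 quantization for LM Studio based on available
    RAM/VRAM in GB, mirroring the hardware table above."""
    if mem_gb >= 32:
        return "gemma-4-26b-a4b-Q4_K_M"  # ~14 GB download
    if mem_gb >= 24:
        return "gemma-4-4b-it-Q8_0"      # higher-precision 4B (~5 GB)
    if mem_gb >= 16:
        return "gemma-4-4b-it-Q4_K_M"    # ~3 GB download, good default
    return "gemma-4-2b-it-Q4_K_M"        # ~2 GB, fits 8 GB machines
```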

Step 4: Load and Chat with Gemma 4

  1. Click the Gemma 4 model you downloaded
  2. Click Load Model
  3. Wait for the green dot indicating the model is loaded
  4. Open the Chat tab and start typing

In this configuration, Gemma 4 E4B typically begins responding within 1–3 seconds on a modern M-series Mac or NVIDIA GPU.

Step 5: Use LM Studio as a Local API Server

LM Studio includes an OpenAI-compatible local server, which lets you point any existing OpenAI client at your local Gemma 4 instance:

  1. Click the server icon in the left sidebar
  2. Select your loaded Gemma 4 model
  3. Click Start Server (default port: 1234)

Query the Gemma 4 server:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-4b-it-Q4_K_M",
    "messages": [{"role": "user", "content": "Hello from Gemma 4!"}]
  }'

The response format is identical to the OpenAI API, so you can drop Gemma 4 into any app that already uses the OpenAI SDK by changing the base URL to http://localhost:1234/v1.
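For example, the same request as the curl command above can be sent from Python using only the standard library. The model name and port assume the defaults shown above, and the server must already be running in LM Studio:

```python
import json
import urllib.request

# LM Studio's local server address (default port 1234, see Step 5)
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str,
                       model: str = "gemma-4-4b-it-Q4_K_M") -> urllib.request.Request:
    """Build the same JSON payload as the curl example above."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    """Send a prompt to the local Gemma 4 server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        data = json.load(resp)
    # Same response shape as the OpenAI API
    return data["choices"][0]["message"]["content"]
```

Because the request and response shapes match the OpenAI API, swapping in the official OpenAI SDK later only requires changing the base URL back.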

Troubleshooting

"Failed to load model": This usually means insufficient RAM or VRAM for the selected Gemma 4 variant. See the troubleshooting guide. Try a smaller quantization such as Q4_K_M instead of Q8_0.

Slow responses: In Settings → Performance, enable GPU acceleration if you have an NVIDIA or AMD GPU. Without it, Gemma 4 runs on CPU only and throughput drops significantly.

Model not found in search: LM Studio's model index updates periodically. If you cannot find a Gemma 4 model, update LM Studio to the latest version or search for bartowski/gemma-4.

Related

  • Run Gemma 4 with Ollama — simpler CLI-based setup
  • Gemma 4 GGUF guide — manual GGUF download and usage
  • Failed to load model error — detailed fix
© 2026 gemma4.dev All Rights Reserved.