Run Gemma 4 with LM Studio
Download and run Gemma 4 models using LM Studio's GUI. No command line needed — includes model selection, chat interface, and local API server setup.
LM Studio is a GUI application for running Gemma 4 locally with no terminal required. Download models, chat interactively, and expose a local API server — all from a visual interface. LM Studio is available for macOS, Windows, and Linux.
Step 1: Download LM Studio
Download from lmstudio.ai. The app is free and installs like any desktop application. No command line setup is needed before running Gemma 4.
Step 2: Find Gemma 4 in LM Studio
- Open LM Studio
- Click the search icon in the left sidebar
- Search for gemma4 or gemma-4
- Look for models from bartowski or google; bartowski quantizations are recommended for Gemma 4
Step 3: Choose a Gemma 4 Model
Select a Gemma 4 variant based on your available RAM or VRAM:
| Hardware | Recommended Model | Download Size |
|---|---|---|
| 8 GB RAM / VRAM | gemma-4-2b-it-Q4_K_M | ~2 GB |
| 16 GB RAM / VRAM | gemma-4-4b-it-Q4_K_M | ~3 GB |
| 24 GB VRAM | gemma-4-4b-it-Q8_0 or 26b-Q3 | ~5–14 GB |
| 32 GB+ unified (Mac M3 Max) | gemma-4-26b-a4b-Q4_K_M | ~14 GB |
If you are unsure, start with gemma-4-4b-it-Q4_K_M. It runs on most modern machines and gives a good sense of Gemma 4 quality.
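The recommendations above can be sketched as a small helper that picks a download based on available memory. This is a hypothetical convenience function that simply encodes the table's thresholds; it is not part of LM Studio, and the exact model that fits depends on your OS overhead and context-length settings.

```python
def recommend_model(memory_gb: float) -> str:
    """Suggest a Gemma 4 quantization for a given amount of RAM/VRAM in GB.

    Thresholds mirror the hardware table above; treat them as rough guidance.
    """
    if memory_gb >= 32:
        return "gemma-4-26b-a4b-Q4_K_M"  # ~14 GB download, needs unified/large memory
    if memory_gb >= 24:
        return "gemma-4-4b-it-Q8_0"      # higher-precision 4B fits in 24 GB VRAM
    if memory_gb >= 16:
        return "gemma-4-4b-it-Q4_K_M"    # the safe default for most machines
    return "gemma-4-2b-it-Q4_K_M"        # smallest listed option for 8 GB systems
```

For example, `recommend_model(16)` returns the same gemma-4-4b-it-Q4_K_M default suggested above.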
Step 4: Load and Chat with Gemma 4
- Click the Gemma 4 model you downloaded
- Click Load Model
- Wait for the green dot indicating the model is loaded
- Open the Chat tab and start typing
In this configuration, Gemma 4 4B typically begins responding within 1–3 seconds on a modern M-series Mac or NVIDIA GPU.
Step 5: Use LM Studio as a Local API Server
LM Studio includes an OpenAI-compatible local server, which lets you point any existing OpenAI client at your local Gemma 4 instance:
- Click the server icon in the left sidebar
- Select your loaded Gemma 4 model
- Click Start Server (default port: 1234)
Query the Gemma 4 server:
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-4b-it-Q4_K_M",
    "messages": [{"role": "user", "content": "Hello from Gemma 4!"}]
  }'
```

The response format is identical to the OpenAI API, so you can drop Gemma 4 into any app that already uses the OpenAI SDK by changing the base URL to http://localhost:1234/v1.
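The same request can be made from Python. Below is a minimal sketch using only the standard library; it assumes the server from this step is running on the default port 1234 and that you loaded the gemma-4-4b-it-Q4_K_M model from Step 3.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server address


def build_chat_request(model: str, messages: list) -> bytes:
    """Build the JSON body for an OpenAI-style chat completion request."""
    return json.dumps({"model": model, "messages": messages}).encode("utf-8")


def chat(model: str, prompt: str) -> str:
    """Send one user message to the local server and return the reply text."""
    body = build_chat_request(model, [{"role": "user", "content": prompt}])
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    # The response follows the OpenAI schema: choices -> message -> content
    return data["choices"][0]["message"]["content"]
```

With the server running, `chat("gemma-4-4b-it-Q4_K_M", "Hello from Gemma 4!")` returns the assistant's reply as a string.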
Troubleshooting
"Failed to load model": This usually means insufficient RAM or VRAM for the selected Gemma 4 variant. See the troubleshooting guide. Try a smaller quantization such as Q4_K_M instead of Q8_0.
Slow responses: In Settings → Performance, enable GPU acceleration if you have an NVIDIA or AMD GPU. Without it, Gemma 4 runs on CPU only and throughput drops significantly.
Model not found in search: LM Studio's model index updates periodically. If you cannot find a Gemma 4 model, update LM Studio to the latest version or search for bartowski/gemma-4.
Related
- Run Gemma 4 with Ollama — simpler CLI-based setup
- Gemma 4 GGUF guide — manual GGUF download and usage
- Failed to load model error — detailed fix