Run Gemma 4 with LM Studio
Download and run Gemma 4 models using LM Studio's GUI. No command line needed — includes model selection, chat interface, and local API server setup.
LM Studio is a GUI application for running Gemma 4 locally with no terminal required. Download models, chat interactively, and expose a local API server — all from a visual interface. LM Studio is available for macOS, Windows, and Linux.
Step 1: Download LM Studio
Download from lmstudio.ai. The app is free and installs like any desktop application. No command line setup is needed before running Gemma 4.
Step 2: Find Gemma 4 in LM Studio
- Open LM Studio
- Click the search icon in the left sidebar
- Search for gemma4 or gemma-4
- Look for models from bartowski or google; bartowski quantizations are recommended for Gemma 4
Step 3: Choose a Gemma 4 Model
Select a Gemma 4 variant based on your available RAM or VRAM:
| Hardware | Recommended Model | Download Size |
|---|---|---|
| 8 GB RAM / VRAM | gemma-4-2b-it-Q4_K_M | ~2 GB |
| 16 GB RAM / VRAM | gemma-4-4b-it-Q4_K_M | ~3 GB |
| 24 GB VRAM | gemma-4-4b-it-Q8_0 or 26b-Q3 | ~5–14 GB |
| 32 GB+ unified (Mac M3 Max) | gemma-4-26b-a4b-Q4_K_M | ~14 GB |
If you are unsure, start with gemma-4-4b-it-Q4_K_M. It runs on most modern machines and gives a good sense of Gemma 4 quality.
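The recommendations above can be sketched as a small helper that picks a download based on available memory. This is a hypothetical convenience function that simply encodes the table's thresholds; it is not part of LM Studio, and the exact model that fits depends on your OS overhead and context-length settings.

```python
def recommend_model(memory_gb: float) -> str:
    """Suggest a Gemma 4 quantization for a given amount of RAM/VRAM in GB.

    Thresholds mirror the hardware table above; treat them as rough guidance.
    """
    if memory_gb >= 32:
        return "gemma-4-26b-a4b-Q4_K_M"  # ~14 GB download, needs unified/large memory
    if memory_gb >= 24:
        return "gemma-4-4b-it-Q8_0"      # higher-precision 4B fits in 24 GB VRAM
    if memory_gb >= 16:
        return "gemma-4-4b-it-Q4_K_M"    # the safe default for most machines
    return "gemma-4-2b-it-Q4_K_M"        # smallest listed option for 8 GB systems
```

For example, `recommend_model(16)` returns the same gemma-4-4b-it-Q4_K_M default suggested above.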
Step 4: Load and Chat with Gemma 4
- Click the Gemma 4 model you downloaded
- Click Load Model
- Wait for the green dot indicating the model is loaded
- Open the Chat tab and start typing
In this configuration, Gemma 4 4B typically begins responding within 1–3 seconds on a modern M-series Mac or NVIDIA GPU.
Step 5: Use LM Studio as a Local API Server
LM Studio includes an OpenAI-compatible local server, which lets you point any existing OpenAI client at your local Gemma 4 instance:
- Click the server icon in the left sidebar
- Select your loaded Gemma 4 model
- Click Start Server (default port: 1234)
Query the Gemma 4 server:
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-4b-it-Q4_K_M",
    "messages": [{"role": "user", "content": "Hello from Gemma 4!"}]
  }'
```

The response format is identical to the OpenAI API, so you can drop Gemma 4 into any app that already uses the OpenAI SDK by changing the base URL to http://localhost:1234/v1.
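The same request can be made from Python. Below is a minimal sketch using only the standard library; it assumes the server from this step is running on the default port 1234 and that you loaded the gemma-4-4b-it-Q4_K_M model from Step 3.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server address


def build_chat_request(model: str, messages: list) -> bytes:
    """Build the JSON body for an OpenAI-style chat completion request."""
    return json.dumps({"model": model, "messages": messages}).encode("utf-8")


def chat(model: str, prompt: str) -> str:
    """Send one user message to the local server and return the reply text."""
    body = build_chat_request(model, [{"role": "user", "content": prompt}])
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    # The response follows the OpenAI schema: choices -> message -> content
    return data["choices"][0]["message"]["content"]
```

With the server running, `chat("gemma-4-4b-it-Q4_K_M", "Hello from Gemma 4!")` returns the assistant's reply as a string.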
Troubleshooting
"Failed to load model": This usually means insufficient RAM or VRAM for the selected Gemma 4 variant. See the troubleshooting guide. Try a smaller quantization such as Q4_K_M instead of Q8_0.
Slow responses: In Settings → Performance, enable GPU acceleration if you have an NVIDIA or AMD GPU. Without it, Gemma 4 runs on CPU only and throughput drops significantly.
Model not found in search: LM Studio's model index updates periodically. If you cannot find a Gemma 4 model, update LM Studio to the latest version or search for bartowski/gemma-4.
Related
- Run Gemma 4 with Ollama — simpler CLI-based setup
- Gemma 4 GGUF guide — manual GGUF download and usage
- Failed to load model error — detailed fix