
Fix: <unused24> Tokens in Gemma 4 llama.cpp Output

How to fix the <unused24>, <unused25> garbage tokens that appear in Gemma 4 output when using llama.cpp. The cause is a missing --chat-template flag.

Symptom

When running Gemma 4 with llama.cpp, the model output contains <unused24>, <unused25>, and similar garbage tokens mixed into otherwise normal text:

<unused24><unused25>Here is my response<unused26>

The actual content may be readable, but it is wrapped in or interspersed with these <unused*> tags. The problem appears consistently across every prompt.
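You can confirm the symptom programmatically before and after applying the fix. A minimal Python sketch (the regex and helper names are ours, not part of llama.cpp):

```python
import re

# Matches Gemma's placeholder vocabulary slots, e.g. <unused24>, <unused25>.
UNUSED_TOKEN = re.compile(r"<unused\d+>")

def has_unused_tokens(text: str) -> bool:
    """Return True if the model output contains <unused*> placeholder tokens."""
    return bool(UNUSED_TOKEN.search(text))

def strip_unused_tokens(text: str) -> str:
    """Remove the placeholder tokens. This is a band-aid for display only;
    the real fix is correcting the chat template."""
    return UNUSED_TOKEN.sub("", text)

sample = "<unused24><unused25>Here is my response<unused26>"
print(has_unused_tokens(sample))    # True
print(strip_unused_tokens(sample))  # Here is my response
```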

Why It Happens

Gemma 4 uses a specific chat template with special turn delimiters: <start_of_turn> and <end_of_turn>. These are part of Gemma's tokenizer vocabulary.

When llama.cpp does not know it is running a Gemma model, it applies a generic default chat template that injects different control tokens. Those tokens do not exist in Gemma's vocabulary as meaningful tokens — instead they map to <unused24>, <unused25>, etc., which are placeholder slots in the Gemma tokenizer that were never assigned a purpose.
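For reference, a single Gemma turn is delimited like this. A minimal Python sketch of the structure that the correct template produces (the function name is ours):

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma's turn delimiters, matching the
    structure the model was trained on: a user turn closed by
    <end_of_turn>, followed by the opening of the model's turn."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_gemma_prompt("Why is the sky blue?"))
```

A generic default template substitutes other control strings here, and those strings tokenize into the unassigned <unused*> slots instead.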

The fix is to explicitly tell llama.cpp to use Gemma's chat template format.

Fix: Add the --chat-template gemma Flag

For llama-cli:

./llama-cli -m gemma-4-4b-it-Q4_K_M.gguf \
  --chat-template gemma \
  -n 512

For llama-server:

./llama-server -m gemma-4-4b-it-Q4_K_M.gguf \
  --chat-template gemma \
  --port 8080

With --chat-template gemma, llama.cpp uses <start_of_turn>user / <end_of_turn> delimiters that match what Gemma 4 was trained on. The <unused*> tokens will no longer appear.
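With the server running, you query it through its OpenAI-compatible /v1/chat/completions endpoint and let the server apply the Gemma template itself; no manual <start_of_turn> wrapping is needed in the prompt. A sketch of the request body (the port matches the command above; the helper name is ours):

```python
import json

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat request for llama-server. The server,
    started with --chat-template gemma, wraps the messages in Gemma's
    turn delimiters before inference."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Why is the sky blue?")
# POST this body to http://localhost:8080/v1/chat/completions
print(json.dumps(payload, indent=2))
```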

Alternative: Use Ollama

Ollama includes a bundled Modelfile for Gemma 4 that automatically applies the correct chat template. If you do not need llama.cpp specifically, switching to Ollama eliminates this class of template errors entirely:

ollama run gemma4:4b

No additional flags are required — Ollama handles the template configuration automatically.
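Ollama's local REST API works the same way: send plain messages and the bundled Modelfile template does the wrapping. A sketch of a request body for its /api/chat endpoint (host and port are Ollama's defaults; the helper name is ours):

```python
import json

def build_ollama_chat(prompt: str) -> dict:
    """Request body for Ollama's /api/chat endpoint. Ollama applies the
    Gemma chat template from the model's Modelfile automatically."""
    return {
        "model": "gemma4:4b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_ollama_chat("Why is the sky blue?")
# POST this body to http://localhost:11434/api/chat
print(json.dumps(payload, indent=2))
```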

Related

  • Running Gemma 4 with llama.cpp
© 2026 gemma4.dev All Rights Reserved.