Fix: <unused24> Tokens in Gemma 4 llama.cpp Output
How to fix the <unused24>, <unused25> garbage tokens appearing in Gemma 4 output when using llama.cpp. Caused by a missing --chat-template flag.
Symptom
When running Gemma 4 with llama.cpp, the model output contains <unused24>, <unused25>, and similar garbage tokens mixed into otherwise normal text:
```
<unused24><unused25>Here is my response<unused26>
```

The actual content may be readable, but it is wrapped in or interspersed with these <unused*> tags. The problem appears consistently across every prompt.
Why It Happens
Gemma 4 uses a specific chat template with special turn delimiters: <start_of_turn> and <end_of_turn>. These are part of Gemma's tokenizer vocabulary.
When llama.cpp does not know it is running a Gemma model, it applies a generic default chat template that injects different control tokens. Those tokens do not exist in Gemma's vocabulary as meaningful tokens — instead they map to <unused24>, <unused25>, etc., which are placeholder slots in the Gemma tokenizer that were never assigned a purpose.
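For reference, an illustrative sketch of what a correctly templated prompt looks like with the delimiters named above (the example question is mine):

```
<start_of_turn>user
What is the capital of France?<end_of_turn>
<start_of_turn>model
```

The trailing <start_of_turn>model line cues the model to begin its reply. A generic template would emit control tokens from another model family here instead, which Gemma's tokenizer can only map to its unassigned <unusedN> slots.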
The fix is to explicitly tell llama.cpp to use Gemma's chat template format.
Fix: Add the --chat-template gemma Flag
For llama-cli:
```
./llama-cli -m gemma-4-4b-it-Q4_K_M.gguf \
  --chat-template gemma \
  -n 512
```

For llama-server:

```
./llama-server -m gemma-4-4b-it-Q4_K_M.gguf \
  --chat-template gemma \
  --port 8080
```

With --chat-template gemma, llama.cpp uses the <start_of_turn>user / <end_of_turn> delimiters that match what Gemma 4 was trained on, and the <unused*> tokens will no longer appear.
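To confirm the fix took effect, you can scan generated text for the placeholder pattern. A minimal Python sketch (the helper name has_unused_tokens is mine, not part of llama.cpp):

```python
import re

# Matches Gemma placeholder tokens such as <unused24> or <unused25>.
UNUSED_TOKEN = re.compile(r"<unused\d+>")

def has_unused_tokens(text: str) -> bool:
    """Return True if the output still contains <unusedN> placeholder tokens."""
    return bool(UNUSED_TOKEN.search(text))

print(has_unused_tokens("<unused24><unused25>Here is my response<unused26>"))  # True
print(has_unused_tokens("Here is my response"))  # False
```

Run this over a few responses after adding the flag; a clean result on every prompt indicates the template is being applied.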
Alternative: Use Ollama
Ollama includes a bundled Modelfile for Gemma 4 that automatically applies the correct chat template. If you do not need llama.cpp specifically, switching to Ollama eliminates this class of template errors entirely:
```
ollama run gemma4:4b
```

No additional flags are required; Ollama handles the template configuration automatically.