Fix: <unused24> Tokens in Gemma 4 llama.cpp Output
How to fix the <unused24>, <unused25> garbage tokens appearing in Gemma 4 output when using llama.cpp. Caused by a missing --chat-template flag.
Symptom
When running Gemma 4 with llama.cpp, the model output contains <unused24>, <unused25>, and similar garbage tokens mixed into otherwise normal text:
```
<unused24><unused25>Here is my response<unused26>
```

The actual content may be readable, but it is wrapped in or interspersed with these <unused*> tags. The problem appears consistently across every prompt.
Why It Happens
Gemma 4 uses a specific chat template with special turn delimiters: <start_of_turn> and <end_of_turn>. These are part of Gemma's tokenizer vocabulary.
When llama.cpp does not know it is running a Gemma model, it applies a generic default chat template that injects different control tokens. Those tokens do not exist in Gemma's vocabulary as meaningful tokens — instead they map to <unused24>, <unused25>, etc., which are placeholder slots in the Gemma tokenizer that were never assigned a purpose.
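For reference, an illustrative sketch of what a correctly templated prompt looks like with the delimiters named above (the example question is mine):

```
<start_of_turn>user
What is the capital of France?<end_of_turn>
<start_of_turn>model
```

The trailing <start_of_turn>model line cues the model to begin its reply. A generic template would emit control tokens from another model family here instead, which Gemma's tokenizer can only map to its unassigned <unusedN> slots.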
The fix is to explicitly tell llama.cpp to use Gemma's chat template format.
Fix: Add the --chat-template gemma Flag
For llama-cli:
```
./llama-cli -m gemma-4-4b-it-Q4_K_M.gguf \
  --chat-template gemma \
  -n 512
```

For llama-server:

```
./llama-server -m gemma-4-4b-it-Q4_K_M.gguf \
  --chat-template gemma \
  --port 8080
```

With --chat-template gemma, llama.cpp uses the <start_of_turn>user / <end_of_turn> delimiters that match what Gemma 4 was trained on, and the <unused*> tokens will no longer appear.
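To confirm the fix took effect, you can scan generated text for the placeholder pattern. A minimal Python sketch (the helper name has_unused_tokens is mine, not part of llama.cpp):

```python
import re

# Matches Gemma placeholder tokens such as <unused24> or <unused25>.
UNUSED_TOKEN = re.compile(r"<unused\d+>")

def has_unused_tokens(text: str) -> bool:
    """Return True if the output still contains <unusedN> placeholder tokens."""
    return bool(UNUSED_TOKEN.search(text))

print(has_unused_tokens("<unused24><unused25>Here is my response<unused26>"))  # True
print(has_unused_tokens("Here is my response"))  # False
```

Run this over a few responses after adding the flag; a clean result on every prompt indicates the template is being applied.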
Alternative: Use Ollama
Ollama includes a bundled Modelfile for Gemma 4 that automatically applies the correct chat template. If you do not need llama.cpp specifically, switching to Ollama eliminates this class of template errors entirely:
```
ollama run gemma4:4b
```

No additional flags are required; Ollama handles the template configuration automatically.