What a boring Jinja snippet tells us about the new Qwen-3 model.
The new Qwen-3 model from the Qwen team features a much more refined chat template than its predecessors, Qwen-2.5 and QwQ. By looking at the differences between the Jinja templates, you can find interesting insights into the new model.
What is a chat template?
Chat templates define how conversations between users and models are structured and formatted. The template acts as a translator, turning a human-readable list of messages into the prompt format the model actually expects. For example, this conversation:
[
  { role: "user", content: "Hi!" },
  { role: "assistant", content: "Hello, how can I help you today?" },
  { role: "user", content: "I'm looking for new shoes." }
]
is rendered into this model-friendly format:
<|im_start|>user
Hi!<|im_end|>
<|im_start|>assistant
Hello, how can I help you today?<|im_end|>
<|im_start|>user
I'm looking for new shoes.<|im_end|>
<|im_start|>assistant
<think>

</think>
The Hugging Face model page lets you easily view the chat template for a particular model.
Qwen/Qwen3-235B-A22B chat template
Let's dive into the Qwen-3 chat template and see what we can learn!
1. There’s no need to force reasoning
…and you can make it optional via a simple prefill.
Qwen-3 is unique in its ability to toggle reasoning via the enable_thinking flag. When set to false, the template inserts an empty <think></think> pair, telling the model to skip step-by-step reasoning. Earlier models baked the <think> tag into every generation, forcing chain-of-thought whether you wanted it or not.
{%- if enable_thinking is defined and enable_thinking is false %}
    {{- '<think>\n\n</think>\n\n' }}
{%- endif %}
QwQ, by contrast, forces reasoning in every conversation:
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
If enable_thinking is true, the model is free to decide for itself whether to think or not.
You can use the following code to test the template:
import { Template } from "@huggingface/jinja";
import { downloadFile } from "@huggingface/hub";

const HF_TOKEN = process.env.HF_TOKEN;

// Download the tokenizer config, which carries the chat template.
const file = await downloadFile({
  repo: "Qwen/Qwen3-235B-A22B",
  path: "tokenizer_config.json",
  accessToken: HF_TOKEN,
});
const config = await file!.json();

const messages = [
  { role: "user", content: "Hi!" },
  { role: "assistant", content: "Hello, how can I help you today?" },
  { role: "user", content: "I'm looking for new shoes." },
];

const template = new Template(config.chat_template);
const result = template.render({
  messages,
  add_generation_prompt: true,
  enable_thinking: false,
  bos_token: config.bos_token,
  eos_token: config.eos_token,
});
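As a quick sanity check (a sketch based on the template snippet above, not official library output), the rendered prompt should end with the assistant header followed by the empty think pair whenever enable_thinking is false:

// With enable_thinking: false, the prompt should close with an empty think block.
console.log(result.endsWith("<|im_start|>assistant\n<think>\n\n</think>\n\n")); // expected: true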
2. Context management must be dynamic
Qwen-3 intelligently keeps or prunes reasoning blocks using a rolling checkpoint system to maintain relevant context. Older models simply discarded reasoning early to save tokens.
Qwen-3 introduces “rolling checkpoints”: it traverses the message list in reverse to find the latest user turn that isn't a tool call. Any assistant reply after that index keeps its full <think> block; anything before it is stripped. In the past, everything was stripped.
Why this matters:
- Keeps the active plan visible during multi-step tool calls.
- Supports nested tool workflows without losing context.
- Saves tokens by pruning reasoning the model no longer needs.
- Prevents “stale” reasoning from bleeding into new tasks.
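To make the logic concrete, here is a minimal TypeScript sketch of the rolling-checkpoint idea. This is an illustration of what the Jinja template does, not the template itself; the message shape and function name are assumptions:

// Sketch of Qwen-3-style rolling-checkpoint pruning (illustrative, not the real template).
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

function pruneReasoning(messages: Message[]): Message[] {
  // Walk backwards to find the latest real user turn; tool results are assumed
  // to arrive as user messages wrapped in <tool_response> tags, as in Qwen-3.
  let lastQueryIndex = messages.length - 1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "user" && !m.content.startsWith("<tool_response>")) {
      lastQueryIndex = i;
      break;
    }
  }
  return messages.map((m, i) => {
    // Assistant replies after the checkpoint keep their full <think> block;
    // earlier reasoning is pruned to save tokens.
    if (m.role === "assistant" && i < lastQueryIndex) {
      return { ...m, content: m.content.replace(/<think>[\s\S]*?<\/think>\s*/g, "") };
    }
    return m;
  });
}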
Example
Here is an example of how reasoning is preserved across tool calls in Qwen-3 versus QwQ.
To test your own chat template, check out @huggingface/jinja.
3. Tool arguments need better serialization
Previously, every tool_call.arguments field was piped through | tojson, risking double escaping when the value was already a JSON-encoded string. Qwen-3 checks the type first and only serializes when necessary.
{%- if tool_call.arguments is string %}
    {{- tool_call.arguments }}
{%- else %}
    {{- tool_call.arguments | tojson }}
{%- endif %}
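The same guard translates directly into application code. Here is a minimal TypeScript equivalent of the template's check (the function name is ours, not part of any Qwen API):

// Serialize tool-call arguments exactly once.
function serializeArguments(args: unknown): string {
  // Already a JSON-encoded string: pass through untouched to avoid double escaping.
  if (typeof args === "string") return args;
  // Otherwise serialize once.
  return JSON.stringify(args);
}

serializeArguments('{"city": "Paris"}'); // returned as-is
serializeArguments({ city: "Paris" });   // stringified once: '{"city":"Paris"}'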
4. No default system prompt required
Like many models, the Qwen-2.5 series ships with a default system prompt:
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
This is pretty common as it helps the model respond to user questions such as “Who are you?”
Qwen-3 and QwQ ship without this default system prompt. Nevertheless, the models still accurately identify their creator when asked.
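If you want the old behavior back, nothing stops you from supplying the system prompt yourself. A minimal sketch, reusing the template and config objects from the earlier snippet:

// Re-add the Qwen-2.5-style system prompt manually.
const withSystem = template.render({
  messages: [
    { role: "system", content: "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." },
    { role: "user", content: "Who are you?" },
  ],
  add_generation_prompt: true,
  bos_token: config.bos_token,
  eos_token: config.eos_token,
});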
Conclusion
Qwen-3 demonstrates how much a chat template alone can deliver: greater flexibility, smarter context handling, and improved tool interaction. These improvements not only add functionality, but also make agentic workflows more reliable and efficient.