What a boring Jinja snippet tells us about the new Qwen-3 model.
The new Qwen-3 model from the Qwen team features a much more refined chat template than its predecessors, Qwen-2.5 and QwQ. By looking at the differences between the Jinja templates, you can find interesting insights into the new model.
What is a chat template?
Chat templates define how conversations between users and models are structured and formatted. The template acts as a translator, turning a human-readable list of messages into the prompt format the model actually expects. For example, this conversation:
[
  { role: "user", content: "Hi!" },
  { role: "assistant", content: "Hello, how can I help you today?" },
  { role: "user", content: "I'm looking for new shoes." }
]
is rendered into this model-friendly format:
<|im_start|>user
Hi!<|im_end|>
<|im_start|>assistant
Hello, how can I help you today?<|im_end|>
<|im_start|>user
I'm looking for new shoes.<|im_end|>
<|im_start|>assistant
<think>

</think>
The Hugging Face model page lets you easily view the chat template for a particular model.
Qwen/Qwen3-235B-A22B chat template
Let's dive into the Qwen-3 chat template and see what we can learn!
1. There’s no need to force reasoning
…and you can make it optional via a simple prefill.
Qwen-3 is unique in its ability to toggle reasoning via the enable_thinking flag. When set to false, the template inserts an empty <think></think> pair, telling the model to skip step-by-step reasoning. Earlier models baked the <think> tag into every generation, forcing chain-of-thought whether you wanted it or not.
{%- if enable_thinking is defined and enable_thinking is false %}
    {{- '<think>\n\n</think>\n\n' }}
{%- endif %}
QwQ, by contrast, forces reasoning in every conversation:
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
If enable_thinking is true, the model is free to decide for itself whether to think or not.
You can use the following code to test the template:
import { Template } from "@huggingface/jinja";
import { downloadFile } from "@huggingface/hub";

const HF_TOKEN = process.env.HF_TOKEN;

// Download the tokenizer config, which carries the chat template.
const file = await downloadFile({
  repo: "Qwen/Qwen3-235B-A22B",
  path: "tokenizer_config.json",
  accessToken: HF_TOKEN,
});
const config = await file!.json();

const messages = [
  { role: "user", content: "Hi!" },
  { role: "assistant", content: "Hello, how can I help you today?" },
  { role: "user", content: "I'm looking for new shoes." },
];

const template = new Template(config.chat_template);
const result = template.render({
  messages,
  add_generation_prompt: true,
  enable_thinking: false,
  bos_token: config.bos_token,
  eos_token: config.eos_token,
});
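As a quick sanity check (a sketch based on the template snippet above, not official library output), the rendered prompt should end with the assistant header followed by the empty think pair whenever enable_thinking is false:

// With enable_thinking: false, the prompt should close with an empty think block.
console.log(result.endsWith("<|im_start|>assistant\n<think>\n\n</think>\n\n")); // expected: true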
2. Context management must be dynamic
Qwen-3 intelligently keeps or prunes reasoning blocks using a rolling checkpoint system to maintain relevant context. Older models simply discarded reasoning early to save tokens.
Qwen-3 introduces “rolling checkpoints”: it traverses the message list in reverse to find the latest user turn that isn't a tool call. Any assistant reply after that index keeps its full <think> block; anything before it is stripped. In the past, everything was stripped.
Why this matters:
- Keeps the active plan visible during multi-step tool calls.
- Supports nested tool workflows without losing context.
- Saves tokens by pruning reasoning the model no longer needs.
- Prevents “stale” reasoning from bleeding into new tasks.
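To make the logic concrete, here is a minimal TypeScript sketch of the rolling-checkpoint idea. This is an illustration of what the Jinja template does, not the template itself; the message shape and function name are assumptions:

// Sketch of Qwen-3-style rolling-checkpoint pruning (illustrative, not the real template).
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

function pruneReasoning(messages: Message[]): Message[] {
  // Walk backwards to find the latest real user turn; tool results are assumed
  // to arrive as user messages wrapped in <tool_response> tags, as in Qwen-3.
  let lastQueryIndex = messages.length - 1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "user" && !m.content.startsWith("<tool_response>")) {
      lastQueryIndex = i;
      break;
    }
  }
  return messages.map((m, i) => {
    // Assistant replies after the checkpoint keep their full <think> block;
    // earlier reasoning is pruned to save tokens.
    if (m.role === "assistant" && i < lastQueryIndex) {
      return { ...m, content: m.content.replace(/<think>[\s\S]*?<\/think>\s*/g, "") };
    }
    return m;
  });
}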
Example
Here is an example of how reasoning is preserved across tool calls in Qwen-3 versus QwQ.
To test your own chat template, check out @huggingface/jinja.
3. Tool arguments need better serialization
Previously, every tool_call.arguments field was piped through | tojson, risking double escaping when the value was already a JSON-encoded string. Qwen-3 checks the type first and only serializes when necessary.
{%- if tool_call.arguments is string %}
    {{- tool_call.arguments }}
{%- else %}
    {{- tool_call.arguments | tojson }}
{%- endif %}
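The same guard translates directly into application code. Here is a minimal TypeScript equivalent of the template's check (the function name is ours, not part of any Qwen API):

// Serialize tool-call arguments exactly once.
function serializeArguments(args: unknown): string {
  // Already a JSON-encoded string: pass through untouched to avoid double escaping.
  if (typeof args === "string") return args;
  // Otherwise serialize once.
  return JSON.stringify(args);
}

serializeArguments('{"city": "Paris"}'); // returned as-is
serializeArguments({ city: "Paris" });   // stringified once: '{"city":"Paris"}'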
4. No default system prompt required
Like many models, the Qwen-2.5 series ships with a default system prompt:
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
This is pretty common as it helps the model respond to user questions such as “Who are you?”
Qwen-3 and QwQ ship without this default system prompt. Nevertheless, the models still accurately identify their creator when asked.
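If you want the old behavior back, nothing stops you from supplying the system prompt yourself. A minimal sketch, reusing the template and config objects from the earlier snippet:

// Re-add the Qwen-2.5-style system prompt manually.
const withSystem = template.render({
  messages: [
    { role: "system", content: "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." },
    { role: "user", content: "Who are you?" },
  ],
  add_generation_prompt: true,
  bos_token: config.bos_token,
  eos_token: config.eos_token,
});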
Conclusion
Qwen-3 demonstrates how much a chat template alone can deliver: greater flexibility, smarter context handling, and improved tool interaction. These improvements not only add functionality, but also make agentic workflows more reliable and efficient.