Chat templates: the end of the silent performance killer

By versatileai · September 15, 2025



A spectre is haunting chat models: the spectre of incorrect formatting!

TL;DR

Chat models have been trained with very different formats for converting conversations into a single tokenizable string. Using a format different from the one a model was trained with will usually cause severe, silent performance degradation, so matching the format used during training is extremely important! Hugging Face tokenizers now have a chat_template attribute that can be used to save the chat format the model was trained with. This attribute contains a Jinja template that converts conversation histories into a correctly formatted string. See the technical documentation for information on how to write and apply chat templates in your code.

Introduction

If you're familiar with the 🤗 Transformers library, you've probably written code like the following:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

When you load the tokenizer and model from the same checkpoint, inputs are tokenized the way the model expects. If you pick the tokenizer from a different model, input tokenization may be completely different, and the result will be serious damage to your model's performance. The term for this is a distribution shift: the model learned from data with one distribution (the tokenization it was trained with) and is suddenly given something completely different.

Whether you are fine-tuning a model or using it directly for inference, it's a good idea to minimize these distribution shifts and keep the inputs you give the model as similar as possible to the ones it was trained on. With a regular language model this is relatively easy: simply load the tokenizer and the model from the same checkpoint, and you're good to go.
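As a rough illustration of how different those distributions can be, here is a small sketch, using two unrelated public checkpoints purely as examples, comparing how their tokenizers split the same sentence:

from transformers import AutoTokenizer

# Two unrelated tokenizers, used here only to illustrate the mismatch.
gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

text = "Chat templates put an end to silent performance bugs."
print(gpt2_tokenizer.tokenize(text))  # byte-level BPE pieces
print(bert_tokenizer.tokenize(text))  # WordPiece pieces, split very differently

Feeding one model tokens produced in the other's style is exactly the kind of distribution shift described above.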

With chat models, however, things are a little different. A "chat" is not just a single string of text that can be tokenized directly; it is a sequence of messages, each of which has a role as well as content. Most commonly, the roles are "user" for messages sent by the user, "assistant" for responses written by the model, and optionally "system" for high-level directives given at the start of the conversation.

If that all seems a bit abstract, here is an example chat to make it more concrete:

[
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"}
]

This sequence of messages needs to be converted into a single text string before it can be tokenized and used as input to the model. The problem is that there are many ways to do this conversion! You could, for example, convert the list of messages into an "instant messenger" style format:

User: Hi there!
Bot: Nice to meet you!

Or you could add special tokens to indicate the roles:

[USER] Hi there! [/USER]
[ASST] Nice to meet you! [/ASST]

Or you could add tokens that mark the boundaries between messages, and insert the role information as a plain string:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>

There are many ways to do this, and none of them is obviously the best or the "right" one. As a result, different models have been trained with wildly different formats. These examples aren't made up; they are all real and in use by at least one active model! Once a model has been trained with a particular format, though, you need to make sure future inputs use the same format, or you will get the kind of performance-destroying distribution shift described above.

Templates: a way to save format information

Right now, if you're lucky, the format you need is correctly documented somewhere on the model card. If you're unlucky, it isn't, and good luck to you if you want to use that model. In extreme cases, whole prompt formats have been put into blog posts just to make sure users don't miss them! Even in the best case, though, you still have to find the template information and manually code it into your fine-tuning or inference pipeline. We think this is especially dangerous because using the wrong chat format is a silent error: there is no loud failure or Python exception to tell you something went wrong.

This is the problem chat templates aim to solve. A chat template is a Jinja template string that is saved and loaded with your tokenizer and that contains all the information needed to turn a list of chat messages into a correctly formatted input for your model. Here are three chat template strings corresponding to the three message formats above:

{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ "User: " }}
    {% else %}
        {{ "Bot: " }}
    {% endif %}
    {{ message['content'] + '\n' }}
{% endfor %}

{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ "[USER] " + message['content'] + " [/USER]" }}
    {% else %}
        {{ "[ASST] " + message['content'] + " [/ASST]" }}
    {% endif %}
    {{ '\n' }}
{% endfor %}

{% for message in messages %}
    {{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{% endfor %}

If you're not familiar with Jinja, we strongly recommend looking at these template strings and the corresponding outputs above and convincing yourself that each template really does turn the list of messages into the formatted string shown. The syntax is very similar to Python in many ways.
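If you want to check that intuition concretely, here is a minimal sketch that renders the first template with the jinja2 package directly. This is purely illustrative; when you use transformers, the tokenizer applies its template for you via apply_chat_template.

from jinja2 import Template

# The "instant messenger" template from above, written as a single Python string.
chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}{{ 'User: ' }}"
    "{% else %}{{ 'Bot: ' }}{% endif %}"
    "{{ message['content'] + '\n' }}"
    "{% endfor %}"
)

chat = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"}
]

print(Template(chat_template).render(messages=chat))
# User: Hi there!
# Bot: Nice to meet you!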

Why templates?

Jinja can be confusing if you're new to it, but in practice we find that Python programmers pick it up quickly. During development of this feature, we considered other approaches, such as a limited system letting users specify per-role prefixes and suffixes for messages. We found this confusing and unwieldy, and so inflexible that hacky workarounds were needed for several models. Templating, on the other hand, is powerful enough to cleanly support all of the message formats we know of.

Why bother with this? Why not just pick a standard format?

That would be an excellent idea! Unfortunately, it's too late, because multiple important models have already been trained with very different chat formats.

However, we can mitigate the problem a little. We think the closest thing to a "standard" format is the ChatML format created by OpenAI. If you're training a new model for chat and this format works for you, we recommend using it and adding the special <|im_start|> and <|im_end|> tokens to your tokenizer. It has the advantage of being very flexible about roles, since the role is simply inserted as a string rather than having a dedicated role token. If you'd like to use it, it's the third of the templates above, and you can set it with this simple one-liner:

tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"
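If you take this route, you will also want the <|im_start|> and <|im_end|> markers to be real tokens rather than plain text. A sketch of one way to do that, assuming a hypothetical base checkpoint, might look like this:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "your-org/your-base-model"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Register the ChatML markers as special tokens so the tokenizer never splits them.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})

# Grow the embedding matrix to cover the newly added tokens before training.
model.resize_token_embeddings(len(tokenizer))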

There is also a second reason we didn't hardcode a standard format: we expect templates to be broadly useful for preprocessing many kinds of models, including ones doing something very different from standard chat, well beyond coping with the existing proliferation of formats. Hardcoding a standard format would limit model developers to things we've already thought of, while templating gives users and developers the most freedom. It's even possible to encode checks and logic in templates. This is a feature we don't use much in the default templates, but we expect it to be enormously powerful in the hands of adventurous users. We strongly believe that the open-source ecosystem should enable you to do what you want, not dictate what you're permitted to do.

How do templates work?

Chat templates are part of the tokenizer because they fulfil the same role tokenizers do: they store information about how data is preprocessed, so that you feed the model data in the same format it saw during training. They are designed to make it easy to add template information to an existing tokenizer, save it, or upload it to the Hub.
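Concretely, attaching a template and saving it can be as simple as the following sketch; the repo name is a placeholder, and the template shown is the ChatML one from above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/your-model")  # hypothetical repo

# Attach the chat format the model was actually trained with.
tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"

# The template is stored in tokenizer_config.json alongside the other tokenizer files.
tokenizer.save_pretrained("my-chat-model")
# ...or share it directly: tokenizer.push_to_hub("your-org/your-model")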

Before chat templates, chat format information was stored at the class level. This meant that, for example, all LLaMA checkpoints got the same chat format, using code that was hardcoded in transformers for the LLaMA model class. For backward compatibility, model classes that had custom chat-format methods have instead been given default chat templates.

Default chat templates are also set at the class level, and they tell classes like ConversationalPipeline how to format inputs when the model does not have a chat template. We do this purely for backward compatibility; we highly recommend explicitly setting a chat template on any chat model, even when the default chat template is appropriate. This ensures that any future change or deprecation of the default chat template won't break your model. We will keep default chat templates for the foreseeable future, but we hope to transition all models to explicit chat templates over time, at which point default chat templates may be removed entirely.

For information on how to set and apply chat templates, please see the technical documentation.

How do I get started with templates?

Easy! If a tokenizer has the chat_template attribute set, it's ready to go. You can use that model and tokenizer with ConversationalPipeline, or you can call tokenizer.apply_chat_template() to format chats for inference or training. See the developer guide or the apply_chat_template documentation for more!
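As a small sketch of what that looks like in practice (the checkpoint name is a placeholder for any chat model with a template set):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/your-chat-model")  # hypothetical repo

chat = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can you explain chat templates?"}
]

# Get the formatted string back, e.g. for inspection or training-data preparation...
text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# ...or go straight to token IDs ready for model.generate().
input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")

(add_generation_prompt appends the prompt for a new assistant turn, if the template defines one.)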

If a tokenizer doesn't have a chat_template attribute, things may still work, but the default chat template set for that model class will be used. As mentioned above, this is fragile, and it's also a source of silent bugs when the class template doesn't match what the model was actually trained with. If you're using a checkpoint without a chat_template, we recommend checking documentation such as the model card to verify the correct format and then adding a chat_template for that format. We recommend doing this even if the default chat template is correct: it future-proofs the model, and it also makes it clear that the template is present and appropriate.
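If you simply want to make the implicit default explicit, something like this sketch can pin it down (assuming a version of transformers from this era that still exposes the default_chat_template property):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some-chat-checkpoint")  # hypothetical checkpoint
if tokenizer.chat_template is None:
    # Copy the class-level default into an explicit template and save it with the tokenizer.
    tokenizer.chat_template = tokenizer.default_chat_template
    tokenizer.save_pretrained("some-chat-checkpoint-with-template")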

You can add a chat_template even to checkpoints you don't own by opening a pull request. The only change needed is to set the tokenizer.chat_template attribute to a Jinja template string. Once that's done, push your changes and you're good to go!
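For example, something along these lines (the repo name is hypothetical) should open a pull request with the new template rather than pushing to the repo directly:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("someone-else/their-chat-model")  # hypothetical repo

tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"

# create_pr=True opens a pull request on the Hub instead of committing to the main branch.
tokenizer.push_to_hub("someone-else/their-chat-model", create_pr=True)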

If you'd like to use a checkpoint for chat but can't find any documentation of the chat format it used, you should probably open an issue on the checkpoint or ping the owner. Once you figure out which format the model uses, please open a pull request to add an appropriate chat_template. Other users will really appreciate it!

Conclusion: Template philosophy

We think templates are a very exciting change. In addition to resolving a huge source of silent, performance-killing bugs, we think they open up completely new approaches and data modalities. Perhaps most importantly, they also represent a philosophical shift: they take a big chunk of functionality out of the core transformers codebase and move it into individual model repos, where users have the freedom to do weird, wild and wonderful things. We're excited to see what uses you find for them!
