T5Gemma: New collection of encoder/decoder Gemma models

By versatileai · January 3, 2026 · 5 Mins Read

The rapidly evolving landscape of large language models (LLMs) has focused primarily on decoder-only architectures. While these models have demonstrated strong capabilities across a wide range of generation tasks, classic encoder-decoder architectures such as T5 (Text-to-Text Transfer Transformer) remain a popular choice for many real-world applications. Encoder-decoder models often excel at tasks such as summarization, translation, and question answering thanks to their high inference efficiency, design flexibility, and rich encoder representations for understanding the input. Nevertheless, powerful encoder-decoder architectures have received comparatively little attention.
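The architectural contrast above can be sketched with attention masks. This is a toy, pure-Python illustration (not the actual Gemma code): a decoder-only model applies one causal mask over the whole sequence, while an encoder-decoder model lets the encoder attend bidirectionally over the entire input, which is one source of its richer input representations.

```python
# Toy illustration of the masking difference between the two architectures.
# mask[i][j] == True means position i may attend to position j.

def causal_mask(n):
    """Decoder-only style: position i sees only positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def encoder_mask(n):
    """Bidirectional encoder: every input position sees every other."""
    return [[True] * n for _ in range(n)]

src_len = 4
print(causal_mask(src_len)[0])   # first token sees only itself
print(encoder_mask(src_len)[0])  # first token sees the whole input
```

In an encoder-decoder model, only the decoder keeps the causal mask; the encoder processes the full input with the bidirectional mask before any output token is generated.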

Today, we revisit this architecture and introduce T5Gemma, a new collection of encoder-decoder LLMs developed by converting a pre-trained decoder-only model into an encoder-decoder architecture through a technique called adaptation. T5Gemma is based on the Gemma 2 framework and includes adapted Gemma 2 2B and 9B models as well as a set of newly trained T5-sized models (Small, Base, Large, and XL). We are excited to release the pre-trained and instruction-tuned T5Gemma models to the community to open up new opportunities for research and development.

From decoder-only to encoder-decoder

T5Gemma asks the following question: can we build top-tier encoder-decoder models from pre-trained decoder-only models? We answer this with a technique called model adaptation. The core idea is to initialize the parameters of an encoder-decoder model with the weights of an already pre-trained decoder-only model, then further adapt them through UL2- or PrefixLM-based pre-training.
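As a rough sketch of the adaptation idea (toy parameter dicts with stand-in values, not the actual Gemma 2 state dict; all parameter names here are illustrative), the pre-trained decoder-only weights can seed both the new encoder and the new decoder, while cross-attention, which has no counterpart in the donor model, starts fresh:

```python
# Conceptual sketch of weight-based adaptation: copy the donor's weights
# into BOTH stacks of the new encoder-decoder model, then (in real training)
# continue pre-training with UL2 or PrefixLM. Names/values are illustrative.

def adapt_decoder_only_to_encoder_decoder(decoder_only_weights):
    """Initialize an encoder-decoder state dict from decoder-only weights."""
    enc_dec = {}
    for name, value in decoder_only_weights.items():
        enc_dec[f"encoder.{name}"] = value  # reuse weights for the encoder stack
        enc_dec[f"decoder.{name}"] = value  # reuse weights for the decoder stack
    # Cross-attention has no counterpart in the donor model, so it is new.
    enc_dec["decoder.cross_attention.weight"] = "freshly_initialized"
    return enc_dec

donor = {
    "layer0.self_attention.weight": 0.12,  # stand-ins for real tensors
    "layer0.mlp.weight": -0.07,
}
adapted = adapt_decoder_only_to_encoder_decoder(donor)
print(sorted(adapted))
```

The subsequent UL2- or PrefixLM-based pre-training then adapts the copied weights to their new roles and trains the fresh cross-attention parameters.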

Figure: Overview of our approach, showing how a new encoder-decoder model is initialized from the parameters of a pretrained decoder-only model.

This adaptation method is highly flexible and allows for creative combinations of model sizes. For example, you can pair a large encoder with a small decoder (say, a 9B encoder and a 2B decoder) to create an “unbalanced” model. This lets you tune the quality-efficiency trade-off for specific tasks, such as summarization, where a deep understanding of the input matters more than the complexity of the generated output.
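A back-of-the-envelope calculation suggests why the unbalanced setup pays off for input-heavy tasks. This sketch uses assumed token counts and a crude ~2 × params-per-token FLOP proxy, not measured numbers: the encoder reads the input once in parallel, while the decoder runs sequentially, once per generated token, so decoding latency tracks the decoder's size.

```python
# Crude FLOP proxy (~2 * params per token processed); numbers are assumed,
# purely to illustrate the encoder/decoder cost split.

def rough_cost(enc_params_b, dec_params_b, input_tokens, output_tokens):
    encoder_flops = 2 * enc_params_b * input_tokens    # single parallel pass
    decoder_flops = 2 * dec_params_b * output_tokens   # one step per token
    return encoder_flops + decoder_flops

# Summarization-like workload: long input, short output.
balanced = rough_cost(9, 9, input_tokens=2000, output_tokens=100)    # "9B-9B"
unbalanced = rough_cost(9, 2, input_tokens=2000, output_tokens=100)  # "9B-2B"
print(balanced, unbalanced)

# Sequential decoding is where latency is felt: each autoregressive step
# costs roughly 2 * decoder params, so the 9B-2B decodes like a 2B model.
per_step_balanced = 2 * 9
per_step_unbalanced = 2 * 2
print(per_step_balanced, per_step_unbalanced)
```

Under these assumptions the 9B-2B keeps nearly all of the 9B encoder's understanding of the input while paying only a 2B model's per-step decoding cost.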

Toward improving the trade-off between quality and efficiency

How does T5Gemma perform?

In our experiments, T5Gemma models match or exceed the performance of comparable decoder-only Gemma models, nearly dominating the quality-versus-inference-efficiency Pareto frontier across several benchmarks, including SuperGLUE, which measures the quality of the learned representations.

Benchmarking encoder/decoder models

Encoder-decoder models consistently deliver superior performance for a given level of inference compute, leading the quality-efficiency frontier across a variety of benchmarks.

This performance advantage is not just theoretical; it translates into real-world quality and speed. When we measured actual latency on GSM8K (mathematical reasoning), T5Gemma came out clearly ahead. For example, T5Gemma 9B-9B achieves higher accuracy than Gemma 2 9B at similar latency. Even more impressive, T5Gemma 9B-2B delivers significantly better accuracy than the 2B-2B model while keeping latency close to that of the much smaller Gemma 2 2B. Ultimately, these experiments show that encoder-decoder adaptation provides a flexible, powerful way to balance quality and inference speed.

Unlock basic and fine-tuned features

Can an encoder-decoder LLM match the capabilities of a decoder-only model?

Yes. T5Gemma shows promising capabilities both before and after instruction tuning.

After pre-training, T5Gemma achieves impressive results on complex tasks that require reasoning. For example, T5Gemma 9B-9B scores over 9 points higher on GSM8K (mathematical reasoning) and over 4 points higher on DROP (reading comprehension) than the original Gemma 2 9B model. This pattern suggests that initializing an encoder-decoder architecture through adaptation can yield a more capable and performant base model.

Detailed results for pre-trained models

Detailed results for pre-trained models, showing how our adapted models deliver significant improvements on several reasoning-intensive benchmarks compared to decoder-only Gemma 2.

These foundational improvements from pre-training set the stage for even more dramatic gains after instruction tuning. For example, comparing Gemma 2 IT with T5Gemma IT, the performance gap widens significantly: T5Gemma 2B-2B IT's MMLU score rises by nearly 12 points over Gemma 2 2B, and its GSM8K score climbs from 58.0% to 70.7%. The adapted architecture not only provides a potentially better starting point but also responds more effectively to instruction tuning, ultimately yielding a substantially more capable and helpful final model.

Fine-tuned + RLHF model results

Detailed results for fine-tuned + RLHF models, showing that post-training significantly amplifies the performance benefits of the encoder-decoder architecture.

Explore the model: T5Gemma checkpoint release

We are excited to introduce this new way of building powerful, general-purpose encoder-decoder models by adapting pre-trained decoder-only LLMs such as Gemma 2. To accelerate further research and enable the community to build on this work, we are pleased to release a suite of T5Gemma checkpoints.

The release includes:

  • Multiple sizes: checkpoints for T5-sized models (Small, Base, Large, and XL), Gemma 2-based models (2B and 9B), and an additional model between T5 Large and T5 XL.
  • Multiple variants: pre-trained and instruction-tuned models.
  • Flexible configuration: a powerful, efficient unbalanced 9B-2B checkpoint for exploring trade-offs between encoder and decoder sizes.
  • Different training objectives: models trained with either the PrefixLM or UL2 objective, for state-of-the-art generative performance or representation quality, respectively.

We hope these checkpoints will be a valuable resource for exploring model architecture, efficiency, and performance.

Get started with T5Gemma

We can’t wait to see what you build with T5Gemma. For more information, see the following links:

Read the paper to learn about the research behind this project. Explore the models’ capabilities and fine-tune them for your own use cases with our Colab notebooks.
