The rapidly evolving landscape of large language models (LLMs) has focused primarily on decoder-only architectures. While these models have demonstrated impressive capabilities across a wide range of generation tasks, the classic encoder-decoder architecture, exemplified by T5 (Text-to-Text Transfer Transformer), remains a popular choice for many real-world applications. Encoder-decoder models often excel at tasks such as summarization, translation, and QA thanks to their inference efficiency, design flexibility, and the richer encoder representations they build of the input. Yet powerful encoder-decoder architectures have received comparatively little attention.
Today, we revisit this architecture and introduce T5Gemma, a new collection of encoder-decoder LLMs developed by converting pre-trained decoder-only models into the encoder-decoder architecture through a technique called adaptation. T5Gemma is built on the Gemma 2 framework and includes adapted Gemma 2 2B and 9B models as well as a set of newly trained T5-sized models (Small, Base, Large, and XL). We are excited to release the pre-trained and instruction-tuned T5Gemma models to the community to open up new opportunities for research and development.
From decoder-only to encoder-decoder
T5Gemma asks the following question: can we build a top-tier encoder-decoder model from a pre-trained decoder-only model? We answer it with a technique called model adaptation. The core idea is to initialize the parameters of an encoder-decoder model with the weights of an already pre-trained decoder-only model, and then further adapt them through UL2- or PrefixLM-based pre-training.
Overview of our approach: a new encoder-decoder model is initialized with the parameters of a pretrained decoder-only model.
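To make the idea concrete, here is a minimal sketch of the initialization step. It assumes the decoder-only checkpoint is available as a flat dict of named weight tensors; the parameter names and the choice to seed cross-attention from self-attention are illustrative, not the exact recipe from the paper.

```python
# Minimal sketch of the adaptation initialization (not the actual T5Gemma training code).
# Assumes `decoder_only_weights` is a flat dict like {"layers.0.self_attn.q_proj": tensor, ...}.

def init_encoder_decoder_from_decoder_only(decoder_only_weights):
    """Build initial encoder-decoder weights from a pretrained decoder-only model.

    Self-attention and feed-forward weights seed both the encoder and decoder stacks.
    Cross-attention, which the decoder-only model lacks, is seeded here from
    self-attention as a simple stand-in. The adapted model is then further
    pre-trained (e.g. with a PrefixLM or UL2 objective) so the new parts specialize.
    """
    encoder, decoder = {}, {}
    for name, weight in decoder_only_weights.items():
        # Both stacks start from the same pretrained weights.
        encoder[f"encoder.{name}"] = weight
        decoder[f"decoder.{name}"] = weight
        # Hypothetical naming: seed cross-attention from the matching self-attention weight.
        if "self_attn" in name:
            decoder[f"decoder.{name.replace('self_attn', 'cross_attn')}"] = weight
    return {**encoder, **decoder}
```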
This adaptation method is highly flexible and allows for creative combinations of model sizes. For example, pairing a large encoder with a small decoder (say, a 9B encoder with a 2B decoder) creates an "unbalanced" model. This lets you tune the quality-efficiency trade-off for specific tasks, such as summarization, where a deep understanding of the input matters more than the complexity of the generated output.
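Continuing the sketch above, an unbalanced pairing could in principle be assembled by taking the encoder half from a larger checkpoint and the decoder half from a smaller one. The variables gemma_9b_weights and gemma_2b_weights below are hypothetical placeholders for loaded decoder-only state dicts, and bridging the two different hidden sizes in cross-attention is omitted.

```python
# Sketch of an "unbalanced" 9B-encoder / 2B-decoder pairing under the same assumptions.
# `gemma_9b_weights` and `gemma_2b_weights` are hypothetical placeholders for loaded
# decoder-only state dicts; reshaping cross-attention to bridge hidden sizes is omitted.
enc = {k: v for k, v in init_encoder_decoder_from_decoder_only(gemma_9b_weights).items()
       if k.startswith("encoder.")}
dec = {k: v for k, v in init_encoder_decoder_from_decoder_only(gemma_2b_weights).items()
       if k.startswith("decoder.")}
unbalanced_9b_2b = {**enc, **dec}
```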
Toward improving the trade-off between quality and efficiency
How does T5Gemma perform?
In our experiments, the T5Gemma models achieve performance comparable to or better than their decoder-only Gemma counterparts, nearly dominating the quality versus inference-efficiency Pareto frontier across several benchmarks, including SuperGLUE, which measures the quality of the learned representations.

Encoder-decoder models consistently deliver superior performance for a given amount of inference compute, leading the quality-efficiency frontier across a variety of benchmarks.
This performance advantage is not just theoretical; it translates into real-world quality and speed. When we measured actual latency on GSM8K (math reasoning), T5Gemma showed a clear win. For example, T5Gemma 9B-9B achieves higher accuracy than Gemma 2 9B at similar latency. Even more impressively, T5Gemma 9B-2B delivers a significant accuracy boost over the 2B-2B model while its latency stays close to that of the much smaller Gemma 2 2B. Ultimately, these experiments show that encoder-decoder adaptation offers a flexible and powerful way to balance quality and inference speed.
Unlocking foundational and fine-tuned capabilities
Can an encoder-decoder LLM match the capabilities of a decoder-only model?
Yes: T5Gemma shows promising capabilities both before and after instruction tuning.
After pre-training, T5Gemma achieves impressive results on complex tasks that require reasoning. For example, T5Gemma 9B-9B scores over 9 points higher on GSM8K (math reasoning) and over 4 points higher on DROP (reading comprehension) than the original Gemma 2 9B model. This pattern suggests that initializing an encoder-decoder architecture through adaptation can yield a more capable and performant base model.

Detailed results for pre-trained models, showing how our adapted models improve significantly over decoder-only Gemma 2 on several reasoning-intensive benchmarks.
These foundational improvements from pre-training set the stage for even more dramatic gains after instruction tuning. For example, comparing Gemma 2 IT with T5Gemma IT, the performance gap widens significantly: T5Gemma 2B-2B IT gains nearly 12 MMLU points over Gemma 2 2B, and its GSM8K score rises from 58.0% to 70.7%. The adapted architecture not only provides a better starting point but also responds more effectively to instruction tuning, ultimately producing a substantially more capable and useful final model.

Detailed results for instruction-tuned + RLHF models, showing that post-training significantly amplifies the performance benefits of the encoder-decoder architecture.
Explore the models: T5Gemma checkpoint release
We are very excited to introduce this new way of building powerful general-purpose encoder-decoder models by adapting pre-trained decoder-only LLMs like Gemma 2. To accelerate further research and enable the community to build on this work, we are pleased to release a suite of T5Gemma checkpoints.
The release includes:
- Multiple sizes: checkpoints for T5-sized models (Small, Base, Large, and XL), Gemma 2-based models (2B and 9B), and an additional model sized between T5 Large and T5 XL.
- Multiple variants: pre-trained and instruction-tuned models.
- Flexible configurations: a powerful and efficient unbalanced 9B-2B checkpoint for exploring the trade-off between encoder and decoder sizes.
- Different training objectives: models trained with either the PrefixLM objective, for strong generative performance, or the UL2 objective, for high-quality representations.
We hope these checkpoints will be a valuable resource for exploring model architecture, efficiency, and performance.
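As an illustration of how one of these checkpoints might be used, the sketch below assumes they are published in a Hugging Face transformers-compatible format; the model ID shown is a placeholder, not a confirmed checkpoint name.

```python
# Hypothetical usage sketch; the checkpoint ID below is a placeholder, not a confirmed name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-9b-2b"  # placeholder for an unbalanced encoder-decoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encoder-decoder models read the full input with the encoder and generate with the decoder.
inputs = tokenizer("Summarize: The quick brown fox jumped over the lazy dog.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```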
Get started with T5Gemma
We can’t wait to see what you build with T5Gemma. For more information, see the following links:
- Read the paper to learn about the research behind this project.
- Explore the models’ capabilities and fine-tune them for your own use case using the Colab notebooks.

