Gemini’s Security Safeguard Advance – Google DeepMind

We have published a new white paper that outlines how the safest model family ever has been turned into Gemini 2.5.

Imagine asking your AI agent to summarise your latest emails. Gemini and other large-scale language models (LLMs) consistently improve in the performance of such tasks by accessing information such as documents, calendars, and external websites. But what if one of these emails contains hidden malicious instructions designed to trick AI into sharing private data or misusing permissions?

Indirect rapid injection presents a real cybersecurity challenge where AI models can struggle to distinguish between authentic user instructions and manipulation commands embedded within the data they acquire. Our new whitepaper is a lesson from protecting Gemini against indirect rapid injections, laying out a strategic blueprint for tackling indirect rapid injections to target such attacks with agent AI tools supported by advanced major language models.

Not only are we capable, our commitment to building AI agents safely means we are constantly working to understand how Gemini responds to indirect rapid injections and become more resilient towards them.

Evaluation of baseline defense strategies

Indirect, rapid injecting attacks are complicated and require a certain level of vigilance and multiple layers of defense. Google Deepmind’s Security and Privacy Research Team specializes in protecting AI models from intentional and malicious attacks. Trying to manually identify these vulnerabilities is slow and inefficient, especially as models evolve rapidly. That’s one of the reasons why we built an automated system to relentlessly probe Gemini defenses.

Make your Gemini safer with automated red teaming

The central part of your security strategy is the automated red team (ART). There, our internal Gemini team constantly attacks Gemini in realistic ways to reveal potential security weaknesses in the model. Using this technique, among other efforts detailed in the white paper, Gemini significantly improves the protection rate for indirect rapid injection attacks during tool use, making Gemini 2.5 the safest model family ever.

We tested some of our own ideas and some of the defense strategies proposed by the research community.

Adjusts the rating of adaptive attacks

Baseline mitigation showed promise for basic nonadaptive attacks, significantly reducing the success rate of attacks. However, malicious actors are increasingly using adaptive attacks specifically designed to evolve into art and adapt to avoid the defenses being tested.

Successful baseline defenses such as spotlight and self-reflection have become far less effective against adaptive attacks that learn to deal with and bypass static defensive approaches.

This finding presents an important point. Relying on defenses that are tested only against static attacks gives us a false sense of security. For robust security, it is important to assess adaptive attacks that evolve in response to potential defenses.

Build inherent resilience through model hardening

While external defense and system-level guardrails are important, it is also important to enhance the inherent ability of AI models to recognize and ignore malicious instructions built into the data. This process is called “model curing.”

Fine tweak your Gemini with a large dataset of realistic scenarios. Here, ART generates effective, indirect, rapid injections targeting sensitive information. This taught Gemini to ignore malicious embedded instructions and to follow the original user request. This allows the model to inherently understand how to process compromised information that evolves over time as part of an adaptive attack.

The hardening of this model significantly improved the ability of Gemini to identify and ignore injected commands, reducing the success rate of attacks. And what’s important is without significantly affecting the performance of the model over normal tasks.

It is important to note that no model is completely immune, even if the model has stiffness. The determined attacker may still find new vulnerabilities. Therefore, our goal is to make attacks much more difficult, expensive and more complicated for the enemy.

Take a holistic approach to modeling security

Protecting AI models from attacks such as indirect rapid injection requires “detailed defense” using multiple layers of protective layers, including model hardening, input and output checks (such as classifiers), and system-level guardrails. The fight against indirect rapid injection is an important way to implement agent security principles and guidelines and develop agents responsibly.

Ensuring sophisticated AI systems against certain evolving threats such as indirect rapid injection is an ongoing process. It requires pursuing continuous and adaptive assessments, improving existing defenses, exploring new defenses, and building resilience inherent to the model itself. With repeated defenses and constant learning, we can ensure that AI assistants like Gemini continue to continue both things that are extremely kind and reliable.

For details on the defense built into Gemini and recommendations for using more challenging and adaptive attacks to assess the robustness of the model, see the GDM whitepaper, Lessons of Gemini’s defense against indirect rapid injections.

What's Hot

US AI company defies EU with ‘massive facial recognition scraping operation’

Streaming datasets: 100x more efficient

Jenny Lee of Granite Asia and Leslie Teo of AI Singapore join the Design AI and Tech Awards judging panel; Design AI and Tech Awards

Streaming datasets: 100x more efficient

Lightricks’ open source AI video delivers 4K, sound, and fast rendering

Anthropic’s $1 billion TPU expansion signals strategic change for enterprise AI infrastructure

Paris AI Safety Breakfast #3: Yoshua Bengio

WhatsApp blocks AI chatbots to protect business platform

Lightricks’ open source AI video delivers 4K, sound, and fast rendering

Most Popular

Paris AI Safety Breakfast #3: Yoshua Bengio

WhatsApp blocks AI chatbots to protect business platform

Lightricks’ open source AI video delivers 4K, sound, and fast rendering

Don't Miss

US AI company defies EU with ‘massive facial recognition scraping operation’

Streaming datasets: 100x more efficient

Jenny Lee of Granite Asia and Leslie Teo of AI Singapore join the Design AI and Tech Awards judging panel; Design AI and Tech Awards

Subscribe to Updates

What's Hot

Gemini’s Security Safeguard Advance – Google DeepMind

Evaluation of baseline defense strategies

Make your Gemini safer with automated red teaming

Adjusts the rating of adaptive attacks

Build inherent resilience through model hardening

Take a holistic approach to modeling security

Related Posts