The rapid adoption of generative AI (GenAI) systems across industries has changed the technology landscape, but also introduced a number of safety and security risks. The Microsoft AI Red Team details our efforts to ensure the robustness of our GenAI products in our latest whitepaper, Lessons from Red Teaming 100 Generative AI Products. More than just a summary of technical discoveries, this document is a comprehensive review of methodologies, challenges, and learnings from red teaming over 100 GenAI systems, and provides real-world context to contextualize the approach. Contains case studies.
At the heart of red teaming is understanding vulnerabilities within a system, not just at the model level, but across its integration with broader applications and workflows. Microsoft’s insights highlight the importance of aligning red team efforts with realistic risks and signal a shift from traditional safety benchmarks to a more nuanced understanding of how AI operates .
At the heart of our methodology is the AI Threat Model Ontology, a structured framework that categorizes vulnerabilities into key components. These include the system itself, the attackers (both hostile and benign), the tactics and techniques employed, and weaknesses and impacts identified. By focusing on this ontology, Microsoft has established a comprehensive method for mapping and analyzing risks, moving beyond traditional adversarial scenarios to include unintended failures caused by innocent users. I’m doing it.
This framework recognizes that AI systems do not work in isolation. They exist as part of a larger ecosystem, with external applications, data sources, and user interactions introducing new attack vectors. By considering vulnerabilities at both the system and model level, the ontology reflects a nuanced approach to AI security and ensures red teams understand the complexities of real-world scenarios.
A new paradigm for AI risk assessment
Microsoft distills eight key lessons from its rich experience, each providing deep insight into the evolving nature of AI red teaming. Unlike traditional safety assessments, red teaming focuses on exploring the limits of what AI systems can do and identifying risks for downstream applications. This requires a move from theoretical models to practical, context-driven evaluation.
For example, understanding the capabilities and limitations of an AI system is the basis for effective risk assessment. Larger models with an enhanced ability to understand complex instructions may offer greater utility, but at the same time are more susceptible to exploitation. Similarly, applications that utilize these models, such as healthcare tools or financial systems, have inherent risks depending on their usage.
One of the most impressive realizations is the effectiveness of simple real-world techniques against computationally intensive gradient-based attacks. Although academic research often emphasizes sophisticated techniques, real-world attackers rely on basic strategies such as rapid engineering and exploiting input/output workflows. This insight highlights the importance of adopting a systems-level adversarial mindset that evaluates the entire AI ecosystem beyond the vulnerabilities of individual models.
case study
The white paper is rich with concrete case studies that apply these lessons. In one example, Microsoft’s Red Teamers were able to bypass the vision language model’s safety guardrails by embedding malicious instructions within an image. This highlights the multifaceted nature of the vulnerability, where textual and visual inputs interact in unexpected ways. Similarly, they investigated how large-scale language models (LLMs) can be manipulated to create automated fraud systems by integrating text-to-speech and speech-to-text functionality. These case studies serve as powerful reminders of the creativity and resourcefulness required to identify and mitigate AI risks.
Another compelling example involves investigating the bias of text-to-image generators. By analyzing the output of scenarios where gender was not specified, the team identified the model’s tendency to reinforce stereotypes, such as depicting secretaries as women and bosses as men. Such findings highlight the broader societal impact of generative AI and the need for comprehensive safety assessments.
The role of automation and human expertise
Automation is a valuable tool for scaling red team efforts, but it cannot replace human judgment and creativity. Microsoft’s development of the PyRIT framework demonstrates that automation can improve efficiency by generating diverse attack scenarios and analyzing the output at scale. However, the team emphasizes that tools like PyRIT should enhance the human element, not replace it. Subject matter experts (SMEs) are essential when assessing complex or domain-specific risks, such as risks involving medical or cultural nuances.
The importance of emotional intelligence is also highlighted, especially when evaluating how AI systems respond to users in need. Microsoft’s collaboration with psychologists and sociologists to develop guidelines for investigating such scenarios demonstrates Microsoft’s commitment to addressing not only technical vulnerabilities but also psychosocial harm. It is reflected. This human-centered approach ensures that AI systems are evaluated not only for their functionality, but also for their ethical and emotional impact on users.
Addressing AI harm and security risks responsibly
The harmful nature of responsible AI (RAI), including the generation of biased, harmful, and offensive content, poses unique challenges. Unlike traditional security vulnerabilities, RAI harm is often subjective and context-dependent, requiring a customized approach for assessment and mitigation. Microsoft’s distinction between hostile and benign user scenarios is particularly insightful because it emphasizes the importance of designing systems that are resilient to unintended failures.
In addition to addressing model-specific risks, integrating generative AI into large-scale applications is introducing new attack vectors. For example, server-side request forgery (SSRF) vulnerabilities in video processing systems highlight the importance of protecting not only AI models but also the infrastructure around them.
Continuous improvement and collaboration
Microsoft’s whitepaper concludes with a call to action for the AI community. They emphasize that the work of securing AI systems is never finished, as new capabilities and risks continue to emerge. This requires a commitment to iterative “break-and-fix” cycles that continually identify and address vulnerabilities, and fostering collaboration across organizations and disciplines.
Furthermore, it is clear that a regulatory framework that balances innovation and responsibility is needed. By aligning technological advances with policy and economic incentives, the industry can create a more secure foundation for AI development.