AI Red Team’s false sense of security

Itamar Golan, CEO and co-founder of Propent Security, a core member of LLM-based app OWASP Top 10.

Getty

AI Red Teaming has emerged as a key security measure for AI-powered applications. It involves using hostile methods to proactively identify flaws or vulnerabilities such as harmful or biased output, unintended behavior, system limitations, or potential risk of misuse. Organizations use it to simulate attacks and reveal vulnerabilities in AI models, particularly large-scale language models (LLM).

However, there are dangerous misconceptions. The belief is that only the red team can secure an AI system. AI Red Teaming relies primarily on identifying and patching fixed vulnerabilities. This is a great starting point, but not enough.

Depending on the stages of the AI system’s lifecycle, we explore two different types of red teaming efforts, each with its own scope and responsibility. Model Red Team and Application Red Team.

Model Red Team

This form of red team is usually the responsibility of the model provider (e.g. Openai, anthropology, etc.). It focuses on identifying risks inherent in LLM itself. This includes testing of unsafe output, jailbreaking likelihood, discriminatory response, and overall alignment with safety goals.

Model providers often use both internal and external experts to make this red team on a large scale. A good example is Openai’s publication on Red Teaming Approach. This explains how to engage with a diverse group of external red teams alongside the internal team to rigorously evaluate the model.

Application Red Team

Once the model is integrated into real applications, responsibility shifts to an organization that deploys application builders, typically AI-powered solutions. This layer of red teams focuses not only on model behavior, but on how the entire application handles user interactions, data flows and prompts.

For example, imagine a travel agency deploying a chatbot with AI to allow users to book a trip. The system can connect to one or more LLMSs, vector databases, your own data sources, and APIs. The Red team in this application includes evaluating system prompts (persistent instructions that steering the behavior of the model), how user input is handled, and how the model interfaces with external tools or services. Beyond the raw capabilities of the model, it uncovers vulnerabilities in application logic, rapid engineering, data exposure, or potential exploit vectors.

The fundamental difference between AI security and traditional software security

Traditional software security followed a structured approach. Identify vulnerabilities through penetration testing, patch them and proceed. This worked because the software flaws were deterministic. Once they were fixed, they remained fixed.

In contrast, LLM behaves probabilistically and generates responses based on context and training data, making them inherently unpredictable. Even if the red team finds vulnerabilities today, the model may still behave predictively tomorrow. It’s valuable, but there are significant limitations on the red team.

The risks of AI, such as rapid injection, cannot be patched like traditional bugs. These attacks allow prompts to manipulate and trigger harmful outputs, taking advantage of the stochastic nature of LLMS. Even after mitigation, slight input changes can bypass the defense. Attackers adapt quickly, checking prompts and exploiting behaviors that they have not seen before. Red Teaming only provides point-in-time snapshots.

LLMS and Genai applications behave like a living system. They are constantly evolving. The Openai model I used yesterday may be another version tomorrow with slightly changed behavior. This means that previously blocked simulation attacks due to model updates will be successful.

The need for runtime AI security

To truly protect AI applications, organizations need to implement runtime protection, a security solution that works in real time to detect and mitigate threats that occur. Find out how runtime security works across a wide range of AI-powered applications.

Chatbots and AI Assistants

AI chatbots are used throughout the industry, from customer support to healthcare. They face threats such as rapid injection, context hijacking, and response operations. Chatbots could pass the red team, but production is still vulnerable.

Runtime protection can detect suspicious inputs, filter harmful outputs in real time, and block data leaks and rogue actions.

AI-equipped code generation tool

AI code assistants such as Github Copilot and Cursor Boost Developer productivity can also introduce security risks by generating insecure code. The red team helps you find some issues, but you can’t catch them all.

Runtime Protection can analyze AI-generated code in real time to detect flaws, force policies to block unstable proposals, learn from new threats and move ahead of evolving attack technologies.

AI Agent for Agent Workflow

AI agents automate workflows based on user instructions, but can manipulate unintended actions. The red team may reveal such risks, but static fixes are often lacking.

Runtime protection helps by monitoring agent behavior for dangerous actions, dynamically filtering and blocking harmful commands, and adapting policies in real time to prevent exploitation using threat intelligence.

Mainstream challenges in AI runtime protection

Adopting AI runtime protection introduces several challenges. Cost is the main concern. Not just for technology, but for the resources needed to implement and maintain. Performance trade-offs such as latency and false positives raise concerns about user experience and operational reliability. Runtime tools add complexity to a busy technology stack and increase the burden on engineering and security teams.

The deeper challenge lies in the changing model of responsibility. Unlike traditional software, AI systems blur the line between developers, model providers, infrastructure teams and security owners. It is often unclear who will be responsible for ensuring AI operation at runtime. This ambiguity can complicate implementations and slow decision-making.

As AI systems are embedded in critical workflows, solving these questions becomes essential, but many organizations still struggle to put AI security where it fits the structure and process.

Change your thinking about AI security

AI security requires real-time threat detection, adaptive defense, and full-stack runtime protection to combat evolving threats. AI models work probabilistically, making it difficult to fix vulnerabilities permanently.

Relying solely on the red team can create a false sense of security. It helps to identify some risks, but does not take into account the unpredictability of AI or the persistence of attack variants. As a starting point, teams need to guide frameworks like LLMS’ OWASP Top 10 and NIST AI Risk Management Framework to an effective, continuous protection strategy.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Are you qualified?

versatileai

See Full Bio

What's Hot

Creating innovative content at your fingertips

The UK and Singapore form an alliance to guide AI into finance

StarCoder2 and Stack V2

AI-powered security: Enhance endpoints in a changing corporate environment

Cycraft launches Xecguard:LLM Firewall for trustworthy AI

AI Data Security: 83% compliance gap facing pharmaceutical companies

New Star: Discover why 보니 is the future of AI art

Impact International | EU AI ACT Enforcement: Business Transparency and Human Rights Impact in 2025

Presight plans to expand its AI business internationally

Most Popular