HiddenLayer researchers surface prompt injection technique that bypasses all AI guardrails

By versatileai | April 25, 2025

HiddenLayer revealed this week that its researchers have discovered prompt injection techniques that bypass the instruction hierarchies and safety guardrails in all the major foundation artificial intelligence (AI) models offered by OpenAI, Google, Anthropic, Meta, DeepSeek, Mistral and Alibaba.

CEO Chris Sestito said HiddenLayer researchers were able to employ a combination of an internally developed policy technique and roleplaying to generate output that violates policies relating to chemical, biological, radiological and nuclear (CBRN) threats, mass violence, self-harm and system prompt leakage.

Specifically, HiddenLayer reports that its previously disclosed Policy Puppetry attack can be used to reformat prompts so that they look like one of several types of policy files, such as XML, INI or JSON. This approach allows cybercriminals to bypass system prompts and the safety alignment trained into the models.
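
The article does not share HiddenLayer's detection logic, but the structural trick it describes suggests an obvious defensive heuristic: flag inputs that parse as policy-style documents. The sketch below is a minimal, hypothetical illustration of such a check in Python; it is not HiddenLayer's method, and a real deployment would treat a match as one signal among many rather than a verdict.

```python
import configparser
import json
import xml.etree.ElementTree as ET

def looks_like_policy_file(prompt: str) -> bool:
    """Heuristic: flag prompts that parse cleanly as structured
    policy-style documents (JSON, XML, or INI), the disguise the
    Policy Puppetry writeup describes."""
    text = prompt.strip()

    # JSON: a top-level object or array that parses cleanly.
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return True
        except ValueError:
            pass

    # XML: a well-formed element tree.
    if text.startswith("<"):
        try:
            ET.fromstring(text)
            return True
        except ET.ParseError:
            pass

    # INI: at least one [section] header that configparser accepts.
    if "[" in text and "]" in text:
        parser = configparser.ConfigParser()
        try:
            parser.read_string(text)
            return len(parser.sections()) > 0
        except configparser.Error:
            pass

    return False
```

Because legitimate prompts can also contain JSON or XML (for example, a user pasting a configuration file for debugging), a check like this is best used to route suspicious inputs to closer inspection rather than to reject them outright.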

Disclosure of these AI vulnerabilities coincides with an update to the HiddenLayer platform for protecting AI models, which can now also be used to create AI bills of materials (AIBOMs) in addition to tracking the genealogy of models.
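
The article does not describe HiddenLayer's AIBOM format. As a rough sketch of the concept, the record below uses hypothetical field names to show the kind of information an AIBOM typically captures: what a model is, what it was derived from, what data it was trained on, and what software it depends on.

```python
import json

# Hypothetical AIBOM record: all field names and values are
# illustrative, not HiddenLayer's actual schema.
aibom = {
    "model": {
        "name": "acme/text-classifier",   # hypothetical model ID
        "version": "1.4.0",
        "format": "safetensors",
    },
    "lineage": [
        # Genealogy: the chain of models this one was derived from.
        {"base_model": "bert-base-uncased", "relation": "fine-tuned"},
    ],
    "training_data": [
        {"dataset": "internal-support-tickets", "license": "proprietary"},
    ],
    "dependencies": [
        {"package": "torch", "version": "2.3.1"},
        {"package": "transformers", "version": "4.41.0"},
    ],
}

print(json.dumps(aibom, indent=2))
```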

Additionally, version 2.0 of its AISEC platform consolidates data from public sources such as Hugging Face, allowing it to surface more actionable intelligence on emerging machine learning security risks.
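
How AISEC ingests that data is not described here. As a simple illustration of the kind of public-source consolidation involved, the sketch below pulls basic model metadata with the huggingface_hub client library; the search term and the choice of fields are assumptions for the example.

```python
from huggingface_hub import HfApi

api = HfApi()

# Pull public metadata for a handful of models; fields such as
# downloads and tags are the raw material a security platform
# might consolidate into risk intelligence.
for model in api.list_models(search="bert", limit=5):
    print(model.id, model.downloads, model.tags)
```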

Finally, AISEC Platform 2.0 also provides access to updated dashboards that enable deeper runtime analysis, with greater visibility into prompt injection attempts, misuse patterns and agent behavior.
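
The article does not detail the dashboards' data model. Hypothetically, runtime visibility of this kind starts with structured events emitted by the serving layer; the sketch below logs one line of JSON per event, covering the three categories named above, which a dashboard could then aggregate. All event names and fields are invented for illustration.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("ai-runtime")

def emit_event(event_type: str, **fields) -> None:
    """Emit one structured runtime event (one JSON object per line)
    for downstream aggregation into dashboard views."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event_type,   # e.g. "prompt_injection_suspected"
        **fields,
    }
    log.info(json.dumps(record))

# Hypothetical events covering the three visibility categories.
emit_event("prompt_injection_suspected", model="acme/chat", score=0.91)
emit_event("misuse_pattern", pattern="rate_spike", user="svc-123")
emit_event("agent_action", agent="billing-bot", tool="http_get")
```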

HiddenLayer is also working on adding support for AI agents built on top of these AI models soon.

In general, it is clear that AI model providers are much more focused on performance and accuracy than on security, Sestito said. AI models are inherently vulnerable, he added, despite whatever guardrails may have been introduced.

If AI agents are allowed to access large amounts of data, applications and services, that problem will only get worse, Sestito noted. These AI agents are, in effect, a new type of identity that cybercriminals will undoubtedly find ways to compromise, he added.

Despite these concerns, however, organizations continue to deploy AI technologies that their cybersecurity teams will ultimately be asked to secure, Sestito said.

AI is not the first emerging technology that cybersecurity teams have been asked to secure after it has already been adopted, but the potential damage that could be inflicted by a compromised AI model or agent is devastating. There is greater awareness of this issue today than there was a year ago, but it is also clear that much remains to be done to secure AI technologies.

Of course, there is only a limited number of cybersecurity experts with AI expertise, and an even smaller number of AI experts with cybersecurity expertise. As a result, the question may be less whether major AI security incidents will occur than how much harm is done before more attention is paid to the issue.
