HiddenLayer this week revealed that its researchers have discovered prompt injection techniques that bypass instruction hierarchies and safety guardrails in all the major foundational artificial intelligence (AI) models provided by OpenAI, Google, Anthropic, Meta, DeepSeek, Mistral and Alibaba.
HiddenLayer CEO Chris Sestito said its researchers were able to employ a combination of an internally developed policy technique and role-playing to generate outputs that violate policies pertaining to chemical, biological, radiological and nuclear (CBRN) threats, mass violence, self-harm and system prompt leakage.
Specifically, HiddenLayer reports that the previously disclosed Policy Puppetry attack can be used to reformulate prompts so they appear to be one of several types of policy files, such as XML, INI or JSON. That approach enables cybercriminals to bypass system prompts and the safety alignments the models have been trained on.
Disclosure of these AI vulnerabilities coincides with an update to the HiddenLayer platform for securing AI models, which can also be used to create AI bills of materials (AIBOMs) in addition to providing the ability to track the genealogy of models.
Additionally, version 2.0 of its AISEC Platform makes it possible to consolidate data from public sources such as Hugging Face, allowing it to surface more actionable intelligence on emerging machine learning security risks.
Finally, AISEC Platform 2.0 also provides access to updated dashboards that enable deeper runtime analysis by providing greater visibility into prompt injection attempts, misuse patterns and agent behavior.
HiddenLayer is also working on soon adding support for AI agents built on top of those AI models.
In general, it is clear that AI model providers are much more focused on performance and accuracy than on security, said Sestito. As a result, AI models remain inherently vulnerable, regardless of whatever guardrails may have been put in place, he added.
That issue will only become more acute as AI agents are given access to data, applications and services at scale, Sestito noted. Those AI agents are, in effect, a new type of identity that cybercriminals will undoubtedly find ways to compromise, he added.
Despite those concerns, however, organizations continue to deploy AI technologies that their cybersecurity teams will ultimately be asked to secure, Sestito said.
AI is not the first emerging technology that cybersecurity teams have been asked to secure only after it has already been adopted, but the potential damage inflicted by a breach of an AI model or agent could be devastating. There is greater awareness of this issue today than there was at this time last year, but it is equally clear that much more education about the need to secure AI technologies is still required.
Of course, there is only a limited number of cybersecurity experts with AI expertise, and an even smaller number of AI experts with any cybersecurity expertise. As such, it may be less a question of whether major AI security incidents will occur than of how much harm is done before more attention is paid to the issue.