In October 2024, we partnered to bring Guardian's scanning technology to the developer community that explores and uses models on the Hugging Face Hub, enhancing machine learning (ML) model security. The partnership was a natural fit from the start: Hugging Face is on a mission to democratize open source AI with a commitment to safety and security, and Protect AI is about building guardrails that make open source models safe for everyone.
Four new threat detection modules have been launched
Since October, Protect AI has significantly expanded Guardian's detection coverage, improved existing threat detection capabilities, and launched four new detection modules:
PAIT-ARV-100: Archive slip can write to the file system on load
PAIT-JOBLIB-101: Suspicious code execution in a Joblib model detected during model load
PAIT-TF-200: TensorFlow SavedModel contains an architectural backdoor
PAIT-LMAFL-300: Llamafile can execute arbitrary code
With these updates, Guardian covers more model file formats and detects additional sophisticated obfuscation techniques, including the high-severity Keras vulnerability CVE-2025-1550. Through these enhanced detections, Hugging Face users receive critical security information via inline alerts on the platform and can access comprehensive vulnerability reports in Insights DB. Clearly labeled findings appear on each model page, allowing users to make more informed decisions about the models they integrate into their projects.
Figure 1: Protect AI inline alerts on Hugging Face
By the numbers
As of April 1, 2025, Protect AI has successfully scanned 4.47 million unique model versions across 1.41 million repositories on the Hugging Face Hub.
To date, Protect AI has identified a total of 352,000 unsafe or suspicious issues across 51,700 models. Over the past 30 days, Protect AI has served 226 million requests from Hugging Face with an average response time of 7.94 ms.
Protect AI applies a zero trust approach to AI/ML security, treating arbitrary code execution as inherently unsafe regardless of intent. Guardian doesn't simply classify overtly malicious threats; it also flags items as suspicious in Insights DB, recognizing that harmful code can appear harmless through obfuscation techniques (see more about payload obfuscation below). Attackers can hide payloads within seemingly benign scripts or the extensible components of frameworks, making payload inspection alone insufficient to ensure security. By maintaining this cautious approach, Guardian helps reduce the risk posed by hidden threats in machine learning models.
The threats to AI/ML security evolve every day. That's why Protect AI leverages both its in-house threat research team and huntr, the world's first and largest AI/ML bug bounty program, powered by a community of over 17,000 security researchers.
In line with the launch of the partnership in October, Protect AI launched a new huntr program crowdsourcing research into model file vulnerabilities. Since the program began, the Protect AI team has received over 200 reports, and the resulting findings have been incorporated into Guardian. All of these are automatically applied to model scans on Hugging Face.
Figure 2: The huntr bug bounty program
General Attack Themes
As more huntr reports come in and more independent threat research is conducted, certain trends are emerging.
Library-dependent attack chains: These attacks rely on a bad actor's ability to call functions from libraries that already reside in ML workstation environments. They are reminiscent of the "drive-by download" style attacks that plagued browsers and desktops when common utilities such as Java and Flash were present. Typically, the magnitude of an attack's impact is proportional to how widely the targeted library is used: common ML libraries like PyTorch have a much broader potential impact than rarely used libraries.
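To make the mechanism concrete, here is a minimal, deliberately harmless sketch of how pickle-based model formats hand control to library code at load time: an object's `__reduce__` method can name any importable callable, and unpickling will import and invoke it. We use `print` as a stand-in for a dangerous library function.

```python
import pickle

class LibraryCall:
    """Illustrative only: a pickled object can reference any importable callable."""

    def __reduce__(self):
        # A real attack would point at a function from an installed ML library
        # (for example, one that writes or deletes files); print() is a
        # harmless stand-in here.
        return (print, ("arbitrary library code runs at model load time",))

payload = pickle.dumps(LibraryCall())

# Loading the "model" executes the referenced callable immediately.
pickle.loads(payload)
```

Because the payload ships no code of its own, only references to functions already installed on the victim's machine, naive signature matching misses it; this is what makes library-dependent chains attractive.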
Payload obfuscation: Several reports highlight ways to insert, obfuscate, or "hide" payloads in models so that they bypass common scanning techniques. These payloads are hard to detect because they use compression, encoding, and serialization tricks to disguise themselves. Compression is a problem because libraries like Joblib allow compressed payloads to be loaded directly. Container formats like Keras and NeMo embed additional model files, each of which is potentially vulnerable to its own attack vectors. Compressed archives can also expose users to TarSlip or ZipSlip vulnerabilities; although the impact is often limited to denial of service, in certain circumstances these vulnerabilities can use path traversal techniques to achieve arbitrary code execution by letting an attacker overwrite files that are commonly executed automatically.
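The path traversal issue behind TarSlip/ZipSlip is easiest to see in code. The helper below is our own illustrative sketch (not part of any library mentioned above): it resolves each archive entry against the destination directory and refuses anything that would escape it.

```python
import os
import tarfile

def safe_extract(tar_path: str, dest_dir: str) -> None:
    """Extract a tar archive, refusing entries that resolve outside dest_dir."""
    dest_dir = os.path.realpath(dest_dir)
    with tarfile.open(tar_path) as tf:
        for member in tf.getmembers():
            # An entry named "../../home/user/.bashrc" would land outside
            # dest_dir if extracted blindly -- the essence of TarSlip.
            target = os.path.realpath(os.path.join(dest_dir, member.name))
            if os.path.commonpath([dest_dir, target]) != dest_dir:
                raise ValueError(f"blocked path traversal entry: {member.name}")
        tf.extractall(dest_dir)
```

On Python 3.12 and later, `tarfile`'s built-in `extractall(..., filter="data")` offers similar protection out of the box.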
Framework extensibility vulnerabilities: ML frameworks provide numerous extensibility mechanisms that inadvertently create dangerous attack vectors through custom layers, external code dependencies, and configuration-based code loading. For example, Keras' CVE-2025-1550, reported to us by the huntr community, shows how custom layers can be exploited to execute arbitrary code despite existing security features. Configuration files with deserialization weaknesses similarly allow dynamic code loading. These deserialization vulnerabilities are exploited through crafted payloads delivered in formats that users load without question. Despite improved security from vendors, the continued use of old vulnerable versions and unsafe dependencies poses serious risks to the ML ecosystem.
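As a defensive illustration (not a description of Guardian's internals), recent Keras 3 releases expose a `safe_mode` flag that refuses to deserialize Lambda layers carrying embedded Python code. A cautious loader can insist on it, while recognizing that, as CVE-2025-1550 showed, such flags are only one layer of defense and patched framework versions still matter. The function name below is our own.

```python
import keras  # assumes a Keras 3 installation

def load_model_cautiously(path: str):
    """Load a .keras model while refusing arbitrary-code deserialization.

    safe_mode=True tells Keras to reject Lambda layers that would
    deserialize and execute embedded Python code; any model that genuinely
    needs custom code must instead supply reviewed custom_objects explicitly.
    """
    return keras.saving.load_model(path, compile=False, safe_mode=True)
```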
Attack vector chaining: Recent reports show how multiple vulnerabilities can be combined into sophisticated attack chains that bypass detection. By sequentially exploiting weaknesses such as obfuscated payloads and extension mechanisms, researchers have demonstrated complex paths to compromise that appear benign when each step is examined individually. This significantly complicates detection and mitigation, as security tools focused on single-vector threats often miss combined attacks. Effective defense requires identifying and addressing every link in the attack chain rather than treating each vulnerability in isolation.
Backed by the huntr community, Protect AI's industry-leading threat research team is continually gathering data and insights to develop new, more robust model scans and automated threat blocking (available to Guardian customers). Over the past few months, Guardian has:
Enhanced detection of library-dependent attacks: Guardian's scanning capabilities have been significantly expanded to detect library-dependent attack vectors. The PyTorch and pickle scanners perform deep structural analysis of serialized code, examine execution paths, and identify potentially malicious code patterns that can be triggered through library dependencies. For example, PyTorch's torchvision.io functions can be abused to overwrite files on a victim's system with a payload or wipe their contents. Guardian can now detect many of these dangerous functions in popular libraries such as PyTorch, NumPy, and Pandas (see the pickle-scanning sketch after this list).
Unmasked obfuscated attacks: Guardian performs multi-layered analysis across various archive formats, decompressing nested archives and examining compressed payloads for malicious content. This approach detects attempts to hide malicious code via compression, encoding, or serialization techniques. For example, Joblib supports model storage in a variety of compression formats that can conceal pickle deserialization vulnerabilities, and other formats like Keras can embed NumPy weights files carrying hidden payloads (see the decompression sketch after this list).
Detected exploits in framework extensibility components: Guardian's detection modules were improved to alert Hugging Face users about models affected by CVE-2025-1550 before the vulnerability was publicly disclosed. These detection modules comprehensively analyze ML framework extension mechanisms, allowing only standard or validated components and blocking potentially dangerous implementations, regardless of their apparent intent.
Identified additional architectural backdoors: Guardian's architectural backdoor detection capabilities have been extended beyond the ONNX format to additional model formats such as TensorFlow.
Enhanced model format coverage: Guardian's real strength comes from the depth of its coverage. The detection modules have been significantly extended to incorporate additional formats such as Joblib and the increasingly popular Llamafile format, along with support for additional ML frameworks.
Provided deeper model analysis: Protect AI is proactively researching additional methods to enhance current scanning capabilities for better analysis and detection of unsafe models. Expect significant enhancements in the near future that reduce both false positives and false negatives.
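The pickle-scanning sketch below is a greatly simplified illustration of the structural analysis described under library-dependent attacks: it lists the module and attribute names a pickle stream would import at load time and compares them against a small, purely illustrative denylist. Production scanners model far more libraries, functions, and execution paths; the function and list names here are our own.

```python
import pickletools
from typing import Iterator, List, Tuple

# Purely illustrative denylist; real scanners cover many more libraries.
SUSPICIOUS_MODULES = ["os", "subprocess", "builtins", "torchvision.io"]

def imported_globals(data: bytes) -> Iterator[Tuple[str, str]]:
    """Yield (module, name) pairs a pickle stream would import on load."""
    recent_strings: List[str] = []  # heuristic: track recently pushed strings
    for op, arg, _pos in pickletools.genops(data):
        if op.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            yield module, name
        elif op.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # In the common case, the module and qualified name are the two
            # string arguments pushed shortly before STACK_GLOBAL.
            yield recent_strings[-2], recent_strings[-1]
        if isinstance(arg, str):
            recent_strings.append(arg)

def flag_suspicious(data: bytes) -> List[str]:
    """Return imported callables that match the illustrative denylist."""
    findings = []
    for module, name in imported_globals(data):
        if any(module == m or module.startswith(m + ".") for m in SUSPICIOUS_MODULES):
            findings.append(f"{module}.{name}")
    return findings
```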
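The decompression sketch below illustrates the multi-layered unwrapping described under obfuscated attacks: compressed payloads are recognized by their magic bytes and peeled back, layer by layer, so the inner bytes can be handed to a scanner such as the one above. Again, this is a hedged approximation of the idea, not Guardian's implementation, and the helper name is our own.

```python
import bz2
import gzip
import lzma
import zlib

# Magic-byte prefixes for compression formats that libraries like Joblib emit.
_DECOMPRESSORS = [
    (b"\x1f\x8b", gzip.decompress),      # gzip
    (b"BZh", bz2.decompress),            # bzip2
    (b"\xfd7zXZ\x00", lzma.decompress),  # xz
    (b"\x78", zlib.decompress),          # common zlib header byte (heuristic)
]

def unwrap_payload(data: bytes, max_layers: int = 5) -> bytes:
    """Peel off nested compression layers so the inner bytes can be scanned."""
    for _ in range(max_layers):
        for magic, decompress in _DECOMPRESSORS:
            if data.startswith(magic):
                try:
                    data = decompress(data)
                except Exception:
                    return data  # not actually that format; stop unwrapping
                break
        else:
            break  # no known compression magic left
    return data
```

Chaining the two sketches, `flag_suspicious(unwrap_payload(raw_bytes))` shows in miniature why a scanner must look through the compression layer rather than at it.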
Through the partnership between Protect AI and Hugging Face, third-party ML models have become safer and more accessible. Greater scrutiny of model security is a good thing: as more of the security world leans in and pays attention, threats become easier to spot and using AI becomes more secure for everyone.