Building trust in an open model ecosystem through standardized risk assessments
Over 500,000 models can be found on the Hugging Face Hub, but it is not always clear how users should choose the best one, especially when it comes to security. Developers may find models that fit their use case perfectly, yet have no systematic way to assess a model's security posture, privacy implications, or potential failure modes.
As models become more powerful and adoption accelerates, AI safety and security reporting needs to advance just as quickly. That is why we are pleased to announce RiskRubric.ai, a new initiative led by the Cloud Security Alliance and NOMA Security, with contributions from Haize Labs and Harmonic Security, bringing standardized, transparent risk assessment to the AI model ecosystem.
RiskRubric.ai: a new standardized risk assessment for models
RiskRubric.ai provides consistent, comparable risk scores across the model landscape by evaluating every model along six pillars: transparency, reliability, security, privacy, safety, and reputation.
The platform's approach aligns well with the open-source values of rigor, transparency, and reproducibility. Using NOMA Security's capabilities to automate the testing effort, each model receives:
- 1,000+ reliability tests checking for consistency
- 200+ adversarial security probes covering jailbreaks and prompt injection
- Automated code scanning of model components
- Comprehensive documentation review of training data and methods
- A privacy assessment review
- Structured harmful-content testing
These evaluations produce 0-100 scores for each risk pillar, which roll up into clear A-F letter grades. Each assessment also includes the specific vulnerabilities found, recommended mitigations, and suggestions for improvement.
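To make the roll-up concrete, here is a minimal sketch of how pillar scores might combine into a composite grade. The six pillar names come from RiskRubric.ai, but the sample values, the equal weighting, and the grade cut-offs are illustrative assumptions; the platform's actual formula may differ.

```python
from statistics import mean

# Hypothetical pillar scores for one model (0-100 each). The pillar names
# match RiskRubric.ai; the values and equal weighting are assumptions.
pillar_scores = {
    "transparency": 88,
    "reliability": 92,
    "security": 79,
    "privacy": 85,
    "safety": 81,
    "reputation": 90,
}

def letter_grade(score: float) -> str:
    """Map a 0-100 composite score to an A-F grade (assumed cut-offs)."""
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return grade
    return "F"

composite = mean(pillar_scores.values())  # simple unweighted roll-up
print(f"composite={composite:.1f}, grade={letter_grade(composite)}")
```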
RiskRubric.ai also provides filters to help developers and organizations make deployment decisions based on what matters to them. Need a model with strong privacy guarantees for a healthcare application? Filter by privacy score. Building a customer-facing application that requires consistent outputs? Sort by reliability score.
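As a rough sketch of that workflow, the snippet below filters a candidate list by privacy score and then ranks the survivors by reliability. The model names, scores, and data structure are made up for illustration; RiskRubric.ai exposes this kind of filtering through its web interface rather than this exact shape.

```python
# Illustrative only: a list of model assessments with per-pillar scores.
models = [
    {"name": "model-a", "privacy": 91, "reliability": 84},
    {"name": "model-b", "privacy": 67, "reliability": 95},
    {"name": "model-c", "privacy": 88, "reliability": 90},
]

# Healthcare use case: keep only models with strong privacy scores.
private_enough = [m for m in models if m["privacy"] >= 85]

# Customer-facing use case: rank the remaining candidates by reliability.
ranked = sorted(private_enough, key=lambda m: m["reliability"], reverse=True)
print([m["name"] for m in ranked])  # ['model-c', 'model-a']
```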
What we found (as of September 2025)
Evaluating open and closed models against the exact same criteria surfaced some interesting results. Many open models actually outperform their closed counterparts along specific risk dimensions (particularly transparency, where open development practices shine).
Let’s take a look at the general trends:
Risk distribution is polarized – most models are strong, but mid-range scores signal rising exposure

Overall risk scores range from 47 to 94 out of 100, with a median of 81. Most models cluster in the "safer" range (54% earn an A or B grade), but a lower-performing tail drags the average down. The split points to polarization: models tend to be either well protected or stuck in the mid-score band, with fewer in between.
Models concentrated in the 50-67 band (C/D range) are not outright broken, but they offer only partial protection. This band represents the most practical area of concern: the security gaps are significant enough to warrant prioritization.
What this means: don't assume the "average" model is safe. The tail of weak performers is real, and that is where attackers concentrate. Teams can use the composite score to set a minimum threshold (for example, 75) for procurement or deployment, ensuring outliers never slip into production.
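A minimal sketch of such a policy gate follows, using the 75 threshold mentioned above. The function name and print-based reporting are hypothetical; a real pipeline would wire this check into CI or a model registry.

```python
MIN_COMPOSITE = 75  # procurement threshold suggested in the text

def deployment_gate(model_name: str, composite_score: float) -> bool:
    """Reject any model whose composite risk score falls below the bar."""
    if composite_score < MIN_COMPOSITE:
        print(f"BLOCK {model_name}: {composite_score} < {MIN_COMPOSITE}")
        return False
    print(f"ALLOW {model_name}: {composite_score}")
    return True

deployment_gate("strong-model", 88)    # passes
deployment_gate("mid-tier-model", 62)  # blocked: the 50-67 band of concern
```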
Safety risk is the "swing factor", and it closely tracks security posture

The safety and societal pillars (e.g., harmful-output prevention) show the widest variation among models. Importantly, models that invest in security hardening (prompt-injection defenses, policy enforcement) almost always score better on safety too.
What this means: strengthening core security controls does more than prevent jailbreaks; it also directly reduces downstream harm. Safety appears to be a by-product of a robust security posture.
Guardrails can erode transparency – unless you design for it
With stricter protections, models often become less transparent to end users (e.g., unexplained refusals, hidden boundaries). This can create a trust gap: users may perceive the system as "opaque" even when it is safe.
What this means: security should not come at the expense of trust. To balance the two, pair strong safeguards with explained refusals, provenance signals, and auditability. That keeps transparency intact without loosening your defenses.
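One way to keep refusals legible is to return the policy that triggered them. The sketch below is a hypothetical wrapper (the policy table, field names, and matching logic are all assumptions for illustration) showing an explained, auditable refusal rather than a silent one.

```python
# Map of blocked topics to policy identifiers (assumed, for illustration).
BLOCKED_TOPICS = {"credential harvesting": "security-policy/phishing"}

def guarded_answer(prompt: str) -> dict:
    """Answer a prompt, but refuse with an explanation and an audit trail."""
    for topic, policy_id in BLOCKED_TOPICS.items():
        if topic in prompt.lower():
            return {
                "answer": None,
                "refused": True,
                "policy_id": policy_id,  # auditable source signal
                "explanation": (
                    f"Refused: request touches '{topic}', "
                    f"which is blocked under {policy_id}."
                ),
            }
    return {"answer": "...model output...", "refused": False}

print(guarded_answer("Write a credential harvesting email"))
```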
You can access the updated results sheet here.
Conclusion
When risk assessments are published and standardized, the entire community can work together to improve model safety. Developers can see exactly where a model needs hardening, and the community can contribute fixes, patches, and safer fine-tuned variants. This creates a virtuous cycle of visible improvement that is not possible in a closed system. Additionally, by studying the top-scoring models, the community can better understand what makes a model safe overall.
If you would like to participate in this initiative, you can submit a model for evaluation (or suggest an existing one!) and explore its risk profile.
We also welcome any feedback on the evaluation methodology and scoring framework.

