Recently, DevOps professionals have been reminded that the software supply chain is full of risks; as I like to say, it's a raging dumpster fire. Unfortunately, that risk now extends to open source artificial intelligence (AI) software. The latest reality check came when an investigation into Hugging Face (think GitHub for AI models and training data) revealed up to 100 potentially malicious models residing on the platform. It's a reminder of the ever-present vulnerabilities that can so easily catch out a development team that isn't watching for them: a developer simply looking to obtain a machine learning (ML) or AI model, dataset, or demo application can end up with a very unpleasant surprise instead.
Hugging Face's exposure is not an isolated case. PyTorch, another open source ML library, developed by Facebook's AI Research lab (FAIR), is widely used for deep learning applications and provides a flexible platform for building, training, and deploying neural networks. Built on the Torch library, PyTorch offers strong support for tensor computation and GPU acceleration, performing the complex mathematical operations that ML tasks frequently require with high efficiency.
However, recent security breaches have raised specific concerns about blindly trusting AI models hosted on open source platforms, since the content may already have been tampered with by malicious actors.
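As a concrete precaution, a team pulling a third-party PyTorch checkpoint can verify a published checksum before loading it and restrict deserialization to plain tensors. The sketch below is a minimal illustration only: the file name and expected digest are hypothetical placeholders, and the `weights_only=True` option assumes a reasonably recent PyTorch release (1.13 or later).

```python
import hashlib
import torch

def sha256_of(path: str) -> str:
    """Stream the file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical placeholders: the digest would come from the publisher's
# release notes or model card, not from the download itself.
EXPECTED_SHA256 = "replace-with-published-digest"
CHECKPOINT = "downloaded_model.pt"

if sha256_of(CHECKPOINT) != EXPECTED_SHA256:
    raise RuntimeError("Checkpoint digest does not match the published value")

# weights_only=True (PyTorch 1.13+) restricts deserialization to plain tensors
# and primitive containers, blocking the arbitrary-code-execution path that a
# poisoned pickle payload would normally rely on.
state_dict = torch.load(CHECKPOINT, map_location="cpu", weights_only=True)
```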
While the concern about contaminated models is valid, it stands in stark contrast to long-held beliefs about the benefits of open source platforms, such as fostering community through collaboration on projects and nurturing and promoting the ideas of others. The benefit of building a secure community around large language models (LLMs) and AI seems to evaporate once malicious attackers can infiltrate your supply chain, corrupt your CI/CD pipelines, and tamper with components you had previously trusted as coming from reliable sources.
Software security evolves from DevOps to LLMOps
LLMs and AI are expanding organizations' supply chain security concerns, especially as interest in incorporating LLMs into product portfolios grows across a variety of sectors. Cybersecurity leaders whose organizations are adapting to the widespread availability of AI applications must stay on top of the risks posed by suppliers, not just in traditional DevSecOps but now also in ML operations (MLOps) and LLM operations (LLMOps), and stand firmly against them.
CISOs and security professionals must be proactive in detecting malicious datasets and quick to respond to potential supply chain attacks. To do that, they need to understand what these threats are.
Overview of LLM-specific vulnerabilities
The Open Worldwide Application Security Project (OWASP) is a nonprofit foundation dedicated to improving software security through community-driven open source projects spanning code, documentation, and standards. It is a truly global community, with more than 250 local chapters and over 200,000 members worldwide, delivering industry-leading education and training conferences.
This community effort produced the OWASP Top 10 for LLM Applications. As one of its creators, I know how these vulnerabilities differ from traditional application vulnerabilities and why they matter in the context of AI development.
LLM-specific vulnerabilities may initially appear isolated, but as more organizations integrate AI into their development and operational processes, they can have far-reaching implications for the software supply chain. For example, a prompt injection vulnerability could allow an attacker to manipulate an LLM through crafted input. If not properly mitigated, this type of vulnerability can corrupt outputs and propagate incorrect or insecure code through connected systems, impacting downstream supply chain components.
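To make the mitigation concrete, here is a minimal, deliberately simplified sketch of one defensive pattern: keep trusted instructions and untrusted user input in separate message roles, and screen obviously suspicious phrasing before it reaches the model. The function names and patterns are illustrative assumptions; pattern matching alone is not a sufficient defense against prompt injection.

```python
import re

# Deliberately simple illustration: real deployments layer several defenses
# (instruction/data separation, allow-listed tools, output validation) rather
# than relying on pattern matching alone. The patterns below are examples only.
SUSPECT_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"disregard the system prompt",
    r"you are now",
]

def looks_like_injection(user_input: str) -> bool:
    """Flag inputs containing common prompt-injection phrasing."""
    lowered = user_input.lower()
    return any(re.search(p, lowered) for p in SUSPECT_PATTERNS)

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Keep trusted instructions and untrusted input in separate roles so the
    untrusted text is never concatenated into the system prompt."""
    if looks_like_injection(user_input):
        raise ValueError("Input rejected: possible prompt injection")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
```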
Other security threats stem from LLMs' tendency to hallucinate, generating inaccurate or misleading information. This can introduce vulnerabilities into code trusted by downstream developers or partners. Malicious actors can exploit hallucinations to introduce insecure code, leading to new types of supply chain attacks that propagate through trusted systems. And if these vulnerabilities are discovered only after deployment, they can pose serious reputational and legal risks.
There are also vulnerabilities around insecure output handling, and the challenge of distinguishing intended from unsafe inputs to the LLM. An attacker could manipulate the LLM's inputs to produce harmful outputs that pass through automated systems unnoticed. Without proper filtering and output validation, malicious actors can compromise every stage of the software development lifecycle. A zero trust approach is important here: filter data flowing both from the LLM to the user and from the LLM to backend systems. This can include using tools such as the OpenAI Moderation API for more robust filtering, as in the sketch below.
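This is a minimal sketch of that filtering step using the openai Python SDK's moderation endpoint (the v1-style client is shown; adapt it to the SDK version you actually run). The function name is my own, and in a real pipeline flagged output would be logged and routed to review rather than simply raised as an error.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def screen_llm_output(text: str) -> str:
    """Treat model output as untrusted: run it through the moderation
    endpoint before passing it on to users or backend systems."""
    result = client.moderations.create(input=text).results[0]
    if result.flagged:
        # In a zero trust pipeline, flagged output is dropped or routed to
        # human review rather than forwarded downstream.
        raise ValueError("LLM output flagged by moderation; blocking")
    return text
```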
Finally, training data can be compromised in two main ways: label poisoning, where data is deliberately mislabeled to provoke harmful responses, and data poisoning, where a portion of the training data itself is contaminated to distort the model's decisions. The term poisoning suggests a malicious attacker actively trying to corrupt a model, but there is also plenty of potential for it to happen inadvertently, especially with training datasets scraped from public internet sources.
In some cases, a model can "know too much," regurgitating information it was trained on or had access to. For example, in December 2023, researchers at Stanford University showed that LAION-5B, a hugely popular dataset used to train image generation models such as Stable Diffusion, contained more than 3,000 images of suspected child sexual abuse material. The finding left developers in the AI image generation field scrambling to determine whether their models had used this training data and what impact it might have on their applications. If a development team had not carefully documented the training data it used, it would have no way of knowing whether its model was at risk of producing immoral and potentially illegal images.
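One practical habit that helps here is documenting training data as rigorously as code dependencies. The sketch below is an assumed, homegrown approach (the manifest file name and format are hypothetical): record the source and SHA-256 digest of every training file when it is vetted, and refuse to train if anything has changed since.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical manifest file and format: one entry per training file,
# recording where it came from and the digest it had when it was vetted.
MANIFEST = Path("training_data_manifest.json")

def file_digest(path: Path) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_manifest(files: list[Path], source_url: str) -> None:
    """Document the provenance and digest of every training file."""
    entries = [{"path": str(p), "source": source_url, "sha256": file_digest(p)}
               for p in files]
    MANIFEST.write_text(json.dumps(entries, indent=2))

def verify_manifest() -> None:
    """Refuse to train if any file has drifted from its vetted digest."""
    for entry in json.loads(MANIFEST.read_text()):
        if file_digest(Path(entry["path"])) != entry["sha256"]:
            raise RuntimeError(f"Training file changed since review: {entry['path']}")
```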
To mitigate these threats, developers can build security measures into the AI development lifecycle to create more robust and secure applications. That means implementing a secure process for building LLM apps, which comes down to five simple steps:
1) foundation model selection; 2) data preparation; 3) validation; 4) deployment; and 5) monitoring.
To increase the security of LLMs, developers can leverage cryptographic techniques such as digital signatures. Digitally signing a model using a private key creates a unique identifier that can be verified using the corresponding public key. This process ensures the authenticity and integrity of the model and prevents unauthorized modification or tampering. Digital signatures are particularly valuable in supply chain environments where models are distributed or deployed through cloud services because they provide a way to authenticate models as they move between different systems.
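Here is a minimal sketch of that sign-and-verify flow using the cryptography library's Ed25519 primitives. The file name is a placeholder, and in practice the private key would live in an HSM or a managed signing service (tooling such as Sigstore handles much of this) rather than being generated inline.

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def digest(path: str) -> bytes:
    """SHA-256 digest of the model artifact, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()

# Publisher side: sign the artifact's digest. In practice the key is loaded
# from an HSM or managed signing service, not generated inline.
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(digest("model_weights.pt"))  # placeholder file name

# Consumer side: verify with the publisher's public key before deployment.
public_key = private_key.public_key()
try:
    public_key.verify(signature, digest("model_weights.pt"))
    print("Signature valid: model unchanged since signing")
except InvalidSignature:
    raise RuntimeError("Model signature check failed; do not deploy")
```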
Watermarking is another effective technique to protect your LLM. Watermarks create a unique fingerprint that traces a model back to its origins by embedding a subtle, imperceptible identifier within the model’s parameters. Even if the model is cloned or stolen, the watermark remains embedded so it can be detected and identified. While digital signatures are primarily focused on preventing unauthorized modification, watermarks serve as a permanent marker of ownership and provide an additional layer of protection against unauthorized use and distribution.
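To give a feel for the idea, here is a deliberately simplified toy in NumPy, not a production watermarking scheme: a secret seed selects a subset of weights and a +/-1 pattern, a small perturbation is added at those positions, and a correlation statistic later checks for the pattern (near 1.0 when the watermark is present, near 0 otherwise). Real schemes are designed to survive fine-tuning, pruning, and quantization, which this toy is not.

```python
import numpy as np

def embed_watermark(weights: np.ndarray, secret_seed: int,
                    n_sites: int = 16384, eps: float = 5e-3) -> np.ndarray:
    """Toy watermark: nudge a secret subset of weights by a secret +/-1 pattern.
    Assumes the weight array has at least n_sites parameters."""
    rng = np.random.default_rng(secret_seed)
    flat = weights.ravel().copy()
    idx = rng.choice(flat.size, size=n_sites, replace=False)
    pattern = rng.choice([-1.0, 1.0], size=n_sites)
    flat[idx] += eps * pattern
    return flat.reshape(weights.shape)

def watermark_score(weights: np.ndarray, secret_seed: int,
                    n_sites: int = 16384, eps: float = 5e-3) -> float:
    """Correlation statistic: concentrates near 1.0 when the watermark is
    present and near 0.0 for unrelated weights."""
    rng = np.random.default_rng(secret_seed)
    flat = weights.ravel()
    idx = rng.choice(flat.size, size=n_sites, replace=False)
    pattern = rng.choice([-1.0, 1.0], size=n_sites)
    return float(np.dot(flat[idx], pattern) / (n_sites * eps))
```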
Model cards and software bills of materials (SBOMs) are further tools designed to increase transparency and understanding of complex software systems that contain AI models. An SBOM is essentially a detailed inventory of a software product's components, with an emphasis on listing and describing every piece of third-party and open source software it includes. SBOMs are important for understanding software composition, particularly for tracking vulnerabilities, licenses, and dependencies, and AI-specific versions are now emerging.
The key innovation in CycloneDX 1.5 is the ML-BOM (machine learning BOM), a game-changer for ML applications. It allows an SBOM to comprehensively list ML models, algorithms, datasets, training pipelines, and frameworks, along with important details such as model provenance, versioning, dependencies, and performance metrics, supporting reproducibility, governance, risk assessment, and compliance for ML systems.
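For a sense of what this looks like in practice, here is a minimal illustrative document, written as a Python dict and serialized to JSON. The component names are hypothetical, and the field names reflect my reading of the CycloneDX 1.5 schema (which added the machine-learning-model and data component types); validate against the published schema before relying on this shape.

```python
import json

# Minimal illustrative ML-BOM; component names are hypothetical and field
# names follow my reading of the CycloneDX 1.5 schema. Validate against the
# published schema before relying on this shape.
ml_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        {
            "type": "machine-learning-model",
            "name": "sentiment-classifier",      # hypothetical model
            "version": "2.3.0",
            "modelCard": {
                "modelParameters": {"task": "text-classification"},
            },
        },
        {
            "type": "data",
            "name": "curated-reviews-corpus",    # hypothetical training dataset
            "version": "2024-01",
        },
    ],
}

with open("ml-bom.json", "w") as f:
    json.dump(ml_bom, f, indent=2)
```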
This is a significant advance for ML applications. An ML-BOM provides clear visibility into the components and processes involved in ML development and deployment, helping stakeholders understand the composition of ML systems, identify potential risks, and weigh ethical implications. On the security side, it enables teams to identify and remediate vulnerabilities in ML components and dependencies, which is essential for security audits and risk assessments and contributes greatly to building secure and reliable ML systems. It also supports compliance with regulatory requirements such as GDPR and CCPA by bringing transparency and governance to ML systems.
Finally, it is essential to adopt a strategy that extends DevSecOps to LLMOps, covering model selection, scrubbing training data, securing pipelines, automating ML-BOM generation, building AI red teams, and properly monitoring and logging your systems with the right tools. All of these recommendations focus on providing appropriate guardrails for secure LLM development while maintaining a zero trust foundation, leaving plenty of room for imagination about what is possible with AI.
This article is produced as part of TechRadarPro’s Expert Insights channel, featuring some of the brightest minds in technology today. The views expressed here are those of the author and not necessarily those of TechRadarPro or Future plc. If you are interested in contributing, learn more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro