The world’s first comprehensive law on artificial intelligence, the EU AI Act, has officially entered into force, shaping how AI is developed and used, including by open source communities. If you’re an open source developer working in this new environment, you’re probably wondering what it means for your project. This guide highlights the key points of the regulation with a focus on open source development, provides a clear introduction to the law, and points to tools that can help you prepare for compliance.
Disclaimer: The information provided in this guide is for informational purposes only and should not be considered any form of legal advice.
TL;DR: The AI Act may apply to open source AI systems and models, with specific rules depending on the type of model and how it is released. In most cases, obligations include providing clear documentation, adding tools to disclose model information at deployment, and following existing copyright and privacy regulations. Fortunately, many of these practices are already common in open source environments, and Hugging Face provides tools to help you prepare for compliance, including tools that support opt-out processes and the redaction of personal data. Check out model cards, dataset cards, Gradio watermarking, opt-out mechanisms, support for redacting personal data, licensing, and more.
The EU AI Act is a binding regulation that aims to foster responsible AI. To that end, it sets rules based on the level of risk an AI system or model may pose, while aiming to keep research open and to support small and medium-sized enterprises (SMEs). As an open source developer, many aspects of your work will not be directly affected, especially if you already document your systems and keep track of your data sources. In general, there are straightforward steps you can take to prepare for compliance.
The regulation’s provisions will phase in over the next two years and will apply broadly, not just within the EU. Open source developers outside the EU are covered by the law if their AI systems or models are offered to, or affect, people in the EU.
🤗 Scope
The regulation applies at different levels of the AI stack:
- Models: only general purpose AI (GPAI) models are regulated directly. GPAI models are trained on large amounts of data, show significant generality, can perform a wide range of tasks, and can be integrated into downstream systems and applications. Large language models (LLMs) are one example. Modifications and fine-tunes of a model are also subject to obligations.
- Systems: a system is anything that can make inferences from inputs. It typically takes the form of a traditional software stack that leverages or connects one or more AI models to a digital interface. One example is a chatbot that builds on an LLM, or a Gradio app hosted on Hugging Face Spaces, to interact with end users.
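To make the model/system distinction concrete, here is a minimal sketch of such a “system”: a Gradio chat app that calls a hosted GPAI model through the huggingface_hub inference client. This is only an illustration under simple assumptions; the model ID is just an example and any hosted chat-capable model would do.

```python
import gradio as gr
from huggingface_hub import InferenceClient

# Example model ID; swap in the hosted GPAI model you actually use.
client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

def respond(message, history):
    # The model performs the inference; the app around it (interface,
    # routing, hosting) is what the AI Act would consider the "system".
    messages = [{"role": "user", "content": message}]
    output = client.chat_completion(messages, max_tokens=256)
    return output.choices[0].message.content

demo = gr.ChatInterface(fn=respond, title="LLM-backed chatbot (an AI system)")

if __name__ == "__main__":
    demo.launch()
```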
The AI Act sets rules based on the level of risk an AI system or model may pose. AI systems fall into four risk levels:
- Unacceptable: systems that violate human rights, such as AI systems that scrape facial images from the internet or from CCTV footage. These systems are prohibited and cannot be placed on the market.
- High: systems that can negatively affect people’s safety or fundamental rights, for example in critical infrastructure, essential services, or law enforcement. These systems must go through thorough compliance procedures before they can be placed on the market.
- Limited: systems that interact directly with people and can create risks of impersonation, manipulation, or deception. These systems must meet transparency requirements. Most generative AI models can be integrated into systems that fall into this category. As a model developer, your model is more likely to be easily integrated into an AI system if you already meet some of the likely requirements, such as providing sufficient documentation.
- Minimal: most systems, which do not present the risks above. They only need to comply with existing laws and regulations; the AI Act introduces no additional rules for them.
General purpose AI (GPAI) models have an additional risk category: systemic risk. A GPAI model poses systemic risk when it is trained with a very large amount of compute, currently defined as more than 10^25 FLOPs of training compute, or when it has high-impact capabilities. According to a Stanford University analysis based on Epoch estimates, as of August 2024 only eight models (Gemini 1.0 Ultra, Llama 3.1-405B, GPT-4, Mistral Large, Nemotron-4 340B, MegaScale, Inflection-2, Inflection-2.5) from seven developers (Google, Meta, OpenAI, Mistral, NVIDIA, ByteDance, Inflection) meet the default systemic risk criterion of being trained with at least 10^25 FLOPs. The obligations that apply differ depending on whether the model is open source or not.
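As a rough, back-of-the-envelope way to see whether a training run approaches the 10^25 FLOP threshold, you can use the common “≈ 6 × parameters × training tokens” approximation for dense transformer training compute. This is only an estimate, not how compute is measured officially, and the model sizes below are purely illustrative.

```python
# Rule-of-thumb training compute for a dense transformer:
# total FLOPs ≈ 6 * number of parameters * number of training tokens.
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # current AI Act default threshold

def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Back-of-the-envelope training compute estimate."""
    return 6 * n_params * n_tokens

# Illustrative (hypothetical) training runs:
runs = {
    "7B params, 2T tokens": estimate_training_flops(7e9, 2e12),
    "405B params, 15T tokens": estimate_training_flops(405e9, 15e12),
}

for name, flops in runs.items():
    over = flops >= SYSTEMIC_RISK_THRESHOLD_FLOPS
    print(f"{name}: ~{flops:.2e} FLOPs -> above 10^25 threshold: {over}")
```

Under this approximation, a 7B-parameter model trained on 2T tokens lands around 8 × 10^22 FLOPs, well below the threshold, while a 405B-parameter model trained on 15T tokens lands around 3.6 × 10^25 FLOPs, above it.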
🤗 How to prepare for compliance
This short guide focuses on limited-risk AI systems and on open source, non-systemic-risk GPAI models, which cover most of the models published on the Hub. If your work falls into another risk category, be sure to review the further obligations that may apply.
For AI systems with limited risk
Limited-risk AI systems interact directly with people (end users), which can create risks of impersonation, manipulation, or deception. Examples include text-generating chatbots and text-to-image generators, tools that can facilitate the creation of misinformation or deepfakes. The AI Act addresses these risks primarily through transparency obligations.
Developers of AI systems with limited risk should:
- Disclose to users that they are interacting with an AI system, unless it is obvious from context. Keep in mind that end users may not have the same technical understanding as experts, so this information must be provided in a clear and accessible way.
- Mark synthetic content: AI-generated content (audio, images, video, text, etc.) must be marked as artificially generated or manipulated in a machine-readable format. Existing tools, such as Gradio’s built-in watermarking capabilities, can help meet these requirements (a simple sketch of machine-readable marking follows this list).
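As a minimal illustration of what machine-readable marking can look like, the sketch below embeds a provenance note in PNG metadata using Pillow. This is not Gradio’s built-in mechanism, the model ID is a placeholder, and plain metadata is easy to strip; dedicated provenance standards and watermarking techniques are more robust in practice.

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Placeholder: `image` would be the output of your text-to-image model.
image = Image.new("RGB", (512, 512))

# Embed a machine-readable provenance note in the PNG metadata.
metadata = PngInfo()
metadata.add_text("ai_generated", "true")
metadata.add_text("generator", "my-org/my-text-to-image-model")  # placeholder model ID

image.save("output.png", pnginfo=metadata)

# Downstream tools can read the marking back:
print(Image.open("output.png").text)  # {'ai_generated': 'true', 'generator': ...}
```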
Note that you may be not only a developer of an AI system but also a “deployer” of one. Deployers of AI systems are individuals or companies that use an AI system in a professional capacity. In that case, you must also comply with the following:
- For emotion recognition and biometric systems: deployers must inform the individuals concerned about the use of these systems and process personal data in accordance with the relevant regulations.
- Disclosure of deepfakes and AI-generated content: deployers must disclose when content has been artificially generated or manipulated. Where the content forms part of an artistic work, the obligation is to disclose the existence of the generated or manipulated content in a way that does not detract from the experience.
The information above must be provided in clear language at the latest at the time of the user’s first interaction with, or exposure to, the AI system. These obligations for limited-risk AI systems will apply from August 2026.
For open source non-systemic risk GPAI models
If you are developing an open source GPAI model (such as an LLM) that does not pose systemic risk, lighter obligations apply. Under the AI Act, a model counts as open source when it is released under a free and open source licence that allows access, use, modification, and distribution, and when its parameters, including the weights, model architecture, and information on model usage, are made publicly available. Models released this way are exempt from some of the documentation obligations that otherwise apply to GPAI models.

Open source GPAI models without systemic risk have the following obligations:
Draft and make publicly available a sufficiently detailed summary of the content used to train the GPAI model, following the template provided by the AI Office, and put in place a policy to comply with EU copyright law.
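Much of this documentation can live in a model card on the Hub. Below is a minimal, hypothetical sketch using the huggingface_hub library; the repository name, dataset name, and training-data description are placeholders, and the AI Office template should be followed once it is available.

```python
from huggingface_hub import ModelCard

# Hypothetical model card with a summary of training content.
content = """
---
license: apache-2.0
datasets:
  - my-org/my-training-corpus
language:
  - en
---

# my-org/my-gpai-model

## Training data summary

Trained on my-org/my-training-corpus, a deduplicated web-text corpus.
See the dataset card for sources, licensing, and opt-out handling.
"""

card = ModelCard(content)
card.save("README.md")  # or card.push_to_hub("my-org/my-gpai-model")
```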
The EU AI Act also ties in with existing regulations on copyright and personal data, such as the Copyright Directive and the General Data Protection Regulation. To that end, look to Hugging Face’s integrated tools that support better opt-out mechanisms and personal data redaction, and stay up to date with recommendations from European and national bodies such as the CNIL.
Hugging Face projects implement formats for understanding and honouring opt-outs from training data, including BigCode’s Am I In The Stack app and the integration of Spawning’s opt-out widget for image URL datasets. These tools let creators easily signal that they do not want their copyrighted material used for AI training, and help model developers respect those decisions.
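In practice, honouring opt-outs often comes down to filtering your training corpus against an opt-out list before training. Here is a minimal sketch with the datasets library; the dataset name, the `url` column, and the opt-out list are all hypothetical.

```python
from datasets import load_dataset

# Hypothetical opt-out list of source URLs collected via an opt-out mechanism.
opted_out_urls = {
    "https://example.com/artwork-1",
    "https://example.com/artwork-2",
}

# Hypothetical image-URL dataset; replace with your actual training data.
dataset = load_dataset("my-org/my-image-url-dataset", split="train")

# Drop every record whose source URL appears in the opt-out list.
filtered = dataset.filter(lambda row: row["url"] not in opted_out_urls)

print(f"Kept {len(filtered)} of {len(dataset)} records after applying opt-outs.")
```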
Developers may rely on the Code of Practice (currently under development and expected by May 2025) to demonstrate compliance with these obligations.
Further obligations apply if you publish your work in a way that does not meet the AI Act’s open source criteria.
Also note that if a particular GPAI model meets the conditions for posing systemic risk, its developer must notify the EU Commission. During the notification process, the developer can argue that their model does not present systemic risk because of specific characteristics. The Commission considers each argument and accepts or rejects the claim depending on whether it is sufficiently substantiated, taking into account the specific characteristics and capabilities of the model. If the Commission rejects the developer’s arguments, the GPAI model is designated as posing systemic risk and must comply with further obligations, including having training and testing processes in place and providing technical documentation about the model, including evaluation results.
The obligations for GPAI models will apply from August 2025.
🤗 Join us
Many of the practical details of the EU AI Act are still being worked out through public consultations and working groups, and their outcomes will determine how the Act’s provisions are implemented and how smooth compliance will be for SMEs and researchers. If you’re interested in shaping how this plays out, now is a great time to get involved.
@misc{eu_ai_act_for_oss_developers,
  author    = {Bruna Trevelin and Lucie-Aimée Kaffee and Yacine Jernite},
  title     = {Open Source Developers Guide to the EU AI Act},
  booktitle = {Hugging Face Blog},
  year      = {2024},
  url       = {},
  doi       = {}
}
We would like to thank Anna Tordjmann, Brigitte Tousignant, Chun Te Lee, Irene Solaiman, Clémentine Fourrier, Ann Huang, Benjamin Burtenshaw, and Florent Daudens for their feedback, comments, and suggestions.