The world’s first comprehensive law on artificial intelligence, the EU AI Act, has officially entered into force, shaping how AI is developed and used, including by open source communities. If you’re an open source developer working in this new environment, you’re probably wondering what it means for your project. This guide highlights the key points of the regulation with a focus on open source development, provides a clear introduction to the law, and points to tools that can help you prepare for compliance.
Disclaimer: The information provided in this guide is for informational purposes only and should not be considered any form of legal advice.
TL;DR: The AI Act may apply to open source AI systems and models, with specific rules depending on the type of model and how it is released. In most cases, obligations include providing clear documentation, adding tools to disclose model information at deployment, and following existing copyright and privacy regulations. Fortunately, many of these practices are already common in open source environments, and Hugging Face provides tools to help you prepare for compliance, including tools that support opt-out processes and the redaction of personal data. Check out model cards, dataset cards, Gradio watermarking, opt-out mechanisms, support for redacting personal data, licensing, and more.
The EU AI Act is a binding regulation that aims to foster responsible AI. To that end, it sets rules based on the level of risk an AI system or model may pose, while aiming to keep research open and to support small and medium-sized enterprises (SMEs). As an open source developer, many aspects of your work will not be directly affected, especially if you already document your systems and keep track of your data sources. In general, there are straightforward steps you can take to prepare for compliance.
The regulation’s provisions will phase in over the next two years and will apply broadly, not just within the EU. Open source developers outside the EU are covered by the law if their AI systems or models are offered to, or affect, people in the EU.
🤗 Scope
The regulation applies at different levels of the AI stack:
- Models: only general purpose AI (GPAI) models are regulated directly. GPAI models are trained on large amounts of data, show significant generality, can perform a wide range of tasks, and can be integrated into downstream systems and applications. Large language models (LLMs) are one example. Modifications and fine-tunes of a model are also subject to obligations.
- Systems: a system is anything that can make inferences from inputs. It typically takes the form of a traditional software stack that leverages or connects one or more AI models to a digital interface. One example is a chatbot that builds on an LLM, or a Gradio app hosted on Hugging Face Spaces, to interact with end users.
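To make the model/system distinction concrete, here is a minimal sketch of such a “system”: a Gradio chat app that calls a hosted GPAI model through the huggingface_hub inference client. This is only an illustration under simple assumptions; the model ID is just an example and any hosted chat-capable model would do.

```python
import gradio as gr
from huggingface_hub import InferenceClient

# Example model ID; swap in the hosted GPAI model you actually use.
client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

def respond(message, history):
    # The model performs the inference; the app around it (interface,
    # routing, hosting) is what the AI Act would consider the "system".
    messages = [{"role": "user", "content": message}]
    output = client.chat_completion(messages, max_tokens=256)
    return output.choices[0].message.content

demo = gr.ChatInterface(fn=respond, title="LLM-backed chatbot (an AI system)")

if __name__ == "__main__":
    demo.launch()
```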
The AI Act sets rules based on the level of risk an AI system or model may pose. AI systems fall into four risk levels:
- Unacceptable: systems that violate human rights, such as AI systems that scrape facial images from the internet or from CCTV footage. These systems are prohibited and cannot be placed on the market.
- High: systems that can negatively affect people’s safety or fundamental rights, for example in critical infrastructure, essential services, or law enforcement. These systems must go through thorough compliance procedures before they can be placed on the market.
- Limited: systems that interact directly with people and can create risks of impersonation, manipulation, or deception. These systems must meet transparency requirements. Most generative AI models can be integrated into systems that fall into this category. As a model developer, your model is more likely to be easily integrated into an AI system if you already meet some of the likely requirements, such as providing sufficient documentation.
- Minimal: most systems, which do not present the risks above. They only need to comply with existing laws and regulations; the AI Act introduces no additional rules for them.
General purpose AI (GPAI) models have an additional risk category: systemic risk. A GPAI model poses systemic risk when it is trained with a very large amount of compute, currently defined as more than 10^25 FLOPs of training compute, or when it has high-impact capabilities. According to a Stanford University analysis based on Epoch estimates, as of August 2024 only eight models (Gemini 1.0 Ultra, Llama 3.1-405B, GPT-4, Mistral Large, Nemotron-4 340B, MegaScale, Inflection-2, Inflection-2.5) from seven developers (Google, Meta, OpenAI, Mistral, NVIDIA, ByteDance, Inflection) meet the default systemic risk criterion of being trained with at least 10^25 FLOPs. The obligations that apply differ depending on whether the model is open source or not.
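As a rough, back-of-the-envelope way to see whether a training run approaches the 10^25 FLOP threshold, you can use the common “≈ 6 × parameters × training tokens” approximation for dense transformer training compute. This is only an estimate, not how compute is measured officially, and the model sizes below are purely illustrative.

```python
# Rule-of-thumb training compute for a dense transformer:
# total FLOPs ≈ 6 * number of parameters * number of training tokens.
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # current AI Act default threshold

def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Back-of-the-envelope training compute estimate."""
    return 6 * n_params * n_tokens

# Illustrative (hypothetical) training runs:
runs = {
    "7B params, 2T tokens": estimate_training_flops(7e9, 2e12),
    "405B params, 15T tokens": estimate_training_flops(405e9, 15e12),
}

for name, flops in runs.items():
    over = flops >= SYSTEMIC_RISK_THRESHOLD_FLOPS
    print(f"{name}: ~{flops:.2e} FLOPs -> above 10^25 threshold: {over}")
```

Under this approximation, a 7B-parameter model trained on 2T tokens lands around 8 × 10^22 FLOPs, well below the threshold, while a 405B-parameter model trained on 15T tokens lands around 3.6 × 10^25 FLOPs, above it.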
🤗 How to prepare for compliance
This short guide focuses on limited-risk AI systems and on open source, non-systemic-risk GPAI models, which cover most of the models published on the Hub. If your work falls into another risk category, be sure to review the further obligations that may apply.
For AI systems with limited risk
Limited-risk AI systems interact directly with people (end users), which can create risks of impersonation, manipulation, or deception. Examples include text-generating chatbots and text-to-image generators, tools that can facilitate the creation of misinformation or deepfakes. The AI Act addresses these risks primarily through transparency obligations.
Developers of AI systems with limited risk should:
- Disclose to users that they are interacting with an AI system, unless it is obvious from context. Keep in mind that end users may not have the same technical understanding as experts, so this information must be provided in a clear and accessible way.
- Mark synthetic content: AI-generated content (audio, images, video, text, etc.) must be marked as artificially generated or manipulated in a machine-readable format. Existing tools, such as Gradio’s built-in watermarking capabilities, can help meet these requirements (a simple sketch of machine-readable marking follows this list).
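As a minimal illustration of what machine-readable marking can look like, the sketch below embeds a provenance note in PNG metadata using Pillow. This is not Gradio’s built-in mechanism, the model ID is a placeholder, and plain metadata is easy to strip; dedicated provenance standards and watermarking techniques are more robust in practice.

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Placeholder: `image` would be the output of your text-to-image model.
image = Image.new("RGB", (512, 512))

# Embed a machine-readable provenance note in the PNG metadata.
metadata = PngInfo()
metadata.add_text("ai_generated", "true")
metadata.add_text("generator", "my-org/my-text-to-image-model")  # placeholder model ID

image.save("output.png", pnginfo=metadata)

# Downstream tools can read the marking back:
print(Image.open("output.png").text)  # {'ai_generated': 'true', 'generator': ...}
```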
Note that you may be not only a developer of an AI system but also a “deployer” of one. Deployers of AI systems are individuals or companies that use an AI system in a professional capacity. In that case, you must also comply with the following:
- For emotion recognition and biometric systems: deployers must inform the individuals concerned about the use of these systems and process personal data in accordance with the relevant regulations.
- Disclosure of deepfakes and AI-generated content: deployers must disclose when content has been artificially generated or manipulated. Where the content forms part of an artistic work, the obligation is to disclose the existence of the generated or manipulated content in a way that does not detract from the experience.
The information above must be provided in clear language at the latest at the time of the user’s first interaction with, or exposure to, the AI system. These obligations for limited-risk AI systems will apply from August 2026.
For open source non-systemic risk GPAI models
If you are developing an open source GPAI model (such as an LLM) that does not pose systemic risk, lighter obligations apply. Under the AI Act, a model counts as open source when it is released under a free and open source licence that allows access, use, modification, and distribution, and when its parameters, including the weights, model architecture, and information on model usage, are made publicly available. Models released this way are exempt from some of the documentation obligations that otherwise apply to GPAI models.

Open source GPAI models without systemic risk have the following obligations:
Draft and make publicly available a sufficiently detailed summary of the content used to train the GPAI model, following the template provided by the AI Office, and put in place a policy to comply with EU copyright law.
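Much of this documentation can live in a model card on the Hub. Below is a minimal, hypothetical sketch using the huggingface_hub library; the repository name, dataset name, and training-data description are placeholders, and the AI Office template should be followed once it is available.

```python
from huggingface_hub import ModelCard

# Hypothetical model card with a summary of training content.
content = """
---
license: apache-2.0
datasets:
  - my-org/my-training-corpus
language:
  - en
---

# my-org/my-gpai-model

## Training data summary

Trained on my-org/my-training-corpus, a deduplicated web-text corpus.
See the dataset card for sources, licensing, and opt-out handling.
"""

card = ModelCard(content)
card.save("README.md")  # or card.push_to_hub("my-org/my-gpai-model")
```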
The EU AI Act also ties in with existing regulations on copyright and personal data, such as the Copyright Directive and the General Data Protection Regulation. To that end, look to Hugging Face’s integrated tools that support better opt-out mechanisms and personal data redaction, and stay up to date with recommendations from European and national bodies such as the CNIL.
Hugging Face projects implement formats for understanding and honouring opt-outs from training data, including BigCode’s Am I In The Stack app and the integration of Spawning’s opt-out widget for image URL datasets. These tools let creators easily signal that they do not want their copyrighted material used for AI training, and help model developers respect those decisions.
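In practice, honouring opt-outs often comes down to filtering your training corpus against an opt-out list before training. Here is a minimal sketch with the datasets library; the dataset name, the `url` column, and the opt-out list are all hypothetical.

```python
from datasets import load_dataset

# Hypothetical opt-out list of source URLs collected via an opt-out mechanism.
opted_out_urls = {
    "https://example.com/artwork-1",
    "https://example.com/artwork-2",
}

# Hypothetical image-URL dataset; replace with your actual training data.
dataset = load_dataset("my-org/my-image-url-dataset", split="train")

# Drop every record whose source URL appears in the opt-out list.
filtered = dataset.filter(lambda row: row["url"] not in opted_out_urls)

print(f"Kept {len(filtered)} of {len(dataset)} records after applying opt-outs.")
```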
Developers may rely on the Code of Practice (currently under development and expected by May 2025) to demonstrate compliance with these obligations.
Further obligations apply if you publish your work in a way that does not meet the AI Act’s open source criteria.
Also note that if a particular GPAI model meets the conditions for posing systemic risk, its developer must notify the EU Commission. During the notification process, the developer can argue that their model does not present systemic risk because of specific characteristics. The Commission considers each argument and accepts or rejects the claim depending on whether it is sufficiently substantiated, taking into account the specific characteristics and capabilities of the model. If the Commission rejects the developer’s arguments, the GPAI model is designated as posing systemic risk and must comply with further obligations, including having training and testing processes in place and providing technical documentation about the model, including evaluation results.
The obligations for GPAI models will apply from August 2025.
🤗 Join us
Many of the practical details of the EU AI Act are still being worked out through public consultations and working groups, and their outcomes will determine how the Act’s provisions are implemented and how smooth compliance will be for SMEs and researchers. If you’re interested in shaping how this plays out, now is a great time to get involved.
@misc{eu_ai_act_for_oss_developers,
  author    = {Bruna Trevelin and Lucie-Aimée Kaffee and Yacine Jernite},
  title     = {Open Source Developers Guide to the EU AI Act},
  booktitle = {Hugging Face Blog},
  year      = {2024},
  url       = {},
  doi       = {}
}
We would like to thank Anna Tordjmann, Brigitte Tousignant, Chun Te Lee, Irene Solaiman, Clémentine Fourrier, Ann Huang, Benjamin Burtenshaw, and Florent Daudens for their feedback, comments, and suggestions.