We’re excited to share our most impactful feature since Argilla joined Hugging Face: you can now prepare AI datasets with no code, starting from any Hub dataset. Argilla’s UI lets you import a dataset from the Hugging Face Hub, define your questions, and start collecting human feedback.
Not familiar with Argilla? Argilla is a free, open-source, data-centric tool that lets AI developers and domain experts collaborate on building high-quality datasets. Argilla is part of the Hugging Face family and is fully integrated with the Hub. Want to know more? Here is an introductory blog post.
Why is this new feature important to you and the community?
The Hugging Face Hub hosts 230,000 datasets that you can use as the foundation for your AI projects. This feature makes it simpler to gather human feedback on them, whether from the Hugging Face community or from professional annotation teams. It also opens up dataset creation to people who have deep knowledge of a specific domain but are not confident writing code.
Use cases
This new feature democratizes building high-quality datasets on the Hub.
If you have an open dataset and want the community to contribute, import it into your public Argilla Space and share the URL with the world. If you want to start annotating a new dataset from scratch, upload your CSV to the Hub, import it into your Argilla Space, and start labeling. If you want to curate an existing Hub dataset to fine-tune or evaluate your model, import it into your Argilla Space and start curating. If you would like to improve an existing Hub dataset for the benefit of the community, import it into your Argilla Space and give feedback.
How it works
First, you need to deploy Argilla. The recommended method is to deploy it to a Hugging Face Space by following this guide. The default deployment has Hugging Face OAuth enabled, which means any Hub user can sign in to the Space and annotate. OAuth is ideal when you want your community to contribute to your dataset. If you’d rather limit annotation to yourself and selected collaborators, check out this guide for additional configuration options.
Importing a Hub dataset in the Argilla UI
Once Argilla is running, sign in and click the “Import dataset from Hugging Face” button on the home page. You can start with one of the sample datasets or enter the repository ID of the dataset you want to use.
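For most Hub datasets, the repository ID has the form owner/name (for example, stanfordnlp/imdb), which is also the path that follows /datasets/ in the dataset’s URL. As an illustration only, the sketch below shows a hypothetical helper, normalize_repo_id, that turns either a bare ID or a copied dataset URL into that form. It is not Argilla code, and it assumes the owner/name form, so legacy single-name IDs are rejected.

```python
import re

# Hypothetical helper, not part of Argilla: it only illustrates the
# "owner/name" shape of a Hub dataset repository ID.
_URL_PREFIX = re.compile(r"^https?://huggingface\.co/datasets/")
_REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def normalize_repo_id(value: str) -> str:
    """Return "owner/name" from a bare repository ID or a dataset page URL."""
    candidate = _URL_PREFIX.sub("", value.strip())
    # Drop any query string or fragment copied along with the URL.
    candidate = candidate.split("?")[0].split("#")[0].strip("/")
    # Keep only owner/name, discarding suffixes such as "/viewer/default/train".
    candidate = "/".join(candidate.split("/")[:2])
    if not _REPO_ID.match(candidate):
        raise ValueError(f"not an owner/name dataset repository ID: {value!r}")
    return candidate


print(normalize_repo_id("https://huggingface.co/datasets/stanfordnlp/imdb/viewer"))
# stanfordnlp/imdb
```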
In this first version, Hub datasets must be public. If you’d like support for private datasets, we’d love to hear from you on GitHub.
Argilla automatically suggests an initial configuration based on the characteristics of your dataset, so you don’t have to start from scratch, but you can still add questions and remove fields you don’t need. Fields hold the data you want feedback on, such as text, chats, or images. Questions define the feedback you want to collect, such as labels, ratings, rankings, or free text. All changes are previewed in real time, so you can see exactly what the Argilla dataset you are configuring will look like.
Once you are satisfied with the result, click “Create Dataset” to import the dataset with your configuration. You’re now ready to start giving feedback.
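To make the split between fields and questions concrete, here is a toy heuristic for suggesting a configuration from a few sample rows. It is a minimal sketch under stated assumptions, not Argilla’s actual suggestion logic: SuggestedConfig, is_chat, suggest_config and the label_column parameter are hypothetical names, string columns become text fields, lists of role/content messages become chat fields, a column named label becomes a label question, and every other column type is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class SuggestedConfig:
    fields: dict = field(default_factory=dict)     # column name -> field type
    questions: dict = field(default_factory=dict)  # question name -> spec


def is_chat(value) -> bool:
    # Assumption: a chat is a non-empty list of {"role": ..., "content": ...} dicts.
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(m, dict) and "role" in m and "content" in m for m in value)
    )


def suggest_config(rows: list, label_column: str = "label") -> SuggestedConfig:
    """Toy heuristic: text/chat columns become fields, the label column a label question."""
    if not rows:
        raise ValueError("need at least one row to inspect")
    config = SuggestedConfig()
    for name, value in rows[0].items():
        if name == label_column:
            # Offer every distinct value seen in the sample as a label option.
            labels = sorted({str(row[name]) for row in rows})
            config.questions[name] = {"type": "label", "labels": labels}
        elif is_chat(value):
            config.fields[name] = "chat"
        elif isinstance(value, str):
            config.fields[name] = "text"
        # Other column types (ids, numbers, ...) are left out in this sketch.
    if not config.questions:
        # Fall back to open-ended text feedback when no label column exists.
        config.questions["feedback"] = {"type": "text"}
    return config


rows = [
    {"text": "Great movie!", "label": "positive"},
    {"text": "Too long and boring.", "label": "negative"},
]
print(suggest_config(rows))
# SuggestedConfig(fields={'text': 'text'}, questions={'label': {'type': 'label', 'labels': ['negative', 'positive']}})
```

In the UI you would then refine this starting point, for example by adding a rating question or removing a field, just as you would edit the suggestion Argilla makes.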
You can try this out yourself by following our quickstart guide. It takes less than 5 minutes!
This new workflow streamlines importing datasets from the Hub, but if you need further customization, you can still import datasets using Argilla’s Python SDK.
We’d love to hear your thoughts and first impressions. Let us know on GitHub or the HF Discord.