Direct integration with Hugging Face

Prodigy is an annotation tool created by Explosion, a company well known as the creator of spaCy. It is a fully scriptable product with a large community around it. The product has many features, including tight integration with spaCy and active learning capabilities. However, its main feature is that it is programmatically customizable in Python.

To foster this customizability, Explosion has begun releasing plugins. These plugins integrate with third-party tools in an open way that encourages users to build their own bespoke annotation workflows. One of these integrations deserves special mention: last week, Explosion introduced Prodigy-HF, which provides code recipes that integrate directly with the Hugging Face stack. It was a much-requested feature on the Prodigy support forum, so we're very excited to have it out there.

Features

The first main feature is that this plugin lets you train and reuse Hugging Face models on your annotated data. This means that if you have annotated data in the interface for named entity recognition, you can directly fine-tune a BERT model on it.
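To fine-tune a token-classification model such as BERT, span annotations are usually converted into one tag per token first. The sketch below shows one common way to do that, using BIO tags. It assumes NER examples shaped the way Prodigy typically stores them (a "tokens" list and a "spans" list, both with character offsets); the function name spans_to_bio and the sample data are hypothetical, and this is not necessarily how Prodigy-HF converts data internally.

```python
from typing import Dict, List


def spans_to_bio(example: Dict) -> List[str]:
    """Turn one Prodigy-style NER example into BIO tags, one per token.

    Assumes "tokens" items carry "start"/"end" character offsets and
    "spans" items carry "start"/"end"/"label" (an assumed task format).
    """
    tokens = example["tokens"]
    tags = ["O"] * len(tokens)
    for span in example.get("spans", []):
        # Tokens that fall entirely inside the span's character range.
        inside = [
            i
            for i, tok in enumerate(tokens)
            if tok["start"] >= span["start"] and tok["end"] <= span["end"]
        ]
        for position, i in enumerate(inside):
            # First token of the entity gets B-, the rest get I-.
            prefix = "B" if position == 0 else "I"
            tags[i] = f"{prefix}-{span['label']}"
    return tags


# Hypothetical annotated example in the assumed format.
example = {
    "text": "Red sneakers by Nike",
    "tokens": [
        {"text": "Red", "start": 0, "end": 3, "id": 0},
        {"text": "sneakers", "start": 4, "end": 12, "id": 1},
        {"text": "by", "start": 13, "end": 15, "id": 2},
        {"text": "Nike", "start": 16, "end": 20, "id": 3},
    ],
    "spans": [
        {"start": 0, "end": 12, "label": "CLOTHING"},
        {"start": 16, "end": 20, "label": "BRAND"},
    ],
    "answer": "accept",
}

print(spans_to_bio(example))
# ['B-CLOTHING', 'I-CLOTHING', 'O', 'B-BRAND']
```

A real fine-tuning pipeline would additionally align these word-level tags with the subword tokens produced by the model's tokenizer.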

Figure: What the Prodigy NER interface looks like.

After installing the plugin, you can train a transformer model directly on your own data by invoking the hf.train.ner recipe from the command line.

python -m prodigy hf.train.ner fashion-train,eval:fashion-eval path/to/model-out --model "distilbert-base-uncased"

This will fine-tune the distilbert-base-uncased model on the datasets stored in Prodigy (fashion-train for training, fashion-eval for evaluation) and save the result to disk. Similarly, the plugin also supports text classification models through a very similar interface.

python -m prodigy hf.train.textcat fashion-train,eval:fashion-eval path/to/model-out --model "distilbert-base-uncased"
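Both training commands take their datasets as one comma-separated argument, in which an eval: prefix marks a dataset used for evaluation rather than training. The hypothetical helper parse_dataset_spec below only illustrates how such a spec splits into the two groups; it is a sketch, not Prodigy's own parsing code.

```python
from typing import List, Tuple


def parse_dataset_spec(spec: str) -> Tuple[List[str], List[str]]:
    """Split e.g. "fashion-train,eval:fashion-eval" into (train, eval) names.

    Hypothetical helper illustrating the "eval:" prefix convention.
    """
    train: List[str] = []
    evaluation: List[str] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        if part.startswith("eval:"):
            evaluation.append(part[len("eval:"):])
        else:
            train.append(part)
    return train, evaluation


print(parse_dataset_spec("fashion-train,eval:fashion-eval"))
# (['fashion-train'], ['fashion-eval'])
```

Several training datasets can be listed the same way, for example "shop-a,shop-b,eval:fashion-eval" would yield two training datasets and one evaluation dataset under this sketch.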

This offers a lot of flexibility, because the tool integrates directly with the AutoTokenizer and AutoModel classes of Hugging Face transformers. Any transformer model on the Hub can be fine-tuned on your own dataset with a single command. The resulting models are serialized to disk, which means you can upload them to the Hugging Face Hub or reuse them to help annotate your data. This can save a lot of time, especially for NER tasks. To reuse a trained NER model, you can use the hf.correct.ner recipe.

python -m prodigy hf.correct.ner fashion-train path/to/model-out examples.jsonl

This gives you a similar interface as before, but now the model's predictions are shown in the interface as well.
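Showing predictions in the interface means turning the model's per-token output back into highlighted character spans. The sketch below assumes BIO-tagged predictions and an input file with one {"text": ...} object per line; the whitespace tokenizer, the bio_to_spans helper and the hard-coded predicted tags are hypothetical stand-ins for what a trained model and Prodigy-HF would actually produce.

```python
import io
import json
import re
from typing import Dict, Iterable, Iterator, List, Optional


def whitespace_tokens(text: str) -> List[Dict]:
    """Hypothetical stand-in tokenizer: split on whitespace, keep offsets."""
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


def bio_to_spans(tokens: List[Dict], tags: List[str]) -> List[Dict]:
    """Merge per-token BIO tags into character-offset spans."""
    spans: List[Dict] = []
    current: Optional[Dict] = None
    for tok, tag in zip(tokens, tags):
        label = tag[2:]
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current["label"] != label)
        )
        if starts_new:
            # Close any open entity; a stray I- tag is treated like B-.
            if current is not None:
                spans.append(current)
            current = {"start": tok["start"], "end": tok["end"], "label": label}
        elif tag.startswith("I-"):
            # Extend the open entity to the end of this token.
            current["end"] = tok["end"]
        else:
            # An "O" tag closes any open entity.
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return spans


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one JSON object per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


# In-memory stand-in for an examples.jsonl file.
examples_file = io.StringIO('{"text": "Red sneakers by Nike"}\n')

for task in read_jsonl(examples_file):
    task["tokens"] = whitespace_tokens(task["text"])
    # Hypothetical predictions; a real recipe would run the trained model here.
    predicted = ["B-CLOTHING", "I-CLOTHING", "O", "B-BRAND"]
    task["spans"] = bio_to_spans(task["tokens"], predicted)
    print(task["spans"])
# [{'start': 0, 'end': 12, 'label': 'CLOTHING'}, {'start': 16, 'end': 20, 'label': 'BRAND'}]
```

Pre-filling spans this way means the annotator only fixes the model's mistakes instead of labeling every entity from scratch, which is where the time savings on NER come from.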

Upload

An equally exciting second feature is that you can publish your annotated datasets to the Hugging Face Hub. This is great if you want to share datasets that others would like to use.

python -m prodigy hf.upload <dataset_name> <username>/<repo_name>
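Before sharing a dataset publicly, it can be worth checking which metadata the annotated records carry. The sketch below assumes records in Prodigy's JSON task format and treats _annotator_id and _session_id as keys you might prefer not to publish; the helper strip_private_keys and the sample record are hypothetical, and the sketch makes no claim about what hf.upload does with these fields internally.

```python
import json
from typing import Dict, Iterable, List

# Assumed Prodigy bookkeeping keys that can identify who annotated what.
PRIVATE_KEYS = {"_annotator_id", "_session_id"}


def strip_private_keys(examples: Iterable[Dict]) -> List[Dict]:
    """Return copies of the examples without annotator-identifying keys."""
    return [
        {key: value for key, value in eg.items() if key not in PRIVATE_KEYS}
        for eg in examples
    ]


# Hypothetical annotated record.
annotated = [
    {
        "text": "Red sneakers by Nike",
        "spans": [{"start": 16, "end": 20, "label": "BRAND"}],
        "answer": "accept",
        "_annotator_id": "fashion-train-alice",
        "_session_id": "fashion-train-alice",
    }
]

for row in strip_private_keys(annotated):
    print(json.dumps(row))
# {"text": "Red sneakers by Nike", "spans": [{"start": 16, "end": 20, "label": "BRAND"}], "answer": "accept"}
```

The function builds new dictionaries rather than deleting keys in place, so the original records stay untouched in case you still need the annotator information locally.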

We especially like this upload feature because it encourages collaboration. People can annotate their own datasets independently of one another and still benefit when they share their data with the wider community.

More to come

We hope that this direct integration with the Hugging Face ecosystem will enable many users to experiment more. The Hugging Face Hub offers many models for a wide range of tasks and languages. We really hope this integration makes it easier to get data annotated, even for more domain-specific and experimental use cases.

More features for this library are in progress. If you have any questions, feel free to reach out on the Prodigy forum.

We would also like to thank the Hugging Face team for their feedback on this plugin, especially @davanstrien, who suggested adding the upload functionality. Thank you!

versatileai
