Versa AI hub
Build a great dataset for video generation

February 12, 2025 (updated February 13, 2025)


Sayak Paul


Tooling for image generation datasets is well established: img2dataset is a fundamental tool for preparing large-scale datasets, and it is complemented by various community guides, scripts, and UIs that cover smaller-scale initiatives.

Our ambition is to make tooling for video generation datasets equally established, by creating open video dataset scripts suited to small-scale work and leveraging video2dataset for large-scale use cases.

“If I have seen further, it is by standing on the shoulders of giants.”

In this post, we outline the tooling we are developing to make it easier for the community to build their own datasets for fine-tuning video generation models. If you can't wait to get started, check out the codebase.

Tooling

Video generation is usually conditioned on natural language text prompts such as “a cat walking on grass, realistic style.” Beyond the prompt, a video has a number of qualitative aspects that matter for controllability and filtering:

  • Motion
  • Aesthetics
  • Presence of watermarks
  • Presence of NSFW content

Video generation models are only as good as the data they are trained on. These aspects therefore become crucial when curating a dataset for training or fine-tuning.

Our 3-stage pipeline is inspired by works such as Stable Video Diffusion and LTX-Video and their data pipelines.

Stage 1 (acquisition)

Like video2dataset, we use yt-dlp to download videos.

We then use a script that detects scenes and splits each long video into short clips.
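The acquisition stage can be sketched as two command builders. This is a minimal illustration rather than the pipeline's actual scripts: the function names are hypothetical, the scene boundaries are assumed to come from a separate scene detector, and using ffmpeg to cut the clips is an assumption (the post only names yt-dlp). The clip naming mirrors the `<video id>-scene-<number>` pattern visible in the example frame filenames later in the post.

```python
from pathlib import Path


def build_download_cmd(url, out_dir="videos"):
    """Build a yt-dlp command; -o sets the output filename template."""
    return ["yt-dlp", "-o", f"{out_dir}/%(id)s.%(ext)s", url]


def build_clip_cmds(video_path, scenes, out_dir="clips"):
    """Build one ffmpeg command per (start, end) scene, in seconds.

    `scenes` is assumed to be produced by a scene detector of your choice.
    """
    video_path = Path(video_path)
    cmds = []
    for number, (start, end) in enumerate(scenes, start=1):
        out = Path(out_dir) / f"{video_path.stem}-scene-{number:03d}.mp4"
        cmds.append([
            "ffmpeg", "-y",          # overwrite existing output files
            "-i", str(video_path),
            "-ss", f"{start:.3f}",   # clip start
            "-to", f"{end:.3f}",     # clip end
            str(out),                # re-encodes, so cuts are frame-accurate
        ])
    return cmds


if __name__ == "__main__":
    # Print the commands instead of running them; to execute, pass each
    # list to subprocess.run(cmd, check=True).
    print(build_download_cmd("https://example.com/watch?v=abc"))
    for cmd in build_clip_cmds("videos/abc.mp4", [(0.0, 2.5), (2.5, 7.0)]):
        print(cmd)
```

Building the commands as lists (rather than shell strings) avoids quoting problems with unusual filenames.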

Stage 2 (pre-processing/filtering)

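  • Extracted frames: per-frame scores, such as the watermark probability (pwatermark) and the aesthetic score used in the filtering examples
  • The whole video: predict a motion score with OpenCV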
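One way a motion score could be computed with OpenCV is sketched below. The post does not say which method the pipeline uses, so dense optical flow (Farneback) is an assumption here, as are the function names and the `frame_stride` parameter. OpenCV is imported lazily so the helper can be used without it.

```python
from statistics import mean


def average_motion(magnitudes):
    """Collapse per-frame-pair flow magnitudes into one score (0.0 if empty)."""
    return mean(magnitudes) if magnitudes else 0.0


def motion_score(video_path, frame_stride=5):
    """Mean optical-flow magnitude over sampled frame pairs (assumed method)."""
    import cv2  # imported lazily; requires opencv-python

    capture = cv2.VideoCapture(str(video_path))
    magnitudes = []
    previous_gray = None
    frame_index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # Only compare every `frame_stride`-th frame to keep this cheap.
        if frame_index % frame_stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if previous_gray is not None:
                flow = cv2.calcOpticalFlowFarneback(
                    previous_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0
                )
                magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
                magnitudes.append(float(magnitude.mean()))
            previous_gray = gray
        frame_index += 1
    capture.release()
    return average_motion(magnitudes)
```

A near-static clip yields a score close to zero, so a minimum threshold on this value can be used to drop clips with little movement.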

Stage 3 (Processing)

We run Florence-2 tasks with microsoft/Florence-2-large on the extracted frames. This provides captions of varying detail, object recognition, and OCR results that can be used for filtering in different ways.

Other captioners can be brought in here as well. You can also caption the entire video (for example with a model like Qwen2.5) instead of captioning individual frames.
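Running a Florence-2 task on a single extracted frame might look like the sketch below. It follows the usage pattern from the microsoft/Florence-2-large model card; which tasks the pipeline actually runs is not stated here, so the task tokens shown (`<CAPTION>`, `<DETAILED_CAPTION>`, `<OCR>`) are illustrative, and the function names and the `ocr_has_text` helper are hypothetical.

```python
def load_florence(model_id="microsoft/Florence-2-large"):
    """Load the model and processor once and reuse them for every frame."""
    from transformers import AutoModelForCausalLM, AutoProcessor  # lazy import

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


def run_task(model, processor, image, task="<DETAILED_CAPTION>"):
    """Run one Florence-2 task on a PIL image and return the parsed result."""
    inputs = processor(text=task, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )
    return parsed[task]


def ocr_has_text(ocr_text, min_chars=3):
    """Hypothetical filter: treat a frame as containing text if the OCR
    result has at least `min_chars` non-whitespace characters."""
    return len("".join(ocr_text.split())) >= min_chars


# Example usage (downloads the model weights on first run):
#   from PIL import Image
#   model, processor = load_florence()
#   frame = Image.open("frame.jpg").convert("RGB")
#   caption = run_task(model, processor, frame, "<CAPTION>")
#   has_text = ocr_has_text(run_task(model, processor, frame, "<OCR>"))
```

Keeping the OCR check separate from the model call makes it easy to tune the text filter without re-running inference.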

Filtering Examples

For the dataset behind the finetrainers/crush-smol-v0 model, we chose the captions from Qwen2VL and filtered on pwatermark < 0.1 and aesthetic > 5.5. This highly restrictive filtering left 47 videos out of a total of 1,493.

Let's take a look at example frames for pwatermark:

Two frames that contain text score 0.69 and 0.61.

pwatermark   image
0.69         19S8CRUVF3E-SCENE-022_0.jpg
0.61         19S8CRUVF3E-SCENE-010_0.jpg

The “toy car with a bunch of mice in it” scores 0.60, then 0.17 once the toy car is crushed.

pwatermark   image
0.60         -ivrtqwaetm-scene-003_0.jpg
0.17         -ivrtqwaetm-scene-003_1.jpg

All of these example frames were filtered out by pwatermark < 0.1. pwatermark is effective at detecting text and watermarks, but the score does not indicate whether the text is an overlay or, say, a toy car's license plate. Our filtering required every frame's score to be below the threshold. Averaging the score across frames, with a threshold of approximately 0.2-0.3, would be a more effective strategy for pwatermark.
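The difference between the two aggregation strategies can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function names are hypothetical, the `toy_car` scores are the ones quoted above, and `hypothetical_clip` is an invented example.

```python
from statistics import mean


def passes_all_frames(scores, threshold=0.1):
    """Strict rule: every frame's pwatermark must be below the threshold."""
    return all(score < threshold for score in scores)


def passes_on_average(scores, threshold=0.3):
    """Relaxed rule: the mean pwatermark across frames must be below the threshold."""
    return mean(scores) < threshold


clips = {
    "toy_car": [0.60, 0.17],            # scores quoted in the post
    "hypothetical_clip": [0.25, 0.05],  # invented for illustration
}

for name, scores in clips.items():
    print(name, passes_all_frames(scores), passes_on_average(scores))
```

The invented clip fails the strict rule because of a single frame at 0.25 but passes on average. The toy car clip is still rejected under the averaged rule at 0.3, since its mean is about 0.385; averaging relaxes the filter but cannot tell a license plate from a text overlay.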

Let's take a look at example frames for aesthetic scores:

The pink castle initially scores 5.50, then 4.44 once it is crushed.

aesthetic    image
5.50         -ivrtqwaetm-scene-036_0.jpg
4.44         -ivrtqwaetm-scene-036_1.jpg

The action figure starts lower at 4.99 and drops to 4.84 as it is crushed.

aesthetic    image
4.99         -ivrtqwaetm-scene-046_0.jpg
4.87         -ivrtqwaetm-scene-046_1.jpg
4.84         -ivrtqwaetm-scene-046_2.jpg

The glass shard scores low at 4.04.

aesthetic    image
4.04         19S8CRUVF3E-SCENE-015_1.jpg

Our filtering required every frame's score to be above the threshold. In this case, using the aesthetic score of the first frame only would be a more effective strategy.

Reviewing finetrainers/crush-smol, we notice that many of the objects being crushed are round or rectangular and colorful, which matches what we see in the example frames. Aesthetic scores are useful, but they carry a potential bias that excludes good data when used with extreme thresholds such as > 5.5. The score may be more effective as a minimum threshold of about 4.25-4.5 that screens out poor content rather than one that selects only the best.
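A first-frame rule with a minimum threshold could look like the sketch below. The function names are hypothetical, the scores are those quoted in the example tables, and the single score available for the glass shard is treated as its first frame purely for illustration.

```python
def passes_strict(scores, threshold=5.5):
    """Original rule: every frame's aesthetic score must exceed the threshold."""
    return all(score > threshold for score in scores)


def passes_first_frame(scores, minimum=4.5):
    """Alternative rule: only the first frame must reach a minimum score."""
    return scores[0] >= minimum


clips = {
    "pink_castle": [5.50, 4.44],
    "action_figure": [4.99, 4.87, 4.84],
    "glass_shard": [4.04],  # only one score is quoted; treated as frame 0 here
}

kept_strict = [name for name, s in clips.items() if passes_strict(s)]
kept_first_frame = [name for name, s in clips.items() if passes_first_frame(s)]

print("strict:", kept_strict)
print("first frame:", kept_first_frame)
```

Under the strict rule none of the three clips survive, while the first-frame rule keeps the pink castle and the action figure. Scoring only the first frame avoids penalizing a clip for what the object looks like after it has been crushed.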

OCR/Caption

Here we provide a visual example for each filter, along with the captions from Florence-2.

Image: toy car with mice
Caption: A toy car with a bunch of mice in it.
Detailed caption: The image shows a blue toy car with three white mice sitting in the back of it, driving down a road with a green wall in the background.

[Images: frame with OCR labels; frame with OCR and region labels]

Putting this tooling to use 👨‍🍳

We have created various datasets with this tooling in an attempt to generate cool video effects, similar to Pika Effects.

We then used these datasets to fine-tune the CogVideoX-5B model using finetrainers. Below is an example output from finetrainers/crush-smol-v0.

Prompt: diff_crush A red candle is placed on a metal platform, and a large metal cylinder descends from above, flattening the candle as if it were under a hydraulic press. The candle is crushed into a flat, round shape, leaving a pile of debris around it.

Your turn

We hope this tooling gives you a head start on creating small, high-quality video datasets for your own custom applications. Keep an eye on the repository, as we will continue to add more useful filters. Your contributions are also welcome.

Thanks to Pedro Cuenca for his extensive review of this post.
