Versa AI hub
Introducing the SQL Console on Datasets

March 3, 2025

Dataset usage is exploding, and Hugging Face has become the default home for many datasets. As the number of datasets uploaded to the Hub grows each month, so does the need to query, filter, and discover them.


Datasets created on the Hugging Face Hub each month

We are very excited to announce that you can now run SQL queries on your datasets directly on the Hugging Face Hub!

Introducing the SQL Console for Datasets

Every dataset now displays a new SQL Console badge. With just one click, you can open the SQL Console and query that dataset.

Querying the Magpie-Ultra dataset for excellent, high-quality reasoning instructions.

All work is done in the browser and the console comes with some neat features.

  • 100% local: The SQL Console is powered by DuckDB WASM, so you can query a dataset without installing any dependencies.
  • Full DuckDB syntax: DuckDB has full SQL syntax support, along with many built-in functions for regex, lists, JSON, embeddings, and more. You will find the DuckDB syntax very similar to PostgreSQL.
  • Export results: You can export the results of a query to Parquet.
  • Shareable: You can share the query results of public datasets as a link.

How it works

Parquet Conversion

Most datasets on Hugging Face are stored in Parquet, a columnar data format optimized for performance and storage efficiency. The Dataset Viewer on Hugging Face and the SQL Console load data directly from the dataset's Parquet files. If the dataset is in a different format, the first 5GB is automatically converted to Parquet. You can find more information about the Parquet conversion process in the Dataset Viewer Parquet API documentation.
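To make the Parquet listing a little more concrete, here is a minimal Python sketch that builds a request URL for the Dataset Viewer's Parquet endpoint and groups the listed files by config and split. The base URL and the response field names (`parquet_files`, `config`, `split`, `url`) reflect our understanding of the Dataset Viewer API and should be checked against its documentation. The helper names, the sample response, and the example.com URLs are hypothetical and exist only for illustration; no network request is made.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Assumed base URL of the Dataset Viewer API; verify against the official docs.
DATASET_VIEWER_API = "https://datasets-server.huggingface.co"


def parquet_listing_url(dataset: str) -> str:
    """Build the URL that lists the Parquet files of a dataset."""
    return f"{DATASET_VIEWER_API}/parquet?{urlencode({'dataset': dataset})}"


def group_parquet_urls(response: dict) -> dict:
    """Group Parquet file URLs by (config, split)."""
    grouped = defaultdict(list)
    for entry in response.get("parquet_files", []):
        grouped[(entry["config"], entry["split"])].append(entry["url"])
    return dict(grouped)


# Hypothetical response, shaped like the assumed API output.
sample_response = {
    "parquet_files": [
        {"config": "default", "split": "train", "url": "https://example.com/train-0.parquet"},
        {"config": "default", "split": "train", "url": "https://example.com/train-1.parquet"},
        {"config": "default", "split": "test", "url": "https://example.com/test-0.parquet"},
    ]
}

listing_url = parquet_listing_url("OpenCo7/UpVoteWeb")
files_by_split = group_parquet_urls(sample_response)
print(listing_url)
print(files_by_split)
```

Grouping by config and split mirrors how the data is organized for querying: each (config, split) pair is one logical table backed by one or more Parquet files.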

Using the Parquet files, the SQL Console creates views for you to query, based on the dataset's splits and configs.

DuckDB WASM 🦆

DuckDB WASM is the engine that powers the SQL Console. It is an in-process database engine that runs on WebAssembly in the browser. No server or backend is required.

Because it runs entirely in the browser, it gives users maximum flexibility to query data without installing any dependencies. It also makes it very easy to share reproducible results with a simple link.

You may be wondering, “Does it work for large datasets?” The answer is “Yes!”

Here is a query on the OpenCo7/UpVoteWeb dataset, which has 12.6M rows in its Parquet conversion.

Reddit movie suggestions

As you can see, a simple filter query returned results in under 3 seconds.

Queries take longer depending on the size of the dataset and the complexity of the query, but you will be surprised at how much you can do with the SQL Console.

As with any technology, there are limitations.

  • The SQL Console works for many queries. However, the memory limit is ~3GB, so it is possible to run out of memory and be unable to process a query. (Tip: use filters together with LIMIT to reduce the amount of data you are querying.)
  • DuckDB WASM is very powerful, but it does not have full feature parity with DuckDB. For example, DuckDB WASM does not yet support the hf:// protocol for querying datasets.

Example: Converting a dataset from Alpaca to conversations

Now that we have introduced the SQL Console, let's look at a practical example. When fine-tuning a large language model (LLM), you often need to work with a variety of data formats. One particularly popular format is the conversational format, where each row represents a multi-turn dialogue between the user and the model. The SQL Console can help you convert your data to this format efficiently. Let's see how to convert an Alpaca dataset into a conversational format using SQL.

Typically, developers handle this task in a Python preprocessing step, but we can show how to accomplish the same thing in less than 30 seconds using the SQL Console.

In the dataset above, click the SQL Console badge to open the SQL Console. You should see the query below filled in automatically.

When you're ready, click the Run Query button to run the query.

SQL

WITH source_view AS (
  -- 'train' is the view created for the train split
  SELECT * FROM train
)
SELECT
  [
    struct_pack(
      "from" := 'user',
      -- append the input to the instruction only when it is present
      "value" := CASE
                   WHEN input IS NOT NULL AND input != ''
                   THEN instruction || '\n\n' || input
                   ELSE instruction
                 END
    ),
    struct_pack("from" := 'assistant', "value" := output)
  ] AS conversation
FROM source_view
WHERE instruction IS NOT NULL
AND output IS NOT NULL;

In the query, we use the struct_pack function to create a new STRUCT for each turn of the conversation.
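For comparison with the Python preprocessing step mentioned earlier, here is a minimal sketch of the same transformation in plain Python. It assumes each row is a dict with the Alpaca keys `instruction`, `input`, and `output`; the function name and the sample rows are hypothetical and only illustrate the logic of the query above.

```python
from typing import Optional


def alpaca_to_conversation(row: dict) -> Optional[list]:
    """Return the two-turn conversation for one Alpaca row, or None if the row is filtered out."""
    instruction = row.get("instruction")
    output = row.get("output")
    # Mirrors the WHERE clause: skip rows missing an instruction or an output.
    if instruction is None or output is None:
        return None
    extra = row.get("input")
    # Mirrors the CASE expression: append the input only when it is non-empty.
    if extra is not None and extra != "":
        user_value = instruction + "\n\n" + extra
    else:
        user_value = instruction
    return [
        {"from": "user", "value": user_value},
        {"from": "assistant", "value": output},
    ]


# Hypothetical sample rows in Alpaca format.
rows = [
    {"instruction": "Translate to French", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a primary color", "input": "", "output": "Red"},
    {"instruction": "Dropped because the output is missing", "input": "", "output": None},
]

conversations = [c for c in map(alpaca_to_conversation, rows) if c is not None]
print(conversations)
```

Each dict plays the role of one struct_pack call, and the surrounding list corresponds to the conversation column produced by the query.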

DuckDB has great documentation on struct data types and functions. Many datasets contain columns of JSON data. DuckDB provides the ability to easily parse and query these columns.

Alpaca to conversation

Once you have the results, you can download them as a Parquet file. You can see what the final output looks like below:

Please give it a try!

As another example, you can try a SQL Console query on SkunkworksAI/reasoning-0.01 to see instructions with more than 10 reasoning steps.

SQL Snippets

DuckDB has many use cases that we are still exploring. We created a SQL Snippets space to show what you can do with the SQL Console.

Here are some really interesting use cases we’ve found:

Remember, it takes one click to download your SQL results as a Parquet file and use them in your dataset.

We would love to hear what you think of the SQL Console. If you have any feedback, please comment on this post!
