Close Menu
Versa AI hub
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

What's Hot

Why Five Eyes spy agencies warn they will be hit by AI cyber threats this year

June 23, 2026

OCR parameters for 50 languages ​​from 1.5 million to 34.5 million

June 23, 2026

e2e-assure introduces Cumulo, the UK’s only sovereign AI-driven zero-day SOC platform for securing IT and OT environments

June 22, 2026
Facebook X (Twitter) Instagram
Versa AI hubVersa AI hub
Wednesday, June 24
Facebook X (Twitter) Instagram
Login
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources
Versa AI hub
Home»Tools»Brings serverless GPU reasoning to hug face users
Tools

Brings serverless GPU reasoning to hug face users

versatileaiBy versatileaiJune 22, 2025No Comments3 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
#image_title
Share
Facebook Twitter LinkedIn Pinterest Email

Update (November 2024): The integration is no longer available. Switch to the inference API, inference endpoint, or other deployment options for hugging faces, depending on your AI model’s needs.

Today, we are excited to announce the launch of Deploy for CloudFlare Workers AI, a new integration for Hugging Face Hub. Deployed to CloudFlare Workers AI easily uses the open model as a serverless API with cutting-edge GPUs deployed in CloudFlare Edge data centers. Starting today, we’ve integrated some of the most popular open models that will hug your face to CloudFlare Worker AI with production solutions, including text generation inference.

Deploying CloudFlare Worker AI allows developers to build robust, generated AI applications without managing GPU infrastructure and servers, with very low operating costs.

Generated AI for developers

This new experience expands the strategic partnership that was announced last year to simplify access and deployment of open-generated AI models. One of the main issues faced by developers and organizations is the scarcity of GPU availability and the fixed cost of deploying servers to start buildings. CloudFlare Worker Deployment AI offers easy, low-cost solutions to these challenges, providing serverless access to popular embrace face models.

Let’s take a look at a concrete example. Imagine developing a RAG application that gets ~1000 requests per day and developing a 1K token input and a 100 token output using Meta Llama 2 7b. The production cost of LLM inference is approximately $1 per day.

“We look forward to achieving this integration very quickly. By putting the power of CloudFlare’s global network of serverless GPUs into the hands of developers, we are opening the door to many exciting innovations by communities around the world.”

How it works

It’s very easy to use embracing face models with CloudFlare Worker AI. Below are step-by-step instructions on how to use the Hermes 2 Pro with the latest model from Nous Research, the Mistral 7b.

You can find all the models available in this CloudFlare collection.

Note: You will need to access your CloudFlare account and API tokens.

You can find CloudFlare deployment options on all available model pages, including models such as Llama, Gemma, Mistral, and more.

Model Card

Open the Deployment menu and select CloudFlare Workers AI. This opens an interface on how to use this model and how to send requests.

Note: If the model you are using does not have the “CloudFlare Workers AI” option, it is currently not supported. We are working with CloudFlare to increase the availability of our models. Please contact us using your request at api-enterprise@huggingface.co.

Inference snippet

Currently, integration is available through two options. It can be used directly by workers using the Worker AI REST API or using the CloudFlare AI SDK. Select the option you want and copy the code to your environment. When using the REST API, you must ensure that the Account_ID and API_TOKEN variables are defined.

that’s it! You can now begin sending requests to hug face models hosted by CloudFlare Worker AI. Make sure to use the correct prompts and templates that your model expects.

I’ve just started

We are excited to work with CloudFlare to make AI more accessible for developers. Work with the CloudFlare team to make more models and experiences available!

author avatar
versatileai
See Full Bio
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleSix Secrets of a Superworker Company – Josh Bershin
Next Article Microbets on trend AI for cybersecurity companies offset US tariff mishaps
versatileai

Related Posts

Tools

Why Five Eyes spy agencies warn they will be hit by AI cyber threats this year

June 23, 2026
Tools

OCR parameters for 50 languages ​​from 1.5 million to 34.5 million

June 23, 2026
Tools

e2e-assure introduces Cumulo, the UK’s only sovereign AI-driven zero-day SOC platform for securing IT and OT environments

June 22, 2026
Add A Comment

Comments are closed.

Top Posts

Five ways AI can shape the future of online entertainment

September 17, 20254 Views

Currently, there are the most AI-dependent countries in the world Technology | Work

September 13, 20254 Views

What it means for Tech Stocks and AI Playbook 2025

September 4, 20254 Views
Stay In Touch
  • YouTube
  • TikTok
  • Twitter
  • Instagram
  • Threads
Latest Reviews

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

Most Popular

Five ways AI can shape the future of online entertainment

September 17, 20254 Views

Currently, there are the most AI-dependent countries in the world Technology | Work

September 13, 20254 Views

What it means for Tech Stocks and AI Playbook 2025

September 4, 20254 Views
Don't Miss

Why Five Eyes spy agencies warn they will be hit by AI cyber threats this year

June 23, 2026

OCR parameters for 50 languages ​​from 1.5 million to 34.5 million

June 23, 2026

e2e-assure introduces Cumulo, the UK’s only sovereign AI-driven zero-day SOC platform for securing IT and OT environments

June 22, 2026
Service Area
X (Twitter) Instagram YouTube TikTok Threads RSS
  • About Us
  • Contact Us
  • Privacy Policy
  • Terms and Conditions
  • Disclaimer
© 2026 Versa AI Hub. All Rights Reserved.

Type above and press Enter to search. Press Esc to cancel.

Sign In or Register

Welcome Back!

Login to your account below.

Lost password?