When we first launched Workers AI, we made a bet that AI models would get faster and smaller. We built our infrastructure around this hypothesis, deploying specialized GPUs in data centers around the world so we could serve inference to users as quickly as possible. We built the platform to be as general-purpose as possible, but we also identified niche use cases that are a great fit for our infrastructure, such as low-latency image generation and real-time audio/voice agents. To lean into these use cases, we are introducing several new models that make it easier to build these kinds of applications.
Today, we're excited to announce that we're expanding our model catalog to include closed-source partner models that fit these use cases. We're partnering with Leonardo.ai and Deepgram to bring their latest and greatest models to Workers AI, hosted on Cloudflare's infrastructure. Both Leonardo and Deepgram offer models with excellent speed-to-performance ratios that pair well with Workers AI's infrastructure. We're starting with these great partners, and we look forward to expanding our catalog with additional partner models in the future.
The advantage of using these models on Workers AI is that we're not just a standalone inference service; we offer a whole suite of developer products that let you build your entire application around AI. If you're building an image generation platform, you can use Workers to host your application logic, Workers AI to generate the images, R2 to store them, and Images to deliver and transform the media. If you're building a real-time voice agent, we offer turn-detection, speech-to-text, and text-to-speech models through Workers AI, along with WebRTC and WebSocket support as an orchestration layer through Cloudflare Realtime. Overall, we want to lean into the use cases where we believe Cloudflare has unique strengths, and put our full set of developer tools behind you so you can build the best AI applications on a holistic developer platform.
Leonardo.ai is a generative AI media lab that trains its own models and hosts a platform where customers can create generated media. The Workers AI team has been working with Leonardo for a while and has experienced firsthand the magic of their image generation models. We're excited to introduce two Leonardo image generation models: @cf/leonardo/phoenix-1.0 and @cf/leonardo/lucid-origin.
“We look forward to enabling Cloudflare customers to access and use our image generation technology in creative ways through Workers AI and the Cloudflare developer platform, including creating character images for games and generating personalized images for their websites.” - Peter Runham, CTO, Leonardo.ai
The Phoenix model was trained from scratch by Leonardo and excels at text rendering and prompt coherence. A complete end-to-end image generation request with 25 steps at 1024×1024 took 4.89 seconds.
curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/leonardo/phoenix-1.0 \
  --header 'Authorization: Bearer {token}' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "A 1950s style neon diner sign that reads \"Open 24 Hours\", with chrome detailing and vintage typography",
    "width": 1024,
    "height": 1024,
    "steps": 25,
    "seed": 1,
    "guidance": 4,
    "negative_prompt": "blurry, noisy, grainy, oversaturated, overexposed"
  }'
The Lucid Origin model is the newest addition to Leonardo's model family and is great at generating photorealistic images. An end-to-end image generation request with 25 steps at 1024×1024 took 4.38 seconds.
curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/leonardo/lucid-origin \
  --header 'Authorization: Bearer {token}' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "A 1950s style neon diner sign that reads \"Open 24 Hours\", with chrome detailing and vintage typography",
    "width": 1024,
    "height": 1024,
    "steps": 25,
    "seed": 1,
    "guidance": 4,
    "negative_prompt": "blurry, noisy, grainy, oversaturated, overexposed"
  }'
Deepgram is a voice AI company that develops its own audio models, enabling users to interact with AI through voice, the natural human interface. Voice is an exciting interface because it carries more bandwidth than text, with additional signals such as pacing, intonation, and more. The Deepgram models available on our platform perform extremely fast speech-to-text and text-to-speech inference. Running on Workers AI's infrastructure, these models allow our customers to build low-latency voice agents and more.
“Hosting voice models on Cloudflare's Workers AI lets developers build real-time, expressive voice agents with ultra-low latency. Cloudflare's global network brings AI compute closer to users everywhere.” - Adam Sypniewski, CTO, Deepgram
@cf/deepgram/nova-3 is a speech-to-text model that transcribes audio quickly and with high accuracy. @cf/deepgram/aura-1 is a text-to-speech model that is context-aware, applying natural pacing and expressiveness based on the input text. The newer Aura-2 model will be coming to Workers AI soon. We've also improved the experience of sending binary MP3 files to Workers AI, so you no longer need to convert them to Uint8Arrays as before. Along with our Realtime announcements (coming soon!), these audio models are key to enabling customers to build voice agents directly on Cloudflare.
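As a quick illustration of the text-to-speech side, here is a minimal sketch of calling Aura-1 through the AI binding. The binding name (`AI`) and the `text` input field are assumptions based on the other examples in this post; consult the model documentation for the exact input schema.

```javascript
// Hypothetical sketch: synthesize speech with Aura-1 via the AI binding.
// aura-1 returns audio bytes, which a Worker can return to the client as-is.
export async function speak(env, text) {
  const audio = await env.AI.run("@cf/deepgram/aura-1", { text });
  return new Response(audio, {
    headers: { "content-type": "audio/mpeg" },
  });
}
```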
With the AI binding, a call to the Nova 3 speech-to-text model looks like this:
const url = "https://www.some-website.com/audio.mp3";
const mp3 = await fetch(url);
const res = await env.AI.run("@cf/deepgram/nova-3", {
  audio: {
    body: mp3.body,
    contentType: "audio/mpeg",
  },
  detect_language: true,
});
And with the REST API:
curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/deepgram/nova-3?detect_language=true' \
  --header 'Authorization: Bearer {token}' \
  --data-binary @/path/to/audio.mp3
We've also added WebSocket support to our Deepgram models. WebSockets can be used to keep the connection to the inference server alive and to stream bidirectional input and output. To use the Nova or Aura models over WebSockets, take a look at our developer documentation.
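The streaming endpoint presumably mirrors the REST route, upgraded to a wss:// URL. The following helper is purely illustrative: the path and query parameters are assumptions extrapolated from the REST example above, so check the developer documentation for the actual WebSocket contract before relying on it.

```javascript
// Hypothetical helper: build a WebSocket URL for streaming Nova-3 inference,
// mirroring the REST route shown earlier in this post (an assumption).
export function nova3SocketUrl(accountId, params = {}) {
  const query = new URLSearchParams(params).toString();
  const base =
    `wss://api.cloudflare.com/client/v4/accounts/${accountId}` +
    "/ai/run/@cf/deepgram/nova-3";
  return query ? `${base}?${query}` : base;
}
```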
All of these pieces work together, allowing you to:
Capture real-time audio from WebRTC sources
Pipe it into processing pipelines via WebSockets
Transcribe the audio with Deepgram models running on Workers AI
Process the transcript with your LLM of choice, hosted on Workers AI or proxied to external models via AI Gateway
Orchestrate everything with real-time agents
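The transcribe/process/respond steps above can be sketched as a single function. The model IDs come from this post, but the response field names (`text`, `response`) and the binding shape are assumptions for illustration; adapt them to the actual response schemas in the docs.

```javascript
// Hypothetical sketch of one voice-agent turn: speech-to-text, then an LLM,
// then text-to-speech. Field names on the responses are assumed.
export async function voiceTurn(env, audioBody, llmModel) {
  // 1. Transcribe the incoming audio with Nova-3.
  const stt = await env.AI.run("@cf/deepgram/nova-3", {
    audio: { body: audioBody, contentType: "audio/mpeg" },
  });

  // 2. Reason over the transcript with an LLM hosted on Workers AI
  //    (or proxied through AI Gateway).
  const chat = await env.AI.run(llmModel, {
    messages: [{ role: "user", content: stt.text }],
  });

  // 3. Synthesize the reply with Aura-1; returns audio bytes to play back.
  return env.AI.run("@cf/deepgram/aura-1", { text: chat.response });
}
```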
Try these models today
Check out our developer documentation to learn more about pricing and how to get started with the new partner models available on Workers AI.