A few weeks ago, we wrote about a one-shot full web app using gr.HTML. That means building rich, interactive front ends entirely within Gradio using custom HTML, CSS, and JavaScript. It unlocked a lot. But what if that’s not enough?
What if you want to benefit from Gradio’s queue system, API infrastructure, MCP support, and ZeroGPU on Spaces, but build entirely using your own front-end frameworks like React, Svelte, or even plain HTML/JS?
That’s exactly the problem that gradio.Server solves. And that changes what you can do with Gradio and Hugging Face Spaces.
What we wanted to build
Text Behind Image: an editor where you upload a photo, an ML model separates the foreground subject from the background, and you place stylized text between the two layers, so the text appears behind the person or object in the image.
This requires:
- A drag-and-drop canvas with layered rendering (background → text → foreground)
- A rich control panel with sliders for font size, weight, spacing, color, opacity, stroke, shadow, 3D extrusion, perspective transforms, and more
- A backend ML endpoint that runs the background-removal model and returns a transparent PNG
- Client-side export to PNG
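The core layering trick can be sketched with Pillow. This is illustrative only, not the app's actual code (the names, sizes, and colors are stand-ins): draw the text on top of the background, then paste the segmented foreground over it, using the foreground's alpha channel as the paste mask so the subject hides the text wherever they overlap.

```python
from PIL import Image, ImageDraw

def composite(background: Image.Image, foreground: Image.Image, text: str) -> Image.Image:
    """Stack the three layers: background -> text -> foreground."""
    canvas = background.convert("RGBA").copy()
    # Draw the text layer on top of the background...
    draw = ImageDraw.Draw(canvas)
    draw.text((10, 10), text, fill=(255, 255, 255, 255))
    # ...then paste the cut-out subject over it. Passing the foreground as
    # its own mask makes Pillow use its alpha channel, so the opaque subject
    # covers the text and the transparent areas leave it visible.
    canvas.paste(foreground, (0, 0), foreground)
    return canvas

# Stand-ins for a real photo and the model's transparent-PNG output:
bg = Image.new("RGBA", (200, 100), (0, 0, 255, 255))   # solid blue "photo"
fg = Image.new("RGBA", (200, 100), (0, 0, 0, 0))       # mostly transparent
for x in range(80, 120):                               # opaque "subject" strip
    for y in range(100):
        fg.putpixel((x, y), (255, 0, 0, 255))

out = composite(bg, fg, "HELLO")
```

The real app does this compositing in the browser on a canvas, but the layering logic is the same.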
There is no way to represent this UI with Gradio components. It’s a complete web application. But we still wanted Gradio’s backend features: queuing, concurrency management, ZeroGPU support, and hosting on Hugging Face Spaces without any infrastructure headaches.
Enter gradio.Server
gradio.Server extends FastAPI. It provides the full power of FastAPI (custom routes, middleware, file uploads, all response types) while adding Gradio’s API engine (queuing, SSE streaming, concurrency control, gradio_client compatibility).
The entire backend for Text Behind Image is as follows:
```python
import os

import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation
from fastapi.responses import HTMLResponse

import spaces
from gradio import Server
from gradio.data_classes import FileData

torch.set_float32_matmul_precision("high")

birefnet = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet", trust_remote_code=True
)
birefnet.to("cuda")
birefnet.float()

transform_image = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

app = Server()

@spaces.GPU
def segment(image: Image.Image) -> Image.Image:
    """Run BiRefNet segmentation and apply the result as an alpha mask."""
    image_size = image.size
    input_images = transform_image(image).unsqueeze(0).to("cuda")
    with torch.no_grad():
        preds = birefnet(input_images)[-1].sigmoid().cpu()
    pred = preds[0].squeeze()
    mask = transforms.ToPILImage()(pred).resize(image_size)
    image.putalpha(mask)
    return image

@app.api()
def remove_background(image_path: FileData) -> FileData:
    """Remove the background from an image. Returns a transparent PNG."""
    im = Image.open(image_path.path).convert("RGB")
    result = segment(im)
    out_path = image_path.path.rsplit(".", 1)[0] + ".png"
    result.save(out_path)
    return FileData(path=out_path)

@app.get("/")
async def home_page():
    html_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "index.html")
    with open(html_path, "r", encoding="utf-8") as f:
        return HTMLResponse(f.read())

app.launch(show_error=True)
```
That’s it: roughly 50 lines of Python. The model is loaded at startup, @spaces.GPU handles ZeroGPU allocation, and gradio.Server manages queuing and concurrency. Let’s break down what’s happening.
Why use @app.api() instead of a plain FastAPI route?
If this were a regular FastAPI app, you would define an @app.post() route for the background removal. That works until two users hit it at the same time: without concurrency management, both requests compete for the GPU, and the app crashes or returns garbage.
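The hazard is overlap: two requests holding the GPU at once. Here is a plain-Python sketch (illustrative only, not Gradio internals; all names are made up) of what serialized access looks like, with a semaphore playing the role of the queue and a counter recording the worst-case overlap.

```python
import threading

gpu = threading.Semaphore(1)     # only one request may hold the "GPU" at a time
active = 0                       # requests currently holding the GPU
max_active = 0                   # worst-case overlap observed

def remove_background_queued(image_id: str) -> str:
    global active, max_active
    with gpu:                    # in the real app, Gradio's queue plays this role
        active += 1
        max_active = max(max_active, active)
        result = f"{image_id}.png"   # stand-in for the actual model call
        active -= 1
    return result

# Fire eight "users" at once; the semaphore serializes them.
threads = [threading.Thread(target=remove_background_queued, args=(f"img{i}",))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_active)  # 1: requests were serialized, never concurrent
```

Without the semaphore, max_active would climb above 1 under load, which is exactly the "two requests on one GPU" failure mode.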
@app.api() solves this. It wraps the function in Gradio’s queuing engine: requests are serialized, concurrency is controlled, and on ZeroGPU Spaces GPU allocation is handled automatically through @spaces.GPU. As a bonus, any @app.api() endpoint can also be called via gradio_client, so other apps and scripts can use the Space programmatically:
```python
from gradio_client import Client, handle_file

client = Client("ysharma/text-behind-image")
result = client.predict(
    image_path=handle_file("photo.jpg"),
    api_name="/remove_background",
)
```
Meanwhile, @app.get("/") is a standard FastAPI route that serves the HTML page. Server is a FastAPI app, so both coexist naturally.
Frontend: pure HTML/CSS/JS
In this example, index.html is a self-contained web application of roughly 1,300 lines. No React, no build step, no bundler. Just vanilla HTML, CSS, and JavaScript:
- A three-layer canvas: background image → text layer → foreground (transparent PNG), stacked with CSS z-index
- Drag-and-drop text positioning using pointer events
- A control panel with 20+ parameters: font family (25+ fonts), size, weight, spacing, color, opacity, background fill, stroke, shadow, 3D extrusion depth and angle, rotation, skew, and full CSS perspective transforms
- Client-side PNG export that composites the layers in the browser
The front end uses the Gradio JS client to talk to the back end:
```js
import { Client, handle_file } from "https://cdn.jsdelivr.net/npm/@gradio/client/dist/index.min.js";

const client = await Client.connect(window.location.origin);

const result = await client.predict("/remove_background", {
  image_path: handle_file(file),
});
foregroundLayer.src = result.data[0].url;
```
This is the important part. By going through the Gradio JS client rather than a raw fetch() call, the front end uses Gradio’s queue: concurrency is managed, GPU requests don’t collide, and queue position and progress can be shown to the user. Everything else (text rendering, layer compositing, exporting) happens in the browser.
What does this unlock?
Here’s what wasn’t possible before gradio.Server:
| Before | After |
| --- | --- |
| A custom UI meant leaving Gradio entirely | Custom UI powered by Gradio’s backend engine |
| No way to serve static HTML from a Gradio app | @app.get("/") serves anything |
| gradio_client only worked with Gradio component apps | @app.api() endpoints are client-compatible |
| Choose between Gradio’s infrastructure and design freedom | You get both |
With gradio.Server, Gradio also acts as a backend framework: use its UI system when you need it, or bring your own frontend when you don’t.
If you want Gradio’s UI, use gr.Blocks, gr.Interface, or gr.ChatInterface. If you want your own UI, use gradio.Server with your favorite frontend. Either way, you get Spaces hosting, API queuing, gradio_client access, the full HF ecosystem, and more.
Try it out
The app is live as a Space: ysharma/text-behind-image
Upload a photo with a clear subject and add text behind it. Experiment with 3D extrusion, perspective skew, and stroke effects. These go together well.
What’s next
This post explained the core ideas. gradio.Server allows you to pair any frontend with Gradio’s backend. There’s more to explore, including MCP tool registration with @app.mcp.tool(), SSE streaming for real-time updates, batch processing, and patterns for building multi-page apps with shared state.
We’ll cover these in detail in the next post. Stay tuned.