TL;DR: Hugging Face’s official MCP Server offers powerful customization for AI assistants: one simple URL gives access to the Hub and to thousands of AI applications. We recommend MCP’s “Streamable HTTP” transport for deployments, and we explore the trade-offs server developers face in detail.
Over the last month I learned a lot about building a useful MCP server. This post walks through that journey.
Introduction
The Model Context Protocol (MCP) is delivering on its promise to be the standard that connects AI assistants to the outside world.
At Hugging Face, providing access to the Hub via MCP was an obvious choice. This article shares my experience developing the hf.co/mcp server.
Design Choices
The community uses the Hub for research, development, content creation, and more. We wanted people to be able to customize the server to suit their needs, and to make it easy to access the thousands of AI applications available on Spaces. This meant making the MCP server dynamic, adjusting each user’s tools on the fly.
We also wanted to simplify access by avoiding complex downloads and configuration, so remote access via a simple URL was essential.
Remote Server
When building a remote MCP server, the first decision is how clients connect. MCP offers several transport options with different trade-offs. TL;DR: our open source code supports all the variants, but we chose the most modern one for production. This section explains the options in detail.
Since its launch in November 2024, MCP has evolved rapidly, with three protocol revisions in nine months. These replaced the SSE transport with Streamable HTTP, and introduced and then reworked authorization.
These rapid changes mean that client applications support different MCP features and revisions unevenly, which adds another challenge to the design choices.
Here is a brief summary of the transport options offered by the Model Context Protocol and its associated SDKs:
- STDIO – normally used when the MCP server runs on the same machine as the client. It can access local resources such as files if necessary.
- HTTP with SSE – used for remote connections over HTTP. It was deprecated in the 2025-03-26 revision of MCP, but is still in use.
- Streamable HTTP – a more flexible remote HTTP transport offering more deployment options than the outgoing SSE version.
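The stdio transport, for instance, is just newline-delimited JSON-RPC over the process's standard streams. A minimal sketch of that framing (illustrative, not the SDKs' actual implementation):

```python
import io
import json

def write_stdio_message(stream, payload: dict) -> None:
    # stdio transport: each JSON-RPC message is one line of JSON written
    # to the subprocess's stdin (or read back from its stdout).
    stream.write(json.dumps(payload) + "\n")

def read_stdio_messages(stream):
    # Newline framing makes reading a simple line-by-line loop.
    for line in stream:
        if line.strip():
            yield json.loads(line)

# Exercise the round trip with an in-memory buffer standing in for a pipe.
buf = io.StringIO()
write_stdio_message(buf, {"jsonrpc": "2.0", "id": 1, "method": "ping"})
buf.seek(0)
messages = list(read_stdio_messages(buf))
```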
Both STDIO and HTTP with SSE are fully bidirectional by default: client and server maintain an open connection and can send messages to each other at any time.
SSE stands for “Server-Sent Events”, a way for an HTTP server to keep a connection open and send events to the client on demand.
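The wire format is plain text. A small sketch of formatting and parsing a JSON-RPC message as an SSE event (a toy illustration of the framing, not SDK code):

```python
import json

def to_sse_event(payload: dict, event: str = "message") -> str:
    # An SSE frame is an optional "event:" line, one or more "data:"
    # lines, and a terminating blank line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

def parse_sse_event(frame: str) -> dict:
    # Collect the data lines and decode the JSON payload.
    data_lines = [line[5:].lstrip() for line in frame.splitlines()
                  if line.startswith("data:")]
    return json.loads("".join(data_lines))

notification = {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}
frame = to_sse_event(notification)
```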
Understanding Streamable HTTP
MCP server developers face a number of choices when setting up the Streamable HTTP transport.
There are three main communication patterns:
- Direct Response – simple request/response (like a standard REST API). Perfect for simple, stateless operations such as a basic search.
- Request Scoped Streams – a temporary SSE stream tied to a single request. Useful for sending progress updates when a tool call takes a long time, such as video generation. The server may also need to elicit information from the calling user, or make a sampling request, while the task runs.
- Server Push Streams – long-lived SSE connections that support server-initiated messages. These enable list change notifications for resources, tools, and prompts, as well as ad hoc sampling and elicitation. Such connections require additional management, such as keep-alives and resumption mechanics on reconnect.
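To make the first two patterns concrete, here is a toy sketch (stdlib only, not the real SDK) of a server answering the same tool call either as a direct JSON response or as a request-scoped SSE stream that carries a progress notification before the final result:

```python
import json

def handle_post(rpc_request: dict, streaming: bool):
    # The final tool result, as a JSON-RPC response.
    result = {"jsonrpc": "2.0", "id": rpc_request["id"],
              "result": {"content": [{"type": "text", "text": "done"}]}}
    if not streaming:
        # Direct response: a single JSON body, exactly like a REST API.
        return "application/json", json.dumps(result)
    # Request-scoped stream: SSE events (progress, then the result)
    # on a connection that closes when the request completes.
    progress = {"jsonrpc": "2.0", "method": "notifications/progress",
                "params": {"progressToken": "t1", "progress": 50, "total": 100}}
    body = "".join(f"event: message\ndata: {json.dumps(m)}\n\n"
                   for m in (progress, result))
    return "text/event-stream", body
```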
If you use request scoped streams with the official SDKs, send messages to the correct stream by using the sendNotification() and sendRequest() methods provided in the RequestHandlerExtra parameter (TypeScript), or by setting the related_request_id (Python).
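The routing idea can be sketched as a toy registry (the class and method names here are hypothetical; only the related_request_id concept comes from the SDKs):

```python
class StreamRegistry:
    """Toy model of routing messages to the stream of the request that
    triggered them - the role related_request_id plays in the SDKs."""

    def __init__(self):
        self._streams = {}  # request id -> queued messages for that stream

    def open_stream(self, request_id):
        # A request-scoped SSE stream opens when the tool call arrives.
        self._streams[request_id] = []
        return self._streams[request_id]

    def send(self, message, related_request_id):
        # Deliver on the stream belonging to the originating request.
        self._streams[related_request_id].append(message)

registry = StreamRegistry()
stream = registry.open_stream(7)
registry.send({"jsonrpc": "2.0", "method": "notifications/progress",
               "params": {"progressToken": "t", "progress": 0.5}},
              related_request_id=7)
```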
An additional factor to consider is whether the MCP server needs to maintain state for each connection. This is decided by the server when the client sends an initialize request.
| | Stateless | Stateful |
|---|---|---|
| Session ID | Server does not respond with an MCP session ID | Server responds with an MCP session ID |
| Meaning | Each request is independent | Client context is maintained |
| Scaling | Simple horizontal scaling: any instance can handle any request | Needs session affinity or a shared state mechanism |
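The difference shows up in a single header. A minimal sketch (the Mcp-Session-Id header name comes from the MCP specification; the handling logic here is illustrative):

```python
import uuid

def initialize_headers(stateful: bool) -> dict:
    # A stateful server mints an Mcp-Session-Id during initialization;
    # the client must echo it on every subsequent request.
    return {"Mcp-Session-Id": uuid.uuid4().hex} if stateful else {}

def accepts_request(headers: dict, session_id) -> bool:
    if session_id is None:
        return True  # stateless: every request stands alone
    # Stateful: only requests carrying the minted session ID are valid,
    # which is why scaling needs affinity or shared session state.
    return headers.get("Mcp-Session-Id") == session_id

session = initialize_headers(stateful=True).get("Mcp-Session-Id")
```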
The table below summarizes MCP features and the communication patterns that support them.
| MCP Feature | Server Push | Request Scoped | Direct Response |
|---|---|---|---|
| Tools, Prompts, Resources | Y | Y | Y |
| Sampling / Elicitation | Server-initiated at any time | In relation to a client-initiated request | N |
| Resource subscriptions | Y | N | N |
| Tool/Prompt list change notifications | Y | N | N |
| Tool progress notifications | – | Y | N |
When using request scoped streams, sampling and elicitation requests require a stateful connection so that the MCP session ID can be used to correlate responses.
The Hugging Face MCP Server is open source and supports STDIO, SSE, and Streamable HTTP deployments in both direct response and server push modes. When using server push streams, keep-alive and last-activity timeouts are configurable. There is also a built-in observability dashboard for understanding how different clients manage connections and handle tool list change notifications.
The image below shows the MCP server connection dashboard running in “server push” Streamable HTTP mode.

Production Deployment
For production, we decided to launch the MCP server using Streamable HTTP in a stateless, direct response configuration, for the following reasons:
- It provides a standard set of Hub tools, along with a stateless image generator for anonymous users.
- For authenticated users, the tool set consists of their selected tools and chosen Gradio applications. We also make sure each user’s ZeroGPU quota is applied correctly to their account. This is managed using the supplied HF_TOKEN or OAuth credentials, which are looked up on each request.
- None of the current tools need to maintain state between requests.
You can use OAuth login by adding a login option to the MCP server URL. Once Claude.ai remote integrations support the latest OAuth specification, this could become the default.
Direct responses provide the lowest deployment resource overhead, and no current tools require sampling or elicitation during execution.
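In a stateless setup, per-request credential handling can be as simple as reading the Authorization header on each request. A simplified sketch of the idea (the toolset labels are hypothetical; the real server's logic is more involved):

```python
def resolve_token(headers: dict):
    # Stateless: credentials are read from each request rather than from
    # a per-connection session. An HF_TOKEN or OAuth access token
    # arrives as a standard bearer token.
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth.removeprefix("Bearer ")
    return None

def select_toolset(headers: dict) -> str:
    # Anonymous users get the standard tools; authenticated users get
    # their own configured selection (labels invented for this sketch).
    return "user-configured" if resolve_token(headers) else "standard"
```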
The “HTTP with SSE” transport, already deprecated by the time we launched, was still the remote default for many MCP clients. Given its pending removal, we didn’t want to invest heavily in managing it. Fortunately, popular clients (VSCode and Cursor) had already begun switching, and Claude.ai added Streamable HTTP support within a week of our launch. If you need to connect via SSE, feel free to deploy a copy of the server in a free CPU Hugging Face Space.
Tool List Change Notifications
In the future, we want to support real-time tool list changes when users update their settings on the Hub. However, this raises some practical questions.
First, users tend to configure their favourite MCP servers in their clients and leave them enabled. This means the client stays connected whenever the application is open. Sending notifications would mean maintaining an open connection to every currently connected client, regardless of active use, on the off chance the user updates their tool configuration.
Second, most MCP servers and clients disconnect after a period of inactivity and reconnect when necessary. This inevitably means immediate push notifications get missed while the notification channel is closed. In practice, it’s much simpler for clients to refresh their connection and tool list as needed.
Unless you have a reasonably controlled client/server pair, server push streams add a lot of complexity to public deployments when a low-resource way of refreshing the tool list already exists.
URL User Experience
Just before launch, @Julien-C submitted a PR to serve friendly instructions to users visiting hf.co/mcp in a browser. This greatly improves the user experience; the default response is otherwise an unfriendly blob of JSON.
Initially, we found that this generated a huge amount of traffic. After a bit of investigation, I discovered that VSCode polls the endpoint several times per second when it receives a web page instead of an HTTP 405 error.
The fix, suggested by @coyotte508, was to properly detect browsers and return the page only in that situation. Thanks also to the VSCode team for shipping a quick fix.
Although not explicitly stated, returning a page in this way appears to be acceptable within the MCP specification.
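The fix amounts to content negotiation on the Accept header. Roughly (a simplified sketch of the idea, not the server's actual code):

```python
def wants_html(headers: dict) -> bool:
    # Browsers advertise text/html in their Accept header; MCP clients
    # ask for application/json and/or text/event-stream instead.
    return "text/html" in headers.get("Accept", "")

def handle_get(headers: dict):
    if wants_html(headers):
        return 200, "<html><!-- friendly instructions page --></html>"
    # Non-browser GETs receive 405, which stops clients from polling.
    return 405, ""
```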
MCP Client Behavior
The MCP protocol sends several requests during initialization. A typical connection sequence is initialize, notifications/initialized, tools/list, and prompts/list.
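As JSON-RPC messages, that opening sequence looks roughly like this (the protocol version and client info values are illustrative):

```python
handshake = [
    {"jsonrpc": "2.0", "id": 1, "method": "initialize",
     "params": {"protocolVersion": "2025-06-18",
                "capabilities": {},
                "clientInfo": {"name": "example-client", "version": "0.1.0"}}},
    {"jsonrpc": "2.0", "method": "notifications/initialized"},
    {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    {"jsonrpc": "2.0", "id": 3, "method": "prompts/list"},
]

# Only messages carrying an id are requests that expect a response;
# the "initialized" notification is fire-and-forget.
requests = [m for m in handshake if "id" in m]
```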
Given that MCP clients connect and reconnect while they are open, and users make only occasional tool calls, we see a ratio of roughly 100 MCP control messages for each tool call.
Some clients send requests that are meaningless for a stateless, direct response configuration, such as pings, cancellations, or attempts to list resources (not a feature we currently advertise).
In the first week of July 2025, an astonishing 164 distinct clients visited the server. Interestingly, one of the most popular tools is mcp-remote: approximately half of all clients use it as a bridge to connect to remote servers.
Conclusion
MCP is evolving rapidly, and we are excited by what has already been achieved over the last few months across chat applications, IDEs, agents, and MCP servers.
You can already see how powerful the Hugging Face Hub integration is, and support for Gradio Spaces makes it easy to extend LLMs with the latest machine learning applications.
Here are some great examples of what people have built with our MCP server so far.
I hope this post provides insight into the decisions involved in building a remote MCP server, and encourages you to try it out with some of your favourite MCP clients.
Check out the open source MCP server, try the different transport options with your client, open issues, make improvements, or suggest new features.
Please share your thoughts, feedback, or questions in the discussion thread.