FriendliAI’s inference infrastructure is integrated into Hugging Face Hub as an option on the “Deploy this model” button to simplify and speed up the delivery of generative AI models.
Collaborate to drive AI innovation
Hugging Face enables developers, researchers, and businesses to innovate with AI. Our shared priority is to build impactful partnerships that simplify workflows and provide cutting-edge tools to the AI community.
Today, we are excited to announce a partnership between Hugging Face and FriendliAI, a leader in accelerating generative AI inference, to enhance the way developers deploy and manage AI models. This integration introduces the FriendliAI endpoint as a deployment option within Hugging Face Hub, giving developers direct access to high-performance, cost-effective inference infrastructure.
FriendliAI features breakthrough technologies such as continuous batching, native quantization, and best-in-class autoscaling, and is rated by Artificial Analysis as the fastest GPU-based generative AI inference provider. With this technology, FriendliAI continues to raise the bar for AI inference performance, delivering faster speeds, lower latency, and significant cost savings for deploying generative AI models at scale. Through this partnership, Hugging Face users and FriendliAI customers will be able to easily deploy open source or custom generative AI models with unparalleled efficiency and reliability.
Simplify model deployment
Last year, FriendliAI introduced the Hugging Face integration, allowing users to seamlessly deploy Hugging Face models directly within the Friendli Suite platform. With this integration, users now have access to thousands of open source models supported by Hugging Face and can easily deploy private models. A list of model architectures currently supported by FriendliAI can be found here.
Now, we’re taking this integration even further by enabling the same functionality directly within Hugging Face Hub and providing one-click deployment for a seamless user experience. You can use your Friendli Suite account to deploy models directly from the model card in Hugging Face Hub.
Selecting Friendli Endpoints takes you to the FriendliAI model deployment page. Here, you can interact with an optimized open source model and deploy it to NVIDIA H100 GPUs. The deployment page provides an intuitive interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Additionally, you can chat directly with your open source model on the page during the deployment process, making it easy to explore and test its functionality.
Deploy models on NVIDIA H100 GPUs with Friendli Dedicated Endpoints
Powered by FriendliAI’s GPU-optimized inference engine, Friendli Dedicated Endpoints deliver fast and cost-effective inference as a managed service. Developers can deploy open source or custom models to NVIDIA H100 GPUs by clicking “Deploy Now” on the model deployment page.
Although the H100 GPU is powerful, it can be expensive to operate at scale. FriendliAI’s optimized services can significantly reduce costs by reducing the number of GPUs required while maintaining peak performance. Dedicated endpoints are not only cost-effective, but also simplify the complexity of infrastructure management.
Run inference on open source models with Friendli Serverless Endpoints
Friendli Serverless Endpoints is an ideal solution for developers who want to run inference on open source models efficiently. The service provides a user-friendly API for models optimized by FriendliAI, ensuring high performance at low cost. You can also chat directly with these powerful open source models on the model deployment page.
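As a rough sketch of what calling a serverless endpoint could look like, the snippet below builds an OpenAI-style chat completion request. The base URL, model name, and environment variable here are illustrative assumptions, not confirmed values; check the FriendliAI documentation for the actual API details.

```python
import json
import os
import urllib.request

# Assumed endpoint URL for illustration only -- consult the FriendliAI
# docs for the real serverless API base URL and supported model names.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"


def build_chat_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion HTTP request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # FRIENDLI_TOKEN is a hypothetical variable name for your API credential.
    token = os.environ.get("FRIENDLI_TOKEN")
    if token:  # only send the request when a credential is configured
        req = build_chat_request("meta-llama-3.1-8b-instruct", "Hello!", token)
        with urllib.request.urlopen(req) as resp:
            body = json.loads(resp.read())
            print(body["choices"][0]["message"]["content"])
```

Because the request is built separately from being sent, you can inspect or log the payload before making a billable API call.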
What’s next
We are excited to deepen our collaboration with FriendliAI and increase the accessibility of open source AI to developers around the world. FriendliAI’s fast and cost-effective inference solution eliminates the complexity of infrastructure management, allowing users to focus on innovation. Together with FriendliAI, we remain committed to transforming the way AI is developed and driving breakthrough innovations that will shape the next era of AI.
You can also follow our organization page to stay updated on upcoming news 🔥