
We're excited to share some great news! AI builders can now accelerate their applications with Google Cloud TPUs on Hugging Face Inference Endpoints and Spaces!
For those who are not familiar, TPUs are custom-made AI hardware designed by Google. They are known for their ability to scale cost-effectively and deliver impressive performance across a variety of AI workloads. This hardware has played a key role in some of Google's latest innovations, including the development of the Gemma 2 open models. We are excited to announce that TPUs are now available for use in Inference Endpoints and Spaces.
This is a big step in our ongoing collaboration to provide you with the best tools and resources for your AI projects. We can't wait to see the amazing things you'll build with this new capability!
Hugging Face Inference Endpoints support for TPUs
Hugging Face Inference Endpoints provides a seamless way to deploy generative AI models in a few clicks on dedicated, managed infrastructure using the cloud provider of your choice. Starting today, Google TPU v5e is available on Inference Endpoints. Choose the model you want to deploy, select Google Cloud Platform, select us-west1, and you are ready to pick a TPU configuration.
We have three instance configurations, with more to come:

v5litepod-1: TPU v5e with 1 core and 16 GB memory ($1.375/hour)
v5litepod-4: TPU v5e with 4 cores and 64 GB memory ($5.50/hour)
v5litepod-8: TPU v5e with 8 cores and 128 GB memory ($11.00/hour)
While you can use v5litepod-1 for models with up to 2 billion parameters without much hassle, we recommend v5litepod-4 for larger models to avoid memory budget issues. The larger the configuration, the lower the latency.
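If you'd rather script the deployment than click through the UI, the huggingface_hub client can create endpoints programmatically. The sketch below is illustrative only: the accelerator, instance_type, and instance_size identifiers for the TPU flavors are assumptions based on the configuration names above, so confirm the exact values shown in the Inference Endpoints UI.

```python
# pip install huggingface_hub
from huggingface_hub import create_inference_endpoint

# Deploy a text-generation model on a TPU v5e endpoint in us-west1.
# NOTE: the accelerator/instance identifiers below are assumptions based on
# the configuration names in this post -- check the Inference Endpoints UI
# for the exact values.
endpoint = create_inference_endpoint(
    "gemma-2b-tpu",
    repository="google/gemma-2b",
    framework="pytorch",
    task="text-generation",
    vendor="gcp",                 # Google Cloud Platform
    region="us-west1",            # the region mentioned above
    accelerator="tpu",            # assumed identifier for TPU hardware
    instance_type="v5litepod-1",  # assumed; matches the config name above
    instance_size="x1",           # assumed
    type="protected",
)
endpoint.wait()  # block until the endpoint is up and running
print(endpoint.url)
```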
Together with the product and engineering teams at Google, we're excited to bring the performance and cost efficiency of TPUs to the Hugging Face community. This collaboration has resulted in some great developments:
We've created Optimum TPU, an open-source library that makes it easy to train and deploy Hugging Face models on Google TPUs.
Inference Endpoints uses Optimum TPU together with Text Generation Inference (TGI) to serve large language models (LLMs) on TPUs.
We're always working on support for a variety of model architectures. Starting today, you can deploy Gemma, Llama, and Mistral in a few clicks (see the Optimum TPU supported models).
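Because serving runs on TGI, a deployed endpoint speaks the standard text-generation API, so you can query it with the huggingface_hub client like any other TGI deployment. A minimal sketch (the endpoint URL is a placeholder):

```python
from huggingface_hub import InferenceClient

# Point the client at your running endpoint (placeholder URL).
client = InferenceClient("https://<your-endpoint>.endpoints.huggingface.cloud")

# TGI exposes the standard text-generation route.
response = client.text_generation(
    "Explain what a TPU is in one sentence.",
    max_new_tokens=64,
)
print(response)
```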
Hugging Face Spaces support for TPUs
Hugging Face Spaces provide developers with a platform to create, deploy, and share AI-powered demos and applications quickly. We are excited to introduce new TPU v5e instance support for Hugging Face Spaces. To upgrade your Space to run on TPUs, navigate to the Settings button in your Space and select the desired configuration:
v5litepod-1: TPU v5e with 1 core and 16 GB memory ($1.375/hour)
v5litepod-4: TPU v5e with 4 cores and 64 GB memory ($5.50/hour)
v5litepod-8: TPU v5e with 8 cores and 128 GB memory ($11.00/hour)
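If you manage Spaces from code, huggingface_hub also lets you request a hardware change programmatically. A minimal sketch, assuming a hypothetical "v5e-1x1" hardware flavor string; check your Space's Settings page for the exact identifier:

```python
from huggingface_hub import request_space_hardware

# Switch an existing Space to a TPU v5e instance.
# NOTE: the "v5e-1x1" flavor string is an assumption -- confirm the hardware
# identifier shown in your Space's Settings page before using it.
request_space_hardware(repo_id="your-username/your-space", hardware="v5e-1x1")
```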
Go build and share awesome ML-powered demos on TPUs with the community on Hugging Face Spaces!
We're proud of what we've achieved together with Google and can't wait to see how you'll use TPUs in your projects.