ONNX Runtime is a cross-platform machine learning model accelerator that can speed up a wide variety of models, especially those with ONNX support.
Hugging Face ONNX Runtime Support
Hugging Face, an open source community where users can build, train, and deploy hundreds of thousands of publicly available machine learning models, hosts over 130,000 ONNX-supported models. These ONNX-supported models, which include the increasingly popular large language models (LLMs) and cloud models, can leverage ONNX Runtime to improve performance. For example, using ONNX Runtime to accelerate the Whisper model improves the average latency per inference, with a gain of up to 74.30% over PyTorch. ONNX Runtime works closely with Hugging Face to ensure that the most popular models on the site are supported. In total, over 90 Hugging Face model architectures are supported by ONNX Runtime, including the 11 most popular architectures (popularity is determined by the number of corresponding models uploaded to the Hugging Face Hub):
Model Architecture    Number of Models
BERT                  28180
GPT-2                 14060
DistilBERT            11540
RoBERTa               10800
T5                    10450
Wav2Vec2              6560
Stable Diffusion      5880
XLM-RoBERTa           5100
Whisper               4400
BART                  3590
Marian                28400
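The latency figure quoted above (a gain of up to 74.30% over PyTorch) can be made concrete with a small timing sketch. This is only an illustration under stated assumptions: the page does not give the exact formula behind the number, so the code assumes "gain" means the reduction in average latency relative to the PyTorch baseline. The helper names average_latency and latency_gain_percent are hypothetical, and the two stand-in functions take the place of real PyTorch and ONNX Runtime inference calls.

```python
import time
from statistics import mean
from typing import Callable


def average_latency(run_once: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the mean wall-clock time in seconds of one call to run_once.

    Hypothetical helper: run_once stands in for a single inference call.
    """
    # Warm-up calls are executed but not timed (caches, lazy initialization).
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def latency_gain_percent(baseline_s: float, accelerated_s: float) -> float:
    """Reduction in latency relative to the baseline, as a percentage.

    Assumption: this is one plausible reading of "gain over PyTorch";
    the source does not state which formula was used.
    """
    if baseline_s <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline_s - accelerated_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Stand-ins for real inference calls; replace with actual model runs.
    def fake_pytorch_inference() -> None:
        time.sleep(0.004)

    def fake_onnxruntime_inference() -> None:
        time.sleep(0.001)

    baseline = average_latency(fake_pytorch_inference)
    accelerated = average_latency(fake_onnxruntime_inference)
    print(f"baseline:    {baseline * 1000:.2f} ms per inference")
    print(f"accelerated: {accelerated * 1000:.2f} ms per inference")
    print(f"gain:        {latency_gain_percent(baseline, accelerated):.2f}%")
```

Under this definition, a model that takes 1.000 s per inference in the baseline and 0.257 s when accelerated would show a 74.30% gain. Measured numbers depend heavily on the hardware, model size, and input shapes, so results from a sketch like this should not be expected to match the published figure.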
Learn More
For more information about accelerating Hugging Face models with ONNX Runtime, check out our recent posts on the Microsoft Open Source Blog.