F5, a global leader in the delivery and security of applications and APIs, has announced new capabilities for F5 BIG-IP Next for Kubernetes, accelerated with NVIDIA BlueField-3 DPUs and the NVIDIA DOCA software framework.
Sesterce is a leading European operator specializing in next-generation infrastructure and sovereign AI, with platforms designed to meet the needs of accelerated computing and artificial intelligence.
With this expansion of the F5 Application Delivery and Security Platform, BIG-IP Next for Kubernetes running natively on NVIDIA BlueField-3 DPUs delivers high-performance traffic management and security for large-scale AI infrastructure, unlocking greater efficiency, control, and performance for AI applications. In conjunction with the compelling performance benefits announced alongside general availability earlier this year, Sesterce has successfully completed validation of the F5 and NVIDIA solution across key capabilities, including the following areas:
- Enhanced performance, multi-tenancy, and security to meet cloud-grade expectations, with an initial 20% improvement in GPU utilization.
- Integration with NVIDIA Dynamo and KV Cache Manager to reduce latency in large language model (LLM) inference systems and to optimize GPU and memory resources.
- Smart LLM routing on BlueField DPUs, working with NVIDIA NIM microservices for workloads that require multiple models, delivering the best available model for each request.
- Scaling and securing Model Context Protocol (MCP), including reverse proxy functionality and protections that make LLMs more scalable and secure, so the capabilities of MCP servers are available quickly and safely.
- Powerful data programmability with robust F5 iRules capabilities, allowing rapid customization to support AI applications and evolving security requirements.
Highlights of the new solution's capabilities include:
LLM routing and dynamic load balancing with BIG-IP Next for Kubernetes

With this collaborative solution, BIG-IP Next for Kubernetes can route simple AI-related tasks to cheaper, lightweight LLMs in generative AI workloads while reserving advanced models for complex queries. This level of customizable intelligence also enables routing functions to leverage domain-specific LLMs, improving output quality and significantly improving the customer experience. F5's advanced traffic management ensures that queries are sent to the most appropriate LLM, lowering latency and improving time to first token.
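To make the routing idea concrete, here is a minimal sketch in Python of complexity-based LLM routing. The endpoints, model tiers, and scoring heuristic are hypothetical illustrations, not F5's implementation, which runs this logic on the BlueField-3 DPU itself:

```python
# Minimal sketch of complexity-based LLM routing (hypothetical names).
LIGHT_MODEL = "http://llm-light.svc:8000/v1"   # cheap model for simple tasks
HEAVY_MODEL = "http://llm-heavy.svc:8000/v1"   # advanced model for hard queries

def complexity_score(prompt: str) -> float:
    """Crude stand-in for a routing classifier: long, multi-step
    prompts are treated as complex."""
    markers = ("step by step", "analyze", "compare", "prove")
    score = len(prompt) / 2000.0
    score += sum(0.25 for m in markers if m in prompt.lower())
    return score

def route(prompt: str) -> str:
    """Return the upstream endpoint this prompt should be sent to."""
    return HEAVY_MODEL if complexity_score(prompt) >= 0.5 else LIGHT_MODEL

assert route("What time is it in Paris?") == LIGHT_MODEL
assert route("Analyze these logs step by step and compare the errors.") == HEAVY_MODEL
```

In practice the classifier would be a trained model or policy rather than keyword markers; the point is that cheap requests never consume capacity on the advanced model.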
Optimizing GPUs for distributed AI inference at scale with NVIDIA Dynamo and KV Cache integration

Introduced earlier this year, NVIDIA Dynamo provides a supplementary framework for deploying generative AI and reasoning models in large, distributed environments. It streamlines the complexity of running AI inference at scale by coordinating tasks such as scheduling, routing, and memory management to ensure seamless operation under dynamic workloads. Offloading specific operations from CPUs to BlueField DPUs is one of the central advantages of the combined F5 and NVIDIA solution. With F5, the Dynamo KV Cache Manager feature can intelligently route requests based on capacity, using key-value (KV) caches to accelerate generative AI use cases by reusing information retained from previous operations rather than recomputing it. From an infrastructure perspective, organizations can store and reuse KV cache data at a fraction of the cost of keeping it in GPU memory.
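As a rough illustration of capacity- and cache-aware routing, the Python sketch below prefers a worker that already holds a prompt's KV cache and falls back to the least-loaded worker. The worker table and prefix matching are simplified assumptions; NVIDIA Dynamo's KV Cache Manager implements this at a very different scale:

```python
# Hypothetical worker table: current load plus the prompt prefixes
# whose KV caches each worker already holds.
WORKERS = {
    "worker-a": {"load": 0.35, "cached_prefixes": {"You are a support bot"}},
    "worker-b": {"load": 0.80, "cached_prefixes": set()},
}

def pick_worker(prompt: str) -> str:
    """Prefer a worker that already holds this prompt's KV cache
    (so earlier attention computation is reused); otherwise fall
    back to the least-loaded worker."""
    for name, w in WORKERS.items():
        if w["load"] < 0.9 and any(prompt.startswith(p) for p in w["cached_prefixes"]):
            return name
    return min(WORKERS, key=lambda n: WORKERS[n]["load"])

print(pick_worker("You are a support bot. A user asks about billing."))  # worker-a
```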
Improved protection for MCP servers with F5 and NVIDIA
The Model Context Protocol (MCP) is an open protocol developed by Anthropic that standardizes how applications provide context to LLMs. Deployed in front of MCP servers, the combined F5 and NVIDIA solution lets F5 technology act as a reverse proxy, enhancing the security of MCP solutions and the LLMs they support. In addition, the full data programmability enabled by F5 iRules promotes rapid adaptation to fast-evolving AI protocol requirements and further protection against emerging cybersecurity risks.
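As a simplified sketch of the reverse-proxy pattern described above, the Python example below inspects MCP JSON-RPC messages and forwards only allow-listed methods to an upstream server. The upstream URL and method allow-list are hypothetical placeholders, and F5's actual enforcement happens in BIG-IP Next for Kubernetes rather than in application code:

```python
# Sketch: filter MCP JSON-RPC requests before they reach the server.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM_URL = "http://127.0.0.1:9000/mcp"        # hypothetical MCP server
ALLOWED_METHODS = {"initialize", "tools/list", "tools/call"}

class McpFilter(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            message = json.loads(body)
        except ValueError:
            self.send_error(400, "malformed JSON-RPC payload")
            return
        # Drop methods outside the allow-list before they reach the server.
        if message.get("method") not in ALLOWED_METHODS:
            self.send_error(403, "MCP method not permitted")
            return
        # Forward the vetted request and relay the upstream response.
        req = urllib.request.Request(
            UPSTREAM_URL, data=body,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), McpFilter).serve_forever()
```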
F5 BIG-IP Next for Kubernetes deployed on NVIDIA BlueField-3 DPUs is generally available now. For more information about the technologies and deployment benefits, visit www.f5.com and connect with the companies at NVIDIA GTC Paris, part of this week's VivaTech 2025 event. For further detail, please also see F5's companion blog.
Youssef El Manssouri, CEO and co-founder of Sesterce
The integration of F5 and NVIDIA was appealing even before we tested it. Our results highlight the advantages of F5's dynamic load balancing with the massive volumes of Kubernetes ingress and egress traffic in AI environments. This approach lets us distribute traffic more efficiently and optimize GPU usage while bringing additional, unique value to our customers. We are pleased to see F5's support for a growing number of NVIDIA use cases, including enhanced multi-tenancy, and we look forward to further innovation between the companies in support of next-generation AI infrastructure.
Kunal Anand, Chief Innovation Officer at F5
While enterprises increasingly deploy multiple LLMs to power advanced AI experiences, routing and classifying LLM traffic is compute-heavy and can degrade performance and user experience. By programming routing logic directly on NVIDIA BlueField-3 DPUs, F5 BIG-IP Next for Kubernetes is the most efficient approach to delivering and securing LLM traffic. This is just the beginning. Our platform unlocks new possibilities for AI infrastructure, and we are excited to deepen our collaboration with NVIDIA as enterprise AI continues to scale.
Ash Bhalgat, Senior Director of AI Networking and Security Solutions, Ecosystem and Marketing at NVIDIA
Accelerated with NVIDIA BlueField-3 DPUs, BIG-IP Next for Kubernetes gives enterprises and service providers a single point of control to efficiently route traffic to AI factories, optimizing GPU efficiency and accelerating AI traffic for data ingestion, model training, inference, RAG, and agentic AI. In addition, F5's support for multi-tenancy and programmability with iRules continues to provide a platform well suited for ongoing integration and new capabilities, including support for NVIDIA Dynamo's distributed KV Cache Manager.
Greg Schoeny, SVP, Global Service Provider at World Wide Technology
Organizations implementing agentic AI are increasingly relying on MCP deployments to improve the security and performance of LLMs. By bringing advanced traffic management and security to large-scale Kubernetes environments, F5 and NVIDIA deliver an integrated set of AI capabilities not currently found elsewhere in the industry.