Introduction
AI agents are rapidly becoming essential for building intelligent applications, but creating robust, adaptable agents that scale across domains remains a challenge. Many existing frameworks suffer from brittleness, tool misuse, and failures when faced with complex workflows.
CUGA (Configurable Generalist Agent) was designed to overcome these limitations. It is an open source AI agent that is flexible, reliable, and easy to use for enterprise use cases. CUGA abstracts away orchestration complexity, allowing developers to focus on domain requirements rather than the internals of agent construction. And now, with its integration into 🚀Hugging Face Spaces🚀, experimenting with CUGA and open models has never been easier.
What is CUGA?
CUGA is a configurable, general-purpose AI agent that supports complex multi-step tasks across web and API environments. It has achieved state-of-the-art performance on key benchmarks:
🥇 #1 on AppWorld – Benchmark with 750 real-world tasks across 457 APIs
🥈 WebArena (#1 from 02/25 to 09/25) – A complex benchmark for autonomous web agents across application domains, where CUGA relied on its computer-use capability.
At its core, CUGA provides:
- High-performance generalist agent: benchmarked on complex web and API tasks, it combines best-of-breed agent patterns (planner-executor, code-act, etc.) with structured planning and smart variable management to prevent hallucinations and handle complexity.
- Configurable inference modes: balance performance against cost and latency with flexible modes, from fast heuristics to detailed planning, optimized for task requirements.
- Computer use: easily combine UI interactions and API calls.
- Multi-tool integration: seamlessly integrate tools through OpenAPI specifications, MCP servers, and LangChain for quick connections to REST APIs, custom protocols, and Python functions.
- Langflow integration: a low-code visual build experience to design and deploy agent workflows without extensive coding.
- Composable: expose CUGA as a tool to other agents, enabling nested inference and multi-agent collaboration.
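To illustrate the multi-tool integration point above, here is a minimal sketch of how an OpenAPI specification can be turned into a catalog of callable tools. CUGA's actual tool registry is more sophisticated (it also parses and understands tool functionality); the spec fragment and function names below are purely illustrative.

```python
# Minimal sketch: deriving a tool catalog from an OpenAPI spec.
# The spec fragment and names are illustrative, not CUGA internals.
spec = {
    "paths": {
        "/accounts/{id}": {"get": {"operationId": "getAccount",
                                   "summary": "Fetch one CRM account"}},
        "/accounts": {"post": {"operationId": "createAccount",
                               "summary": "Create a CRM account"}},
    }
}

def extract_tools(openapi_spec: dict) -> dict:
    """Map each operationId to its HTTP method, path, and summary."""
    tools = {}
    for path, methods in openapi_spec["paths"].items():
        for method, operation in methods.items():
            tools[operation["operationId"]] = {
                "method": method.upper(),
                "path": path,
                "summary": operation.get("summary", ""),
            }
    return tools

catalog = extract_tools(spec)
```

An agent can then match a subtask ("create an account") against the summaries in `catalog` and dispatch the corresponding HTTP call.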
We also continue to innovate with new experimental features, including:
- Configurable policies and human-in-the-loop instructions: improve coordination and ensure safe agent behavior in enterprise contexts.
- Save and reuse: capture and reuse successful execution paths (plans, code, and trajectories) for faster, more consistent behavior across repeated tasks.
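The save-and-reuse idea can be pictured as a cache of successful plans keyed by a stable signature of the task. This sketch is purely conceptual, not CUGA's actual mechanism; class and method names are hypothetical.

```python
import hashlib

# Conceptual sketch of "save and reuse": store a successful execution
# path (plan) under a key derived from the normalized task description,
# so a repeated task can replay the plan instead of replanning.
class PlanCache:
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(task: str) -> str:
        # Normalize so trivially different phrasings hit the same entry.
        return hashlib.sha256(task.strip().lower().encode()).hexdigest()

    def save(self, task: str, plan: list):
        self._store[self._key(task)] = plan

    def lookup(self, task: str):
        return self._store.get(self._key(task))

cache = PlanCache()
cache.save("Top 5 deals this quarter", ["query_crm", "rank_by_value", "format"])
```

A cache hit skips the planning step entirely, which is where the faster, more consistent behavior on repeated tasks comes from.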
Figure 1: CUGA agent architecture
The CUGA architecture starts with a user's message flowing into the chat layer, which interprets the intent and builds the user's goal from context. The task planning and control component then breaks this goal down into structured subtasks that are tracked programmatically through a dynamic task ledger. This ledger supports replanning when necessary and ensures reliable execution. Subtasks are delegated to specialized agents, such as the API agent, which uses an internal inference loop to generate pseudocode instructions before executing the generated code in a secure sandbox. The system leverages a tool registry that goes beyond the MCP protocol to parse and understand tool functionality, enabling precise orchestration. Once all steps are completed, a final response is returned to the user, providing reliable and policy-compliant results.
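The dynamic task ledger described above can be sketched as a small bookkeeping structure: it tracks subtask status and signals when replanning is needed. This is only an illustration of the concept; CUGA's real ledger is internal and the names here are hypothetical.

```python
# Illustrative sketch of a dynamic task ledger: track subtask status
# and decide when the planner should be re-invoked.
class TaskLedger:
    def __init__(self, subtasks):
        self.status = {task: "pending" for task in subtasks}

    def complete(self, task):
        self.status[task] = "done"

    def fail(self, task):
        self.status[task] = "failed"

    def needs_replan(self) -> bool:
        # Replan as soon as any subtask has failed.
        return any(s == "failed" for s in self.status.values())

    def finished(self) -> bool:
        return all(s == "done" for s in self.status.values())

ledger = TaskLedger(["find_account", "fetch_orders", "summarize"])
ledger.complete("find_account")
ledger.fail("fetch_orders")
```

When `needs_replan()` fires, the planner can rebuild the remaining subtasks while preserving completed work, which is what makes execution resilient.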
CUGA's multi-step architecture issues many LLM calls per task, and each call takes several seconds, compounding latency and creating a trade-off between agent capability and user experience. Running CUGA on a high-performance inference platform like Groq shows that inference acceleration fundamentally extends the scope of what agent architectures can achieve.
Open source and open model
CUGA is fully open source under the Apache 2.0 license and can be found at cuga.dev.
By adopting an open model, CUGA aligns with Hugging Face’s ethos of democratizing AI, giving developers the freedom to choose the model that best suits their needs, whether for experimental or production use.
CUGA has been tested on a variety of open models, including gpt-oss-120b and Llama-4-Maverick-17B-128E-Instruct-fp8 (both hosted on Groq). Our Hugging Face Space uses gpt-oss-120b hosted on Groq, providing fast response times for LLM calls.
Groq runs open models on its custom-built LPUs, which are designed for AI inference and well suited to the iterative loop at the heart of CUGA's architecture, allowing planning, execution, and validation steps to complete quickly. The result is strong cost and performance: open models are about 80-90% cheaper than closed models, Groq's OpenAI-compatible API meets production latency needs, and CUGA is fully configurable across models, providers, and deployment topologies.
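Because Groq exposes an OpenAI-compatible API, calling the hosted gpt-oss-120b model only requires pointing a standard chat-completions request at Groq's endpoint. The sketch below uses only the standard library and, as a safeguard, only sends the request when a `GROQ_API_KEY` environment variable is present; the prompt text is just an example.

```python
import json
import os

# Groq's OpenAI-compatible chat completions endpoint.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

payload = {
    "model": "openai/gpt-oss-120b",  # open model hosted on Groq
    "messages": [{"role": "user", "content": "List three CRM KPIs."}],
}

api_key = os.environ.get("GROQ_API_KEY")
if api_key:
    from urllib import request
    req = request.Request(
        GROQ_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping providers is just a matter of changing the URL and model name, which is what makes CUGA's model configurability straightforward in practice.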
Integration with Langflow: Simplify visual agent design
To further facilitate agent development, CUGA integrates with Langflow, an open-source visual programming interface for building LLM-powered workflows. An intuitive drag-and-drop interface reduces the barrier to entry for users who prefer low-code solutions.
Since Langflow 1.7.0, CUGA ships as a dedicated Langflow component that lets users visually assemble complex multi-tool agents and deploy them with a single click. Try it at langflow.org.
Try the Hugging Face Demo: A Hands-on Preview
To give you a taste of what's possible, we've launched a CUGA demo on Hugging Face Spaces. The demo introduces a small-scale CRM system and equips CUGA with 20 preconfigured tools to handle sales-related data queries and API interactions through the API agent. To make experimentation even more powerful, the demo also provides access to a workspace file and lets you apply predefined policies.
Try it on Hugging Face Spaces and share your feedback.
Conclusion and call to action
CUGA brings a new level of flexibility and openness to building AI agents. To engage with us: try the demo on Hugging Face Spaces, explore the source code at cuga.dev, build a visual workflow with the CUGA component at langflow.org, and share your feedback.