Amazon has introduced Nova Act, an AI model designed to power smarter agents that can carry out tasks within a web browser.
Large language models have popularized the idea of an “agent” as a tool that answers queries or retrieves information via methods such as retrieval-augmented generation (RAG), but Amazon envisions something more robust. The company defines agents not only as responders, but as entities capable of performing tangible, multi-step tasks in a variety of digital and physical environments.
“Our dream is for agents to perform a wide range of complex, multi-step tasks, such as organizing a wedding or handling complex IT tasks to increase business productivity,” says Amazon.
Current market offerings often fall short: many agents need ongoing human supervision, and their capabilities depend on comprehensive API integration, which is not feasible for every task. Nova Act is Amazon’s answer to these limitations.
Alongside the model, Amazon has released a research preview of the Amazon Nova Act SDK. Using the SDK, developers can create agents that automate web tasks such as submitting out-of-office notifications, scheduling calendar holds, and enabling automatic email replies.
The SDK aims to break complex workflows down into reliable “atomic commands” such as searching, checking out, or interacting with specific interface elements like dropdowns and popups. Developers can add detailed instructions to refine these commands, for example telling an agent to decline insurance upsells during checkout.
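The idea of atomic commands can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Nova Act SDK's API: `StubAgent`, its `act` method, and the rental-car workflow are hypothetical names invented here, and the stub only records instructions instead of driving a browser.

```python
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical stand-in for a browser agent; it only records commands."""

    log: list[str] = field(default_factory=list)

    def act(self, instruction: str) -> None:
        # A real agent would drive a browser here; this stub just logs the step.
        self.log.append(instruction)


def book_rental_car(agent: StubAgent) -> list[str]:
    """Express one checkout workflow as a sequence of small, checkable steps."""
    steps = [
        "search for a compact rental car",
        "select the first result",
        "open the pickup location dropdown and choose the airport",
        "proceed to checkout; do not accept the insurance upsell",
    ]
    for step in steps:
        agent.act(step)
    return agent.log


if __name__ == "__main__":
    for line in book_rental_car(StubAgent()):
        print(line)
```

Keeping each step small means a failure can be traced to one instruction, and a constraint such as declining the upsell lives in the step where it applies.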
To further improve accuracy, the SDK supports browser manipulation via Playwright, API calls, Python integrations, and parallel threading to overcome web page load latency.
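Why parallel threads help with page load latency can be shown with the Python standard library alone. The sketch below is an assumption-laden illustration, not SDK code: `load_and_check` is a hypothetical function, and `time.sleep` stands in for a slow page load.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_and_check(page: str, delay: float = 0.2) -> str:
    """Simulate a slow page load (assumption: sleep stands in for latency)."""
    time.sleep(delay)
    return f"{page}: done"


def run_in_parallel(pages: list[str]) -> list[str]:
    # Each page gets its own worker thread, so the waits overlap
    # instead of adding up. map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(pages))) as pool:
        return list(pool.map(load_and_check, pages))


if __name__ == "__main__":
    pages = ["listing-1", "listing-2", "listing-3"]
    start = time.perf_counter()
    results = run_in_parallel(pages)
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
```

With three simulated pages, the total wait is close to one page's delay rather than the sum of all three, which is the effect the parallel threading support is aiming for.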
Nova Act: Exceptional performance in benchmarks
Unlike other generative models that show middling accuracy on complex tasks, Nova Act prioritizes reliability. Amazon highlights the model’s scores of over 90% on internal evaluations of specific capabilities that typically challenge competitors.
Nova Act achieved a near-perfect 0.939 on the ScreenSpot Web Text benchmark, which measures how well a model follows natural-language instructions for text-based interactions such as adjusting the font size. Competing models such as Claude 3.7 Sonnet (0.900) and OpenAI’s CUA (0.883) trailed behind.
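To make the size of those gaps concrete, the short sketch below computes each competitor's margin behind Nova Act. It uses only the three scores quoted above; the dictionary and function names are illustrative.

```python
# ScreenSpot Web Text scores as quoted in the article.
SCORES = {
    "Nova Act": 0.939,
    "Claude 3.7 Sonnet": 0.900,
    "OpenAI CUA": 0.883,
}


def margins_behind(leader: str, scores: dict[str, float]) -> dict[str, float]:
    """Return how far each other model trails the leader, rounded to 3 places."""
    top = scores[leader]
    return {
        name: round(top - value, 3)
        for name, value in scores.items()
        if name != leader
    }


if __name__ == "__main__":
    for name, gap in margins_behind("Nova Act", SCORES).items():
        print(f"{name} trails by {gap:.3f}")
```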
Similarly, Nova Act scored 0.879 on the ScreenSpot Web Icon benchmark, which tests interaction with visual elements such as rating stars and icons. On the GroundUI Web test, designed to assess an AI’s proficiency in navigating various user interface elements, Nova Act trailed competitors slightly, which Amazon sees as an area ripe for improvement as the model evolves.
Amazon emphasizes its focus on practical reliability. Once an agent built with Nova Act works as expected, developers can deploy it headlessly, integrate it as an API, or schedule it to run tasks asynchronously. In one demonstrated use case, an agent automatically orders a salad for delivery every Tuesday evening without any ongoing user intervention.
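The article does not describe how such a schedule is configured, so the following is only a standard-library sketch of the underlying date arithmetic: finding the next Tuesday evening after a given moment. The function name and the 18:00 default are assumptions made for illustration.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday() counts Monday as 0


def next_tuesday_evening(now: datetime, hour: int = 18) -> datetime:
    """Return the next Tuesday at `hour`:00 that is strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to Tuesday of the current week cycle (0 days if today).
    candidate += timedelta(days=(TUESDAY - now.weekday()) % 7)
    if candidate <= now:
        # Today is Tuesday but the evening slot has passed: use next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_tuesday_evening(datetime.now()))
```

A scheduler would call a function like this after each run to decide when to start the agent again.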
Amazon sets out a vision for scalable and smart AI agents
One of Nova Act’s standout features is its ability to transfer its understanding of user interfaces to new environments with minimal additional training. Amazon shared an instance in which Nova Act performed well in browser-based games, even though its training included no video game experience. This adaptability positions Nova Act as a versatile agent for a variety of applications.
This capability is already being used within Amazon’s own ecosystem. In Alexa+, Nova Act enables self-directed web navigation to complete tasks for users, even when API access is not comprehensive enough. This represents a step towards smarter AI assistants that can function independently and apply their skills in more dynamic ways.
Amazon says Nova Act represents the first stage of a broader mission to build intelligent, reliable AI agents that can handle increasingly complex, multi-step tasks.
To move beyond simple instructions, Amazon is focusing on training agents through reinforcement learning across a variety of real-world scenarios rather than on overly simplified demonstrations. This foundational model serves as a checkpoint in a long-term training curriculum for Nova models, signalling the company’s ambition to reshape the AI agent landscape.
“The most valuable use cases for agents have not yet been built,” Amazon said. “The best developers and designers will discover them. This research preview of the Nova Act SDK allows us to iterate alongside these builders through rapid prototyping and iterative feedback.”
Nova Act is a step towards making AI agents truly useful for complex digital tasks. From rethinking benchmarks to emphasizing reliability, its design philosophy centers on enabling developers to go beyond what is possible with the current generation of tools.