Versa AI hub
Open Source DeepResearch – Unlocking Search Agents

February 7, 2025 (updated February 13, 2025)

Yesterday, OpenAI released Deep Research, a system that browses the web to summarize content and answer questions based on the summary. The system is impressive and blew our minds the first time we tried it.

One of the main results of the blog post is a strong improvement in performance on the General AI Assistants benchmark (GAIA), a benchmark we’ve been playing with recently as well, on which they reached nearly 67% correct answers on average in one shot, and 47.6% on the particularly challenging “level 3” questions that involve multiple steps of reasoning and tool use (see below for a presentation of GAIA).

Deep Research consists of an LLM (which can be chosen from the current list of LLMs provided by OpenAI: 4o, o1, o3, etc.) and an internal “agent framework” that guides the LLM to use tools such as web search and to organize its actions in steps.

While powerful LLMs are now freely available in open source (see for example the recent DeepSeek R1 model), OpenAI has not disclosed much about the agent framework underlying Deep Research…

So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!

The clock is ticking, let’s go! ⏱️

What is an Agent Framework?

An agent framework is a layer on top of an LLM that lets the LLM carry out actions, such as browsing the web or reading PDF documents, and organizes its operations in a series of steps. For a quick introduction to agents, check out this great interview with Andrew Ng and the introductory blog post for the smolagents library. For a deeper dive into agents, you can subscribe to the agents course that starts in a few days: link here.
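To make the idea concrete, here is a minimal sketch of such a loop in Python. It is not the smolagents implementation: `fake_llm`, the two stub tools, and the action format are all hypothetical stand-ins, and a real framework would call an actual model instead of the scripted function below.

```python
from typing import Callable

# Hypothetical tools: plain functions the agent is allowed to call.
def web_search(query: str) -> str:
    return f"(stub) top result for '{query}'"

def read_pdf(path: str) -> str:
    return f"(stub) text extracted from {path}"

TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search, "read_pdf": read_pdf}

def fake_llm(memory: list[str]) -> dict:
    """Stand-in for a real LLM call: picks the next action from the observations so far."""
    observations = [m for m in memory if m.startswith("observation:")]
    if len(observations) == 0:
        return {"tool": "web_search", "argument": "GAIA benchmark paper"}
    if len(observations) == 1:
        return {"tool": "read_pdf", "argument": "gaia.pdf"}
    return {"final_answer": f"answered after {len(observations)} tool calls"}

def run_agent(task: str, max_steps: int = 5) -> str:
    memory = [f"task: {task}"]
    for _ in range(max_steps):
        action = fake_llm(memory)        # 1. the model decides what to do next
        if "final_answer" in action:     # 2. stop as soon as it has an answer
            return action["final_answer"]
        tool = TOOLS[action["tool"]]     # 3. otherwise run the requested tool
        memory.append(f"observation: {tool(action['argument'])}")  # 4. feed the result back
    return "max steps reached"

print(run_agent("What is GAIA?"))
```

The framework's job is everything outside `fake_llm`: keeping the memory, dispatching tool calls, and deciding when to stop.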

Almost everyone has already experienced how powerful LLMs can be simply by playing with chatbots. What not everyone is aware of yet is that integrating these LLMs into agent systems can give them real superpowers.

Here is a recent example comparing the performance of a few frontier LLMs with and without an agent framework (in this case the simple smolagents library):

In fact, OpenAI also highlighted how Deep Research performs dramatically better than standalone LLMs on the knowledge-intensive “Humanity’s Last Exam” benchmark.

So, what happens when we integrate a current top LLM into an agent framework and work towards an open Deep Research?

A quick note: we’ll benchmark our results on the same GAIA challenge, but keep in mind that this is a work in progress. Deep Research is a massive achievement, and its open reproduction will take time. In particular, full parity will require improved browser use and interaction like that provided by OpenAI Operator, i.e. going beyond the text-only web interaction we explore in this first step.

First, let’s understand the scope of the challenge: GAIA.

The GAIA benchmark

GAIA is arguably the most comprehensive benchmark for agents. Its questions are very difficult and touch on many of the challenges that LLM-based systems face. Here is an example of a hard question:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 ocean liner breakfast menu? Give the items as a comma-separated list, ordered clockwise based on their arrangement in the painting, starting from the 12 o’clock position. Use the plural form of each fruit.

This question poses several challenges:

  • Answering in a constrained format.
  • Using multimodal capabilities (to extract the fruits from the image).
  • Gathering several pieces of information, some of which depend on others, such as finding the October 1949 breakfast menu for the ocean liner in question.
  • Chaining together a problem-solving trajectory in the correct order.

Solving this requires both high-level planning abilities and rigorous execution, two areas where LLMs struggle when used alone.

So it’s an excellent test set for agent systems!

On GAIA’s public leaderboard, GPT-4 does not even reach 7% on the validation set when used without any agent setup. On the other end of the spectrum, with Deep Research, OpenAI reached a score of 67.36% on the validation set, roughly an order of magnitude better! (Though we don’t know how it would actually fare on the private test set.)

Let’s see if open source tools can do better!

Building an open Deep Research

Using a CodeAgent

The first improvement over traditional AI agent systems that we tackle is to use a so-called “code agent”. As shown by Wang et al. (2024), letting the agent express its actions in code has several advantages, most notably that code is specifically designed to express complex sequences of actions.

Consider this example given by Wang et al.:

[Figure: Code Agent]

This highlights several advantages of using code:

  • Code actions are much more concise than JSON. Need to run four parallel streams of five consecutive actions? In JSON, you would need to generate 20 JSON blobs, each in a separate step; in code, it’s only one step. On average, the paper shows that code actions require 30% fewer steps than JSON, which amounts to a comparable reduction in generated tokens. Since LLM calls are often the dominant cost of agent systems, this means running your agent system is ~30% cheaper.
  • Code lets you reuse tools from popular libraries.
  • Code yields better performance in benchmarks.
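The “four streams of five actions” case can be sketched as follows. This is an illustration only: `process` is a hypothetical tool, the JSON schema is made up, and the `exec` call stands in for the sandboxed interpreter a real framework would use. The streams also run sequentially here; the point is the number of generation steps, not concurrency.

```python
import json

def process(stream: int, step: int) -> str:
    """Hypothetical tool: stands in for any single action the agent can take."""
    return f"stream{stream}-step{step}"

# JSON-style agent: every tool call is its own blob, generated in its own step.
json_actions = [
    json.dumps({"tool": "process", "arguments": {"stream": s, "step": t}})
    for s in range(4)
    for t in range(5)
]

# Code-style agent: a single generated snippet covers all the calls.
code_action = """
results = [[process(s, t) for t in range(5)] for s in range(4)]
"""

namespace = {"process": process}
exec(code_action, namespace)  # a real framework would sandbox this execution

print(len(json_actions))                          # steps needed by the JSON agent
print(sum(len(r) for r in namespace["results"]))  # tool calls made by the one code step
```

Both approaches end up making the same 20 tool calls, but the JSON agent needs 20 LLM generations to get there, while the code agent needs one.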

We confirmed the advantages above in our experiments on agent_reasoning_benchmark.

From building smolagents, we can also cite a notable additional advantage: better handling of state. This is especially useful for multimodal tasks. Need to store this image/audio/other for later use? No problem: just assign it to a variable in your state and you can reuse it four steps later if needed. In JSON, you would have to let the LLM name it in a dictionary key and trust that the LLM later understands it can still use it.
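A minimal sketch of that state handling, assuming each step’s generated code is executed in one shared namespace. The tools `load_image` and `describe_image` are hypothetical stubs, and this is not how smolagents is actually implemented:

```python
# Shared namespace that persists across the agent's steps.
state: dict = {}

def load_image(path: str) -> bytes:
    """Hypothetical tool: returns fake image bytes instead of reading a file."""
    return b"\x89PNG" + path.encode()

def describe_image(image: bytes) -> str:
    """Hypothetical tool: a real one would call a vision model."""
    return f"image of {len(image)} bytes"

state.update(load_image=load_image, describe_image=describe_image)

# Each string plays the role of the code the LLM writes at one step.
steps = [
    "image = load_image('painting.png')",  # step 1: store the image
    "notes = []",                          # steps 2-4: unrelated work
    "notes.append('searched the web')",
    "notes.append('read a PDF')",
    "caption = describe_image(image)",     # step 5: reuse the image from step 1
]

for code in steps:
    exec(code, state)  # variables assigned in one step stay available in later ones

print(state["caption"])
```

The image never has to pass through the LLM’s output again: it simply lives in the namespace under the name the model chose.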

Create the right tools 🛠️

Next, you need to provide the agent with the appropriate toolset.

1. A web browser. To reach full performance we will need fully fledged web browser interaction like Operator, but for a first proof of concept we started with an extremely simple text-based web browser. You can find the code here.

2. A simple text inspector that can read a bunch of text file formats; find it here.

These tools were taken from the excellent Magentic-One agent by Microsoft Research, kudos to them! We didn’t change them much, as our goal was to get as high a performance as possible with the lowest possible complexity.
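To give a feel for what “text-based web browser” means here, the sketch below strips a page down to its visible text and exposes it one viewport at a time. It is a toy built on the standard library, not the Magentic-One code: `TinyTextBrowser`, `load_html`, and `page_down` are hypothetical names, and it reads an HTML string rather than fetching a URL.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

class TinyTextBrowser:
    """Hypothetical text-only browser: holds one page, shown in fixed-size viewports."""
    def __init__(self, viewport_size: int = 40) -> None:
        self.viewport_size = viewport_size
        self.text = ""
        self.position = 0

    def load_html(self, html: str) -> str:
        extractor = _TextExtractor()
        extractor.feed(html)
        self.text = "\n".join(extractor.chunks)
        self.position = 0
        return self.viewport()

    def viewport(self) -> str:
        return self.text[self.position:self.position + self.viewport_size]

    def page_down(self) -> str:
        # Only scroll if there is more text beyond the current viewport.
        if self.position + self.viewport_size < len(self.text):
            self.position += self.viewport_size
        return self.viewport()

page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>GAIA</h1><p>A benchmark for general AI assistants.</p></body></html>"
)
browser = TinyTextBrowser(viewport_size=20)
print(browser.load_html(page))
print(browser.page_down())
```

Exposing the page in small viewports keeps each observation short enough to fit comfortably in the LLM’s context, at the cost of extra steps to scroll.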

Here is a short roadmap of improvements that we think would really improve the performance of these tools (feel free to open a PR and contribute!):

  • Extending the number of file formats that can be read.
  • Proposing more fine-grained handling of files.
  • Replacing the web browser with a vision-based one, which we’ve started doing here.

Results 🏅

In our 24+ hour reproduction sprint, we’ve already seen steady improvements in our agent’s performance on GAIA!

We quickly climbed past the previous SOTA for an open framework (around 46%, held by Magentic-One).

This performance bump is primarily due to having the agent write its actions in code! Indeed, switching to a standard agent that writes actions in JSON instead of code immediately degrades the performance of the same setup to 33% on average on the validation set.

Here is the final agent system.

We’ve set up a live demo here for you to try it out!

But this is just the beginning, and there is a lot to improve! Our open tools can be made better, and the smolagents framework can also be tuned. We would also love to explore how better open models perform when powering the agent.

We welcome the community to join us in this effort, so we can leverage the power of open research to build a great open-source agent framework together! It would allow anyone to run a Deep Research-like agent at home, with their favorite models, using a completely local and customized approach.

Community reproductions

While we were working on this and focusing on GAIA, other great open implementations of Deep Research emerged from the community.

Each of these implementations uses different libraries for indexing data, browsing the web, and querying LLMs. In this project, we would like to reproduce the benchmarks presented by OpenAI (pass@1 average score), and to benchmark and document our findings when switching to open LLMs (such as DeepSeek R1).
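For reference, here is a sketch of how such a score could be computed, assuming “pass@1 average score” means the share of questions answered correctly with a single attempt per question. The results below are made up for illustration and are not measurements from any system.

```python
from collections import defaultdict

# Made-up results for illustration only: (GAIA level, was the single attempt correct?)
attempts = [
    (1, True), (1, True), (1, False),
    (2, True), (2, False),
    (3, False), (3, True), (3, False),
]

def pass_at_1(results: list[tuple[int, bool]]) -> float:
    """Share of questions answered correctly on the first and only attempt."""
    return sum(correct for _, correct in results) / len(results)

# Group the attempts by difficulty level to report per-level scores as well.
by_level: dict[int, list[tuple[int, bool]]] = defaultdict(list)
for level, correct in attempts:
    by_level[level].append((level, correct))

print(f"overall pass@1: {pass_at_1(attempts):.1%}")
for level in sorted(by_level):
    print(f"level {level} pass@1: {pass_at_1(by_level[level]):.1%}")
```

Reporting the per-level breakdown alongside the overall figure makes it easier to compare against numbers such as the “level 3” score quoted earlier.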

Most important next steps

OpenAI’s Deep Research is probably boosted by the excellent web browser introduced with Operator.

So we’re tackling that next! More generally, we’re going to build GUI agents, i.e. agents that view your screen and can act directly with the mouse and keyboard. If you’re excited about this project and want to help make such cool capabilities accessible to everyone through open source, we’d love to get your contribution!

We’re also hiring full-time engineers to help us work on this.
