Today we are releasing smolagents, a very simple library that unlocks agentic capabilities for language models. Here is a quick overview:
```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

agent.run("How many seconds does it take for a leopard running at full speed to cross the Pont des Arts?")
```

🤔 What is an agent?
Any efficient system using AI needs to give LLMs some kind of access to the real world: for instance, the possibility to call a search tool to retrieve external information, or to run specific programs to solve a task. In other words, LLMs should have agency. Agentic programs are the LLM’s gateway to the outside world.
An AI agent is a program in which LLM outputs control the workflow.
Any system that leverages LLMs integrates their outputs into its code. How much influence the LLM’s output has on the code workflow is the level of agency that the LLM has in the system.
Note that with this definition, “agent” is not a discrete, 0-or-1 property. Instead, “agency” evolves along a continuous spectrum as you give the LLM more or less power over the workflow.
The table below shows how agency varies across systems.
| Agency level | Description | How that’s called | Example pattern |
|---|---|---|---|
| ☆☆☆ | LLM output has no impact on program flow | Simple processor | `process_llm_output(llm_response)` |
| ★☆☆ | LLM output determines basic control flow | Router | `if llm_decision(): path_a() else: path_b()` |
| ★★☆ | LLM output determines function execution | Tool call | `run_function(llm_chosen_tool, llm_chosen_args)` |
| ★★★ | LLM output controls iteration and program continuation | Multi-step agent | `while llm_should_continue(): execute_next_step()` |
| ★★★ | One agentic workflow can start another agentic workflow | Multi-agent | `if llm_trigger(): execute_agent()` |
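To make the middle rows of this table concrete, here is a minimal, self-contained sketch (plain Python, not smolagents code, with the LLM call stubbed out) of the “router” and “tool call” levels of agency:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned answer for illustration."""
    return "search"

# Router (one star): the LLM output only picks a branch of predefined code.
def handle_request_router(user_query: str) -> str:
    decision = call_llm(f"Answer 'search' or 'escalate' for: {user_query}")
    if decision == "search":
        return f"Searching the knowledge base for: {user_query}"
    return "Forwarding to a human agent"

# Tool call (two stars): the LLM output chooses which function to run, and with which arguments.
TOOLS = {
    "search": lambda query: f"Results for: {query}",
    "escalate": lambda query: "Support ticket created",
}

def handle_request_tool_call(user_query: str) -> str:
    tool_name = call_llm(f"Pick one tool from {list(TOOLS)} for: {user_query}")
    return TOOLS[tool_name](user_query)

print(handle_request_router("Do you run surf trips in March?"))
print(handle_request_tool_call("Do you run surf trips in March?"))
```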
The code structure for a multi-step agent is as follows:
```python
memory = [user_defined_task]
while llm_should_continue(memory):
    action = llm_get_next_action(memory)
    observations = execute_action(action)
    memory += [action, observations]
```
So the system runs in a loop, executing a new action at each step (an action may involve calling some predetermined tools, which are simply functions). Here’s an example of how a multi-step agent can solve a simple math problem.
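The following is a minimal, self-contained sketch of such a loop (plain Python rather than smolagents, with the LLM replaced by a scripted stand-in) solving a toy arithmetic question:

```python
# Toy multi-step agent loop: the "LLM" is scripted so the example runs offline.

def calculator(expression: str) -> str:
    """A predetermined tool: evaluates a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))  # acceptable for this toy example only

def scripted_llm_next_action(memory: list) -> dict:
    """Stand-in for the LLM deciding the next action from its memory."""
    if len(memory) == 1:  # only the task so far: call the calculator tool
        return {"tool": "calculator", "args": "37 * 12 + 5", "final": False}
    return {"tool": None, "args": memory[-1][1], "final": True}  # return the last observation

memory = [("task", "What is 37 * 12 + 5?")]
while True:
    action = scripted_llm_next_action(memory)
    if action["final"]:
        print("Final answer:", action["args"])
        break
    observation = calculator(action["args"])
    memory.append((action["tool"], observation))
```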

✅ When to use agents / ⛔ When to avoid agents
Agents are useful when you need an LLM to determine the workflow of an app. But they are often overkill. The real question is: do we actually need flexibility in the workflow to efficiently solve the task at hand? Let’s look at an example. Say you’re building an app to handle customer requests on a surf-trip website.
You can know in advance that requests will fall into one of two buckets (based on the user’s choice), and you have a predefined workflow for each of these two cases.
Want some knowledge about the trips? ⇒ Give them access to a search bar to search the knowledge base.
Want to talk to sales? ⇒ Let them fill out a contact form.
If that deterministic workflow fits all your queries, then by all means just code everything. This gives you a 100% reliable system with no risk of errors introduced by an unpredictable LLM interfering with your workflow. For the sake of simplicity and robustness, it’s advisable to regularize toward not using any agentic behavior.
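For instance, a fully deterministic version of that surf-trip app might look like the sketch below (hypothetical handler names, and no LLM anywhere in the control flow):

```python
# Hypothetical deterministic workflow: the user's menu choice, not an LLM,
# decides which branch runs, so the control flow is 100% predictable.

def search_knowledge_base(query: str) -> str:
    return f"Top results for: {query}"  # placeholder for the real search backend

def open_contact_form() -> str:
    return "Please fill in your name, email, and message."

def handle_request(user_choice: str, query: str = "") -> str:
    if user_choice == "trip_knowledge":
        return search_knowledge_base(query)
    elif user_choice == "talk_to_sales":
        return open_contact_form()
    raise ValueError(f"Unknown request type: {user_choice}")

print(handle_request("trip_knowledge", "best season for surfing in Hossegor"))
```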
But what if the workflow cannot be determined in advance?
For example, a user might want to ask: “I can come on Monday, but I forgot my passport so I risk being delayed until Wednesday. Is it possible to take me and my luggage surfing on Tuesday morning, with cancellation insurance?” This question hinges on many factors, and probably none of the predetermined criteria above will suffice for this request.
If the predetermined workflow falls short too often, it means you need more flexibility.
That’s where an agentic setup helps.
In the example above, you could build a multi-step agent that has access to a weather API for forecasts, the Google Maps API to compute travel distances, an employee availability dashboard, and a RAG system over your knowledge base.
Until recently, computer programs were restricted to predetermined workflows, trying to handle complexity by stacking up if/else switches. They focused on extremely narrow tasks, such as “compute the sum of these numbers” or “find the shortest path in this graph.” But in reality, most real-world tasks, like the travel example above, do not fit into a predetermined workflow. Agentic systems open up the vast world of real-world tasks to programs.
Code agents
In a multi-step agent, at each step the LLM can write an action in the form of calls to external tools. A common format for writing these actions (used by Anthropic, OpenAI, and many others) is to write the action as a JSON blob of tool names and arguments, which you then parse to determine which tool to execute and with which arguments.
Multiple research papers have shown that having the LLM write its tool calls in code works much better.
The reason is simple: we crafted programming languages specifically to be the best possible way to express actions performed by a computer. If JSON snippets were a better expression, JSON would be the top programming language and programming would be hell on earth.
The diagram below, taken from Executable Code Actions Elicit Better LLM Agents, illustrates some of the benefits of writing actions in code.
You get better results when you write actions in code rather than JSON-like snippets, because code gives you better:

Composability: can you nest JSON actions within each other, or define a set of JSON actions to reuse later, the same way you would define a Python function?
Object management: how do you store the output of an action like generate_image in JSON?
Generality: code is built to express, simply, anything you can have a computer do.
Representation in LLM training data: plenty of high-quality code actions are already included in LLMs’ training data, which means they are already trained for this!
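As a hand-written illustration (not the output of any particular model, and with hypothetical tool names like get_weather), compare a JSON-style action with its code equivalent: branching on a tool’s result needs extra orchestration in JSON, but is just ordinary variables and control flow in code.

```python
import json

# JSON-style action: one tool call per blob, which the framework must parse
# and execute before the model can even see the result.
json_action = json.loads("""
{"tool": "get_weather", "arguments": {"city": "Biarritz"}}
""")

# Code-style action: the same intent, plus composition (reusing the result in
# further calls) expressed directly with variables and control flow.
code_action = """
weather = get_weather(city="Biarritz")
if "rain" in weather:
    answer = suggest_indoor_activity(city="Biarritz")
else:
    answer = suggest_surf_spot(city="Biarritz")
final_answer(answer)
"""

print(json_action["tool"], "-> must be parsed, executed, then fed back to the LLM")
print(code_action)
```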
Introducing smolagents: making agents simple 🥳
We built smolagents with the following objectives.
✨ Simplicity: the agent logic fits in a few thousand lines of code (see this file). We kept abstractions to their minimal shape above raw code.
🧑‍💻 First-class support for code agents, i.e. agents that write their actions in code (as opposed to “agents being used to write code”). To make this secure, we support executing code in sandboxed environments via E2B (see the sketch after this list).
🤗 Hub integrations: you can share and load tools to and from the Hub, with more to come!
🌐 Support for any LLM: it supports models hosted on the Hub, loaded in their transformers version or through the Inference API, as well as models from OpenAI, Anthropic, and many others via the LiteLLM integration.
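As a sketch of the sandboxed-execution option mentioned above: it can be enabled when the agent is created. The flag name `use_e2b_executor` reflects our understanding of the initial release and is an assumption; check the current docs if the parameter has since been renamed.

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# Run the agent's generated code in an E2B sandbox rather than the local
# interpreter. Requires an E2B API key and the E2B extra installed; the exact
# flag name below is assumed from the initial release.
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(),
    use_e2b_executor=True,
)
```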
smolagents is the successor to transformers.agents, and it will replace it as transformers.agents is deprecated in the future.
Building an agent
Building an agent requires at least two elements:

tools: a list of tools the agent has access to
model: an LLM that serves as the engine of the agent
The model can be any LLM: an open model via the HfApiModel class shown in the leopard example above, which leverages Hugging Face’s free Inference API, or LiteLLMModel, which leverages litellm to let you pick from a list of 100+ different cloud LLMs.
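For instance, either option takes a couple of lines. In this sketch the model ids are only illustrative assumptions; pick whatever Hub model or litellm-supported id you prefer.

```python
from smolagents import HfApiModel, LiteLLMModel

# Open model served through Hugging Face's free Inference API; with no
# argument, HfApiModel falls back to a default Hub model.
open_model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

# Third-party model routed through litellm; the model id below is only an
# illustration, any id supported by litellm should work.
claude_model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest")
```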
For the tool, you can simply define a function with type hints on its inputs and output, plus a docstring describing the inputs, and use the @tool decorator to turn it into a tool.
Here’s how to create a custom tool that gets travel times from Google Maps, and how to use it in a trip planner agent:
```python
from typing import Optional

from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_travel_duration(start_location: str, destination_location: str, departure_time: Optional[int] = None) -> str:
    """Gets the travel time by car between two places.

    Args:
        start_location: the place from which you start your ride
        destination_location: the place of arrival
        departure_time: the departure time; provide only a `datetime.datetime` if you want to specify this
    """
    import os

    import googlemaps  # imports are placed inside the function so the tool stays self-contained when shared

    gmaps = googlemaps.Client(os.getenv("GMAPS_API_KEY"))

    if departure_time is None:
        from datetime import datetime
        departure_time = datetime(2025, 1, 6, 11, 0)  # default to a Monday morning

    directions_result = gmaps.directions(
        start_location,
        destination_location,
        mode="transit",
        departure_time=departure_time,
    )
    return directions_result[0]["legs"][0]["duration"]["text"]

agent = CodeAgent(
    tools=[get_travel_duration],
    model=HfApiModel(),
    additional_authorized_imports=["datetime"],
)

agent.run("Can you give me a nice one-day trip around Paris with a few locations and the times? It could be in the city or outside, but it should fit in one day. I'm travelling only by public transportation.")
```
After several steps of collecting travel times and performing calculations, the agent returns the following final proposal:
Out - Final answer: Here is a suggested one-day Paris itinerary:
Visit the Eiffel Tower: 9:00am – 10:30am
Visit the Louvre Museum: 11:00am – 12:30pm
Visit Notre-Dame Cathedral: 1:00pm – 2:30pm
Visit the Palace of Versailles: 3:30pm – 5:00pm
Note: The travel time from Notre-Dame Cathedral to the Palace of Versailles is approximately 59 minutes, so plan your day accordingly.
After you build your tool, sharing it to the Hub is as simple as:
```python
get_travel_duration.push_to_hub("{your_username}/get-travel-duration-tool")
```
You can see the result in this Space, and the tool’s logic in the file tool.py in that Space. As you can see, the tool was actually exported into a class that inherits from the Tool class, which is the underlying structure for all tools.
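Anyone can then load the shared tool back from the Hub and plug it into their own agent. A minimal sketch, reusing the placeholder repository id from above:

```python
from smolagents import CodeAgent, HfApiModel, load_tool

# Load the shared tool back from the Hub; trust_remote_code is needed because
# the tool's code gets downloaded and executed locally.
travel_duration_tool = load_tool(
    "{your_username}/get-travel-duration-tool",
    trust_remote_code=True,
)

agent = CodeAgent(tools=[travel_duration_tool], model=HfApiModel())
```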
How strong are open models for agentic workflows?
We created CodeAgent instances with several leading models and compared them on this benchmark, which gathers questions from a few different benchmarks to propose a varied blend of challenges.
Find the benchmark here for more detail on the agentic setup used, and see a comparison of code agents versus tool-calling agents (spoiler: code works better).
This comparison shows that open source models can compete with the best closed models.
Next steps 🚀