Jack of All Trades, Master of Some: A Multi-Purpose Transformer Agent

By versatileai | June 3, 2025

We are excited to share Jack of All Trades (JAT), a project that aims to move in the direction of generalist agents. The project began as an open reproduction of the Gato work (Reed et al., 2022), which proposed training a transformer able to perform both vision-and-language tasks and decision-making tasks. We thus started by building an open version of Gato’s dataset. We then trained multi-modal transformer models on it, introducing several improvements over Gato for handling sequential data and continuous values.

Overall, the project delivered:

  • The release of numerous expert RL agents on a wide variety of tasks.
  • The release of the JAT dataset, the first dataset for generalist agent training, containing hundreds of thousands of expert trajectories collected with those expert agents.
  • The release of the JAT model, a transformer-based agent that can play video games, control robots, and more.

[Figure: Global schema of the JAT project.]

Datasets and Expert Policies

Expert Policies

RL traditionally involves training a policy on a single environment. Leveraging such expert policies is a genuine way to build a versatile agent. We selected a wide set of environments of varying nature and difficulty, including Atari, BabyAI, Meta-World, and MuJoCo. For each of these environments, we trained an agent until it reached state-of-the-art performance (for BabyAI, we used the BabyAI bot instead). The resulting agents are called expert agents and have been released on the 🤗 Hub. You will find a list of all the agents in the JAT dataset card.

The JAT Dataset

We release the JAT dataset, the first dataset for generalist agent training. It contains hundreds of thousands of expert trajectories collected with the expert agents described above. To use it, simply load it like any other dataset from the Hub:

>>> from datasets import load_dataset
>>> dataset = load_dataset("jat-project/jat-dataset", "metaworld-assembly")
>>> first_episode = dataset["train"][0]
>>> first_episode.keys()
dict_keys(['continuous_observations', 'continuous_actions', 'rewards'])
>>> len(first_episode["rewards"])
500
>>> first_episode["continuous_actions"][0]
[6.459120273590088, 2.2422609329223633, -5.914587020874023, -19.799840927124023]

In addition to the RL data, we include textual datasets to enable a unique interface for the user. That is why you will also find subsets of Wikipedia, OSCAR, OK-VQA, and Conceptual Captions.

JAT Agent Architecture

The JAT architecture is based on a transformer, using EleutherAI’s GPT-Neo implementation. JAT’s particularity lies in its embedding mechanism, built to intrinsically handle sequential decision tasks: observation embeddings are interleaved with action embeddings, along with the corresponding rewards.

[Figure: JAT network architecture. For sequential decision-making tasks, observations and rewards on the one hand, and actions on the other, are encoded and interleaved. The model generates the next embedding autoregressively with a causal mask, and decodes it according to the expected modality.]

Each embedding therefore corresponds either to an observation (associated with a reward) or to an action. But how does JAT encode this information? It depends on the type of data. If the data (observation or action) is an image (as in Atari), JAT uses a CNN. If it is a continuous vector, JAT uses a linear layer. And if it is a discrete value, JAT uses a linear projection layer. The same principle applies to the model output, depending on the type of data to be predicted. Prediction is causal, shifting observations by one time step: the agent must predict the next action from all previous observations and actions.
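To make this concrete, here is a minimal sketch of modality-dependent encoders in PyTorch. This is illustrative only, not the actual JAT code: the layer sizes, input shapes, and routing logic are all assumptions.

import torch
import torch.nn as nn

hidden_size = 768  # assumed embedding width

# Images (e.g. stacked Atari frames) go through a CNN.
image_encoder = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(hidden_size),
)
# Continuous vectors (e.g. Meta-World observations) go through a linear layer.
continuous_encoder = nn.Linear(39, hidden_size)
# Discrete values (e.g. Atari actions) go through a linear projection of
# their one-hot encoding, which an embedding table implements.
discrete_encoder = nn.Embedding(18, hidden_size)

def embed(x: torch.Tensor) -> torch.Tensor:
    """Route a tensor to the encoder matching its modality."""
    if x.dim() == 4:  # (batch, channels, height, width): an image
        return image_encoder(x)
    if x.dtype in (torch.int32, torch.int64):  # discrete ids
        return discrete_encoder(x)
    return continuous_encoder(x)  # continuous vector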

In addition, we found it fun to train the agent to perform NLP and CV tasks. To do so, we also gave the encoder the option to take text and image data as input. Text data is tokenized using the GPT-2 tokenization strategy, and images are encoded with a ViT-style encoder.
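For illustration, the same kind of text and image encoding can be reproduced with off-the-shelf components. The checkpoints below are standard Hub models used as stand-ins, not JAT’s own weights.

import torch
from transformers import AutoTokenizer, ViTModel

# GPT-2 tokenization strategy for text.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
token_ids = tokenizer("go to the red door")["input_ids"]

# ViT-style encoder for images (dummy 224x224 RGB batch).
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
pixels = torch.randn(1, 3, 224, 224)
patch_embeddings = vit(pixel_values=pixels).last_hidden_state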

Given that data modalities can change from one environment to another, how does JAT compute the loss? It computes the loss for each modality separately. For images and continuous values, it uses the MSE loss. For discrete values, it uses the cross-entropy loss. The final loss is the average of the losses over each element of the sequence. Wait, does that mean we give equal weight to predicting actions and predicting observations? In fact, no; we will come back to this below.
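A minimal sketch of this per-modality loss, under assumed shapes (the observation/action weighting discussed below is omitted here):

import torch
import torch.nn.functional as F

def modality_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy for discrete targets, MSE for images and continuous values."""
    if target.dtype in (torch.int32, torch.int64):
        return F.cross_entropy(pred, target)
    return F.mse_loss(pred, target)

# Toy sequence: one continuous prediction and one discrete prediction.
elements = [
    (torch.randn(1, 6), torch.randn(1, 6)),   # continuous action: (pred, target)
    (torch.randn(1, 18), torch.tensor([3])),  # discrete action: (logits, class id)
]
# Final loss: average over every element of the sequence.
loss = torch.stack([modality_loss(p, t) for p, t in elements]).mean()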

Experiments and results

We evaluate JAT on all 157 training tasks, collecting 10 episodes per task and recording the total reward. For easier readability, we aggregate the results by domain.

[Figure: Aggregated expert-normalized scores, with 95% confidence intervals (CIs), for each RL domain as a function of the learning step.]

To summarize these results in a single number: averaged over the four domains, JAT reaches 65.8% of expert performance. This shows that JAT can mimic expert behavior on a very wide variety of tasks. Let’s go into a little more detail:

  • For Atari 57, the agent reaches 14.1% of the expert’s score, corresponding to 37.6% of human performance; it exceeds human performance on 21 games.
  • For BabyAI, the agent reaches 99.0% of the expert’s score, and fails to exceed 50% of the expert on just one task.
  • For Meta-World, the agent reaches 65.5% of the expert’s score.
  • For MuJoCo, the agent reaches 84.8% of the expert’s score.

[Figure: Human-normalized scores of the JAT agent on the Atari 57 benchmark.]

Most impressive is that JAT achieves this performance using a single network for all domains. To take the measure of this performance, have a look at JAT’s renderings on a few tasks.

Want to give it a try? You can! The JAT model is available on the 🤗 Hub!
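If the checkpoint follows the usual transformers custom-code loading pattern, fetching it could look like the lines below; note that the repository name "jat-project/jat" is an assumption inferred from the dataset’s jat-project namespace.

>>> from transformers import AutoModelForCausalLM, AutoProcessor
>>> model = AutoModelForCausalLM.from_pretrained("jat-project/jat", trust_remote_code=True)  # assumed repo id
>>> processor = AutoProcessor.from_pretrained("jat-project/jat", trust_remote_code=True)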

For text tasks, the model exhibits only basic capabilities; we refer the reader to the paper for more details.

The surprising benefits of predicting observations

When training RL agents, the main objective is to maximize future rewards. But what happens if we also ask the agent to predict what it will observe in the future? Does this additional task help or hinder the learning process?

There are two opposing views on this question. On one hand, learning to predict observations could provide a deeper understanding of the environment, leading to better and faster learning. On the other hand, it could distract the agent from its main objective, resulting in mediocre performance on both observation and action prediction.

To settle this debate, we ran an experiment using a loss function that combines the observation loss and the action loss, with a weighting parameter κ to balance the two objectives.
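Concretely, one plausible form of the combined objective, consistent with this description (the exact formulation is given in the paper), is

L = κ · L_observations + (1 − κ) · L_actions

so that κ = 0 recovers pure action prediction (behavioral cloning), while larger values of κ put more weight on predicting observations.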

[Figure: Aggregated measures, with 95% CIs, for the study of the influence of observation-prediction learning on selected tasks. The results shown cover the selected range of κ values and are based on 100 evaluations per task. A well-chosen κ can significantly improve agent performance.]

The results were noteworthy. When κ was too high (0.5), the additional objective of predicting observations seemed to hinder the learning process. But when κ was lower, its impact on learning was negligible, and the agent’s performance was similar to that obtained when observation prediction was not part of the objective.

However, we found a sweet spot around κ = 0.005, where learning to predict observations actually improved the agent’s learning efficiency. Our study suggests that adding observation prediction to the learning process is beneficial, as long as it is balanced correctly. This finding has important implications for the design of such agents, highlighting the potential value of auxiliary objectives in improving learning efficiency.

So, the next time you train an RL agent, consider asking it to predict what it will observe in the future. It might just improve performance and speed up learning!

Conclusion

This work introduced JAT, a multi-purpose transformer agent that masters a wide variety of sequential decision-making tasks and shows basic capabilities on NLP and CV tasks, all with a single network. Our contributions include the release of the expert RL agents, the JAT dataset, and the JAT model. We hope that this work will inspire future research in the field of generalist agents and contribute to the development of more versatile and capable AI systems.

What’s Next? A Request for Research

We believe that the JAT project has opened up a new research direction in the field of generalist agents, and that we have only scratched the surface. Here are some ideas for future work:

Improve the data: Although pioneering, the JAT dataset is still in its early stages. The expert trajectories come from only one expert agent per environment, which may introduce bias. Although we did our best to reach state-of-the-art performance, some environments remain challenging. We believe that collecting more data and training more expert agents would help a lot.

Use offline RL: The JAT agent is trained with basic behavioral cloning. This implies two things: (1) we cannot take advantage of sub-optimal trajectories, and (2) the JAT agent cannot outperform the experts. We chose this approach for simplicity, but we believe that using offline RL could really help improve the agent’s performance, while not being too complex to implement.

Unlock the full potential of a smarter multi-task sampling strategy: Currently, the JAT agent samples data uniformly across all tasks, but this approach may be holding it back. Dynamically adjusting the sampling rate to focus on the most challenging tasks could supercharge the agent’s learning process and unlock significant performance gains; a hedged sketch of one such strategy follows.
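As an illustration only (this is not something JAT implements), a loss-proportional task sampler could look like the following; the task names are made up for the example.

import random

def sample_task(recent_losses: dict, temperature: float = 1.0) -> str:
    """Sample a task name in proportion to its running-average training loss,
    so that harder tasks are visited more often than with uniform sampling."""
    tasks = list(recent_losses)
    weights = [recent_losses[t] ** (1.0 / temperature) for t in tasks]
    return random.choices(tasks, weights=weights, k=1)[0]

task = sample_task({"atari-pong": 0.9, "metaworld-assembly": 0.2, "mujoco-ant": 0.4})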

Citation

@article{gallouedec2024jack,
  title   = {{Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent}},
  author  = {Gallouédec, Quentin and Beeching, Edward and Romac, Clément and Dellandréa, Emmanuel},
  journal = {arXiv preprint arXiv:2402.09844},
  year    = {2024},
  url     = {https://arxiv.org/abs/2402.09844}
}
