Jack of All Trades, Master of Some: A Multi-Purpose Transformer Agent

By versatileai | June 3, 2025

We are excited to share Jack of All Trades (JAT), a project that aims to move in the direction of generalist agents. The project began as an open reproduction of the Gato work (Reed et al., 2022), which proposed training a transformer able to perform both vision-and-language tasks and decision-making tasks. We thus started by building an open version of Gato’s dataset. We then trained multi-modal transformer models on it, introducing several improvements over Gato for handling sequential data and continuous values.

Overall, the project delivered:

  • The release of numerous expert RL agents on a wide variety of tasks.
  • The release of the JAT dataset, the first dataset for generalist agent training, containing hundreds of thousands of expert trajectories collected with those expert agents.
  • The release of the JAT model, a transformer-based agent that can play video games, control robots, and more.

[Figure: Global schema of the JAT project.]

Datasets and Expert Policies

Expert Policies

RL traditionally involves training a policy on a single environment. Leveraging such expert policies is a genuine way to build a versatile agent. We selected a wide set of environments of varying nature and difficulty, including Atari, BabyAI, Meta-World, and MuJoCo. For each of these environments, we trained an agent until it reached state-of-the-art performance (for BabyAI, we used the BabyAI bot instead). The resulting agents are called expert agents and have been released on the 🤗 Hub. You will find a list of all the agents in the JAT dataset card.

The JAT Dataset

We release the JAT dataset, the first dataset for generalist agent training. It contains hundreds of thousands of expert trajectories collected with the expert agents described above. To use it, simply load it like any other dataset from the Hub:

>>> from datasets import load_dataset
>>> dataset = load_dataset("jat-project/jat-dataset", "metaworld-assembly")
>>> first_episode = dataset["train"][0]
>>> first_episode.keys()
dict_keys(['continuous_observations', 'continuous_actions', 'rewards'])
>>> len(first_episode["rewards"])
500
>>> first_episode["continuous_actions"][0]
[6.459120273590088, 2.2422609329223633, -5.914587020874023, -19.799840927124023]

In addition to the RL data, we include textual datasets to enable a unique interface for the user. That is why you will also find subsets of Wikipedia, OSCAR, OK-VQA, and Conceptual Captions.

JAT Agent Architecture

The JAT architecture is based on a transformer, using EleutherAI’s GPT-Neo implementation. JAT’s particularity lies in its embedding mechanism, built to intrinsically handle sequential decision tasks: observation embeddings are interleaved with action embeddings, along with the corresponding rewards.

[Figure: JAT network architecture. For sequential decision-making tasks, observations and rewards on the one hand, and actions on the other, are encoded and interleaved. The model generates the next embedding autoregressively with a causal mask, and decodes it according to the expected modality.]

Each embedding therefore corresponds either to an observation (associated with a reward) or to an action. But how does JAT encode this information? It depends on the type of data. If the data (observation or action) is an image (as in Atari), JAT uses a CNN. If it is a continuous vector, JAT uses a linear layer. And if it is a discrete value, JAT uses a linear projection layer. The same principle applies to the model output, depending on the type of data to be predicted. Prediction is causal, shifting observations by one time step: the agent must predict the next action from all previous observations and actions.
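To make this concrete, here is a minimal sketch of modality-dependent encoders in PyTorch. This is illustrative only, not the actual JAT code: the layer sizes, input shapes, and routing logic are all assumptions.

import torch
import torch.nn as nn

hidden_size = 768  # assumed embedding width

# Images (e.g. stacked Atari frames) go through a CNN.
image_encoder = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(hidden_size),
)
# Continuous vectors (e.g. Meta-World observations) go through a linear layer.
continuous_encoder = nn.Linear(39, hidden_size)
# Discrete values (e.g. Atari actions) go through a linear projection of
# their one-hot encoding, which an embedding table implements.
discrete_encoder = nn.Embedding(18, hidden_size)

def embed(x: torch.Tensor) -> torch.Tensor:
    """Route a tensor to the encoder matching its modality."""
    if x.dim() == 4:  # (batch, channels, height, width): an image
        return image_encoder(x)
    if x.dtype in (torch.int32, torch.int64):  # discrete ids
        return discrete_encoder(x)
    return continuous_encoder(x)  # continuous vector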

In addition, we found it fun to train the agent to perform NLP and CV tasks. To do so, we also gave the encoder the option to take text and image data as input. Text data is tokenized using the GPT-2 tokenization strategy, and images are encoded with a ViT-style encoder.
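For illustration, the same kind of text and image encoding can be reproduced with off-the-shelf components. The checkpoints below are standard Hub models used as stand-ins, not JAT’s own weights.

import torch
from transformers import AutoTokenizer, ViTModel

# GPT-2 tokenization strategy for text.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
token_ids = tokenizer("go to the red door")["input_ids"]

# ViT-style encoder for images (dummy 224x224 RGB batch).
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
pixels = torch.randn(1, 3, 224, 224)
patch_embeddings = vit(pixel_values=pixels).last_hidden_state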

Given that data modalities can change from one environment to another, how does JAT compute the loss? It computes the loss for each modality separately. For images and continuous values, it uses the MSE loss. For discrete values, it uses the cross-entropy loss. The final loss is the average of the losses over each element of the sequence. Wait, does that mean we give equal weight to predicting actions and predicting observations? In fact, no; we will come back to this below.
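A minimal sketch of this per-modality loss, under assumed shapes (the observation/action weighting discussed below is omitted here):

import torch
import torch.nn.functional as F

def modality_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy for discrete targets, MSE for images and continuous values."""
    if target.dtype in (torch.int32, torch.int64):
        return F.cross_entropy(pred, target)
    return F.mse_loss(pred, target)

# Toy sequence: one continuous prediction and one discrete prediction.
elements = [
    (torch.randn(1, 6), torch.randn(1, 6)),   # continuous action: (pred, target)
    (torch.randn(1, 18), torch.tensor([3])),  # discrete action: (logits, class id)
]
# Final loss: average over every element of the sequence.
loss = torch.stack([modality_loss(p, t) for p, t in elements]).mean()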

Experiments and results

We evaluate JAT on all 157 training tasks, collecting 10 episodes per task and recording the total reward. For easier readability, we aggregate the results by domain.

[Figure: Aggregated expert-normalized scores, with 95% confidence intervals (CIs), for each RL domain as a function of the learning step.]

To summarize these results in a single number: averaged over the four domains, JAT reaches 65.8% of expert performance. This shows that JAT can mimic expert behavior on a very wide variety of tasks. Let’s go into a little more detail:

  • For Atari 57, the agent reaches 14.1% of the expert’s score, corresponding to 37.6% of human performance; it exceeds human performance on 21 games.
  • For BabyAI, the agent reaches 99.0% of the expert’s score, and fails to exceed 50% of the expert on just one task.
  • For Meta-World, the agent reaches 65.5% of the expert’s score.
  • For MuJoCo, the agent reaches 84.8% of the expert’s score.

[Figure: Human-normalized scores of the JAT agent on the Atari 57 benchmark.]

Most impressive is that JAT achieves this performance using a single network for all domains. To take the measure of this performance, have a look at JAT’s renderings on a few tasks.

Want to give it a try? You can! The JAT model is available on the 🤗 Hub!
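If the checkpoint follows the usual transformers custom-code loading pattern, fetching it could look like the lines below; note that the repository name "jat-project/jat" is an assumption inferred from the dataset’s jat-project namespace.

>>> from transformers import AutoModelForCausalLM, AutoProcessor
>>> model = AutoModelForCausalLM.from_pretrained("jat-project/jat", trust_remote_code=True)  # assumed repo id
>>> processor = AutoProcessor.from_pretrained("jat-project/jat", trust_remote_code=True)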

For text tasks, the model exhibits only basic capabilities; we refer the reader to the paper for more details.

The surprising benefits of predicting observations

When training RL agents, the main objective is to maximize future rewards. But what happens if we also ask the agent to predict what it will observe in the future? Does this additional task help or hinder the learning process?

There are two opposing views on this question. On one hand, learning to predict observations could provide a deeper understanding of the environment, leading to better and faster learning. On the other hand, it could distract the agent from its main objective, resulting in mediocre performance on both observation and action prediction.

To settle this debate, we ran an experiment using a loss function that combines the observation loss and the action loss, with a weighting parameter κ to balance the two objectives.
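Concretely, one plausible form of the combined objective, consistent with this description (the exact formulation is given in the paper), is

L = κ · L_observations + (1 − κ) · L_actions

so that κ = 0 recovers pure action prediction (behavioral cloning), while larger values of κ put more weight on predicting observations.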

[Figure: Aggregated measures, with 95% CIs, for the study of the influence of observation-prediction learning on selected tasks. The results shown cover the selected range of κ values and are based on 100 evaluations per task. A well-chosen κ can significantly improve agent performance.]

The results were noteworthy. When κ was too high (0.5), the additional objective of predicting observations seemed to hinder the learning process. But when κ was lower, its impact on learning was negligible, and the agent’s performance was similar to that obtained when observation prediction was not part of the objective.

However, we found a sweet spot around κ = 0.005, where learning to predict observations actually improved the agent’s learning efficiency. Our study suggests that adding observation prediction to the learning process is beneficial, as long as it is balanced correctly. This finding has important implications for the design of such agents, highlighting the potential value of auxiliary objectives in improving learning efficiency.

So, the next time you train an RL agent, consider asking it to predict what it will observe in the future. It might just improve performance and speed up learning!

Conclusion

This work introduced JAT, a multi-purpose transformer agent that masters a wide variety of sequential decision-making tasks and shows basic capabilities on NLP and CV tasks, all with a single network. Our contributions include the release of the expert RL agents, the JAT dataset, and the JAT model. We hope that this work will inspire future research in the field of generalist agents and contribute to the development of more versatile and capable AI systems.

What’s Next? A Request for Research

We believe that the JAT project has opened up a new research direction in the field of generalist agents, and that we have only scratched the surface. Here are some ideas for future work:

Improve the data: Although pioneering, the JAT dataset is still in its early stages. The expert trajectories come from only one expert agent per environment, which may introduce bias. Although we did our best to reach state-of-the-art performance, some environments remain challenging. We believe that collecting more data and training more expert agents would help a lot.

Use offline RL: The JAT agent is trained with basic behavioral cloning. This implies two things: (1) we cannot take advantage of sub-optimal trajectories, and (2) the JAT agent cannot outperform the experts. We chose this approach for simplicity, but we believe that using offline RL could really help improve the agent’s performance, while not being too complex to implement.

Unlock the full potential of a smarter multi-task sampling strategy: Currently, the JAT agent samples data uniformly across all tasks, but this approach may be holding it back. Dynamically adjusting the sampling rate to focus on the most challenging tasks could supercharge the agent’s learning process and unlock significant performance gains; a hedged sketch of one such strategy follows.
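As an illustration only (this is not something JAT implements), a loss-proportional task sampler could look like the following; the task names are made up for the example.

import random

def sample_task(recent_losses: dict, temperature: float = 1.0) -> str:
    """Sample a task name in proportion to its running-average training loss,
    so that harder tasks are visited more often than with uniform sampling."""
    tasks = list(recent_losses)
    weights = [recent_losses[t] ** (1.0 / temperature) for t in tasks]
    return random.choices(tasks, weights=weights, k=1)[0]

task = sample_task({"atari-pong": 0.9, "metaworld-assembly": 0.2, "mujoco-ant": 0.4})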

Citation

@article{gallouedec2024jack,
  title   = {{Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent}},
  author  = {Gallouédec, Quentin and Beeching, Edward and Romac, Clément and Dellandréa, Emmanuel},
  journal = {arXiv preprint arXiv:2402.09844},
  year    = {2024},
  url     = {https://arxiv.org/abs/2402.09844}
}
