🧠Model: https://huggingface.co/collections/allenai/olmoearth | 📄 Technical report: https://allenai.org/papers/olmoearth_v1_1 | 💻 Code: https://github.com/allenai/olmoearth_pretrain

We released OlmoEarth v1 in November 2025. Since then, partners have applied it to a wide range of tasks, from tracking mangrove change and classifying the causes of forest loss to creating country-scale crop type maps in days, with deployments expanding to countries, continents, and global regions. Each release brings us closer to our mission of providing cutting-edge AI to organizations and communities working to protect people and the planet.
Efficiency shapes what’s possible when OlmoEarth processes satellite imagery to make predictions over areas ranging from tens of thousands to hundreds of thousands of square kilometers. Across OlmoEarth’s execution lifecycle (data export, preprocessing, inference, and postprocessing), compute costs are substantial. A more efficient model means we can support more partners on the OlmoEarth platform, and anyone running OlmoEarth themselves can get results faster and at lower cost.
That’s why we built OlmoEarth v1.1, a new family of models that maintains the performance of OlmoEarth v1 on a combination of research benchmarks and tasks built with our partners while reducing compute costs by up to 3x.
Reduce sequence length and increase efficiency
OlmoEarth models are built on transformers, one of the leading architectures in machine learning today. To process remote sensing data, we first convert it into a sequence of tokens that the model can ingest.
Two important factors control the efficiency of transformer-based models: model size (which is why we release families of models, so users can choose the size that fits their compute budget) and the length of the token sequence. Because the cost of self-attention grows quadratically with sequence length, even small reductions can significantly lower the cost of running the model.
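To make the quadratic term concrete, here is a rough count of multiply-accumulate operations (MACs) for a single transformer encoder layer. It is a minimal sketch under textbook assumptions (a ViT-style layer with Q, K, V, and output projections, an MLP four times as wide as the model, and dense self-attention); the function name encoder_layer_macs is hypothetical, and this is not the accounting used to measure OlmoEarth's MACs.

```python
def encoder_layer_macs(n_tokens: int, d_model: int, mlp_ratio: int = 4) -> dict:
    """Approximate MACs for one transformer encoder layer.

    Simplifying assumptions (not OlmoEarth's exact architecture): a ViT-style
    layer with Q, K, V, and output projections of size d_model x d_model, an
    MLP with hidden width mlp_ratio * d_model, and dense self-attention.
    Norms, softmax, biases, and activations are ignored.
    """
    n, d = n_tokens, d_model
    projections = 4 * n * d * d            # Q, K, V, and output projections
    mlp = 2 * n * d * (mlp_ratio * d)      # two linear layers in the MLP
    attention = 2 * n * n * d              # Q @ K^T scores, then scores @ V
    linear = projections + mlp             # grows linearly with n
    return {"linear": linear, "attention": attention, "total": linear + attention}


# Compare a sequence with 3x fewer tokens at the same model width.
long_seq = encoder_layer_macs(n_tokens=768, d_model=768)
short_seq = encoder_layer_macs(n_tokens=256, d_model=768)
print(long_seq["linear"] / short_seq["linear"])        # 3.0
print(long_seq["attention"] / short_seq["attention"])  # 9.0
```

In this simplified count, the projection and MLP terms scale linearly with the number of tokens while the attention term scales with its square, which is why shortening the sequence pays off.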

Figure: average rank (y-axis) versus MACs for each model family and size. MACs (multiply-accumulate operations) estimate the computation required for one forward pass of the model; in general, fewer MACs mean cheaper and faster inference. The y-axis is inverted because a lower average rank is better. Each label shows the model family and size.
Token design
This raises an important question: what should the token represent for a transformer-based remote sensing model?
Consider Sentinel-2 imagery, a common modality that we process. A Sentinel-2 input is a tensor of shape (H, W, T, D=12), where H and W are the height and width in pixels (along latitude and longitude), T is the time dimension, and D = 12 is the number of Sentinel-2 channels.

Currently, we split the data into patches and group bands by resolution. Specifically, we choose a spatial patch size p and divide the entire Sentinel-2 image into p × p patches.
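The patching step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming all 12 bands have already been resampled onto a common pixel grid and that H and W are divisible by p; patchify is a hypothetical helper, not a function from the OlmoEarth codebase.

```python
import numpy as np


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, T, D) image into non-overlapping p x p spatial patches.

    Returns an array of shape (H // p, W // p, T, p * p * D): one flattened
    patch vector per spatial patch and timestep.
    """
    h, w, t, d = x.shape
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size p")
    # (H, W, T, D) -> (H/p, p, W/p, p, T, D)
    x = x.reshape(h // p, p, w // p, p, t, d)
    # Bring the within-patch axes next to the channel axis: (H/p, W/p, T, p, p, D)
    x = x.transpose(0, 2, 4, 1, 3, 5)
    # Flatten each patch's pixels and channels into one vector
    return x.reshape(h // p, w // p, t, p * p * d)


x = np.zeros((64, 64, 2, 12))  # H=64, W=64, T=2, D=12
print(patchify(x, p=8).shape)  # (8, 8, 2, 768)
```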

We then create one token for each patch, timestep, and resolution. For example, a Sentinel-2 input with 2 timesteps produces 6 tokens per patch (2 timesteps × 3 resolutions: 10 m, 20 m, and 60 m).
In total, an (H, W, T, D=12) Sentinel-2 input produces (H/p) × (W/p) × T × 3 tokens.
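The token count formula can be checked with a short sketch. The grouping of the 12 bands by native resolution below follows the standard Sentinel-2 band layout with B10 omitted, but which bands OlmoEarth actually uses is an assumption here, and sentinel2_token_count is a hypothetical helper. A per_resolution=False option collapses the three resolution groups into a single token per patch and timestep, which makes it easy to compare the two designs.

```python
# Assumed grouping of the 12 Sentinel-2 bands by native resolution (B10 omitted).
# Illustrative only; the band list OlmoEarth uses may differ.
S2_BANDS_BY_RESOLUTION = {
    "10m": ["B02", "B03", "B04", "B08"],
    "20m": ["B05", "B06", "B07", "B8A", "B11", "B12"],
    "60m": ["B01", "B09"],
}


def sentinel2_token_count(h: int, w: int, t: int, p: int, per_resolution: bool = True) -> int:
    """Number of tokens for an (H, W, T, D=12) Sentinel-2 input with patch size p.

    per_resolution=True: one token per (patch, timestep, resolution group).
    per_resolution=False: all bands collapsed into one token per (patch, timestep).
    """
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size p")
    groups = len(S2_BANDS_BY_RESOLUTION) if per_resolution else 1
    return (h // p) * (w // p) * t * groups


print(sentinel2_token_count(8, 8, 2, p=8))    # 6: one patch, 2 timesteps x 3 resolutions
print(sentinel2_token_count(64, 64, 12, p=8))  # 2304
print(sentinel2_token_count(64, 64, 12, p=8, per_resolution=False))  # 768
```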
When processing Sentinel-2 data, a common practice is to use separate tokens for each resolution. Galileo and SatMAE both take this approach, and SatMAE reports significantly better results when it does. However, the practice is not universal: CROMA uses a single token for all bands, regardless of resolution. Because the number of tokens grows multiplicatively with each factor, collapsing the three resolutions into a single token reduces the token count by a factor of three, yielding substantial savings across pre-training, fine-tuning, and inference.
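One way to picture the difference between the two designs is at the patch-embedding step. The sketch below uses plain linear projections with random weights, in the style of a ViT patch embedding; the channel indices in RESOLUTION_GROUPS and the functions embed_per_resolution and embed_single are hypothetical, and this is not OlmoEarth's actual tokenizer.

```python
import numpy as np

# Hypothetical channel indices for each resolution group along the D = 12 axis
# (4 bands at 10 m, 6 at 20 m, 2 at 60 m). The real ordering may differ.
RESOLUTION_GROUPS = {"10m": [0, 1, 2, 3], "20m": [4, 5, 6, 7, 8, 9], "60m": [10, 11]}


def embed_per_resolution(patches: np.ndarray, weights: dict) -> np.ndarray:
    """Project each resolution group of each patch to its own token.

    patches: (N, p, p, 12) pixel patches, one per (patch, timestep).
    weights: group name -> (p * p * n_group_bands, d_model) projection matrix.
    Returns (N, 3, d_model): three tokens per patch and timestep.
    """
    n = patches.shape[0]
    tokens = [patches[..., idx].reshape(n, -1) @ weights[name]
              for name, idx in RESOLUTION_GROUPS.items()]
    return np.stack(tokens, axis=1)


def embed_single(patches: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Project all 12 bands of each patch into a single token.

    weight: (p * p * 12, d_model). Returns (N, 1, d_model).
    """
    n = patches.shape[0]
    return (patches.reshape(n, -1) @ weight)[:, None, :]


rng = np.random.default_rng(0)
n, p, d_model = 4, 8, 32
patches = rng.normal(size=(n, p, p, 12))
group_weights = {name: rng.normal(size=(p * p * len(idx), d_model))
                 for name, idx in RESOLUTION_GROUPS.items()}
print(embed_per_resolution(patches, group_weights).shape)  # (4, 3, 32)
print(embed_single(patches, rng.normal(size=(p * p * 12, d_model))).shape)  # (4, 1, 32)
```

Both variants see the same pixels; the per-resolution variant spends three tokens where the single-token variant spends one, so the downstream transformer processes a sequence three times as long.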
However, simply combining tokens in this way causes significant performance degradation, such as a 10-percentage-point drop on m-eurosat kNN (a common benchmark task for remote sensing models). We hypothesize that separating Sentinel-2 bands into different tokens makes it easier for OlmoEarth to model important cross-band relationships.
To combine tokens without hurting performance, we had to change our pre-training recipe. We describe these changes in detail in the technical report.
For developers
The result is a family of models that does more with less. At every model size, OlmoEarth v1.1 runs up to 3x cheaper than OlmoEarth v1, making frequent planet-scale map updates more affordable for every team running OlmoEarth. If you are using models from the original OlmoEarth family, try OlmoEarth v1.1. It delivers performance similar to OlmoEarth v1 with as little as a third of the compute, though with some caveats (see the technical report for details). If it works well for your task, you should see significant speedups during fine-tuning and inference.
For researchers
Pre-trained remote sensing models have many degrees of freedom, making them difficult to study. When performance changes, is it the architecture, the dataset, or the pre-training algorithm?
Because we trained OlmoEarth v1.1 on the same dataset as OlmoEarth v1, comparing the two isolates the impact of the change in methodology. We hope this will advance our understanding of the scientific principles behind pre-training models for remote sensing.
Let’s get started
Check out the weights and training code for OlmoEarth v1.1, including weights for the Base, Tiny, and Nano models.

