Published July 23, 2025
Authors: Aeneas Team
We present the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute and restore fragmentary texts.
Writing was everywhere in the Roman world, etched onto everything from imperial monuments to everyday objects. From political graffiti, love poems and epitaphs to business transactions, birthday invitations and magical spells, inscriptions offer modern historians rich insight into the diversity of everyday life across the Roman world.
Often, these texts are fragmentary, weathered or deliberately defaced. Restoring, dating and placing them is nearly impossible without contextual information, especially when comparing similar inscriptions.
Today we are publishing a paper in Nature introducing Aeneas, the first artificial intelligence (AI) model for contextualizing ancient inscriptions.
When working with ancient inscriptions, historians traditionally rely on their expertise and on specialized resources to identify “parallels”: texts that share similarities in wording, syntax, standardized formulas or provenance.
Aeneas greatly accelerates this complex and time-consuming task. It reasons across thousands of Latin inscriptions, retrieving textual and contextual parallels in seconds so that historians can interpret and build on the model’s findings.
Our model can also be adapted to other ancient languages, scripts and media, from papyri to coinage, expanding its capabilities to help draw connections across a wider range of historical evidence.
We co-developed Aeneas with the University of Nottingham, in partnership with researchers at the Universities of Warwick and Oxford and at the Athens University of Economics and Business (AUEB). This work was part of a broader effort to explore how generative AI can help historians better identify and interpret parallels at scale.
We want this research to benefit as many people as possible, so we have made an interactive version of Aeneas freely available at predictingthepast.com to researchers, students, educators, museum professionals and more. To support further research, we are also open-sourcing our code and dataset.
Advanced Features of Aeneas
Named after the wandering hero of Graeco-Roman mythology, Aeneas builds on Ithaca, our earlier work that used AI to restore, date and place ancient Greek inscriptions.
Aeneas goes a step further, helping historians interpret and contextualize a text, giving meaning to isolated fragments, enabling richer conclusions and piecing together a better understanding of ancient history.
Advanced features of the model include:
Parallels search: Search for parallels across a vast collection of Latin inscriptions. By turning each text into a kind of historical fingerprint, Aeneas identifies deep connections that help historians situate an inscription within its wider historical context.
Processing multimodal input: Aeneas is the first model to determine the geographical origin of a text using multimodal input. It analyzes both text and visual information, such as images of an inscription.
Restoring gaps of unknown length: For the first time, Aeneas can restore gaps in a text whose length is unknown. This makes it a more versatile tool for historians working with heavily damaged material.
State-of-the-art performance: Aeneas sets a new state of the art in restoring damaged texts and predicting when and where they were written.
Animation of a restored bronze military diploma from Sardinia, 113/14 CE (CIL XVI, 60).
How Aeneas works
Aeneas is a multimodal generative neural network that takes an inscription’s text and image as input. To train Aeneas, we curated a large and reliable dataset that draws on decades of work by historians to create digital collections, in particular the Epigraphic Database Roma (EDR), the Epigraphic Database Heidelberg (EDH) and the Epigraphic Database Clauss Slaby (EDCS-ELT).
These records were cleaned, harmonized, and linked into a single machine-actionable dataset called the Latin Epigraphic Dataset (LED), containing over 176,000 Latin inscriptions from the entire ancient Roman world.
Our model uses a transformer-based decoder to process the text of an inscription. Specialized networks handle character restoration and dating using text alone, while geographical attribution also takes images of the inscription as input. The decoder retrieves similar inscriptions from the LED, ranked by relevance.
For each inscription, Aeneas’ contextualization mechanism retrieves a list of parallels using a technique called “embeddings”. It encodes the textual and contextual information of each inscription into a kind of historical fingerprint that captures what the text says, its language, where it came from and how it relates to other inscriptions.
A diagram of Aeneas’ architecture, showing how the model takes text and image inputs to generate predictions for provenance, date and restoration.
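The embedding-based retrieval described above can be illustrated with a minimal sketch. It assumes each inscription has already been encoded as a fixed-length vector and that parallels are ranked by cosine similarity; the function names, the similarity measure and the tiny three-dimensional vectors are illustrative assumptions, not the model’s actual code or data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_parallels(query_embedding, corpus, top_k=3):
    """Return the top_k (inscription_id, score) pairs, most similar first.

    `corpus` maps a hypothetical inscription id to its embedding vector.
    """
    scored = [
        (inscription_id, cosine_similarity(query_embedding, embedding))
        for inscription_id, embedding in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings, for illustration only.
corpus = {
    "inscription_a": [0.9, 0.1, 0.0],
    "inscription_b": [0.0, 1.0, 0.0],
    "inscription_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

parallels = rank_parallels(query, corpus, top_k=2)
for inscription_id, score in parallels:
    print(f"{inscription_id}: {score:.3f}")
```

In a real system the vectors would come from the trained model and the search would run over the full dataset, typically with an approximate nearest-neighbour index rather than a linear scan.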
Cutting-edge performance
Aeneas groups inscriptions by the date they were written much more clearly than other general-purpose models that were also trained on Latin, as shown in the visualization below.
Uniform manifold approximation and projection (UMAP) visualization showing the chronological attribution of Aeneas’ historically rich embeddings, compared with the text embeddings of a general-purpose large language model.
Aeneas restores damaged inscriptions with a top-20 accuracy of 73% for gaps of up to ten characters. This decreases only to 58% when the restoration length is unknown, a very challenging task in itself. It also presents its reasoning in an interpretable way, providing saliency maps that show which parts of the input influenced its predictions. Thanks to its use of visual data, our model can attribute an inscription to one of 62 ancient Roman provinces with 72% accuracy. For dating, Aeneas can place a text within 13 years of the date ranges provided by historians.
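To make the two kinds of figures above concrete, here is a minimal sketch of how a top-k accuracy and a dating error against a historian’s date range might be computed. The function names, the toy candidates and the convention that a prediction inside the range counts as zero error are assumptions made for illustration; the published evaluation may define these metrics differently.

```python
def top_k_accuracy(ranked_predictions, targets, k=20):
    """Fraction of examples whose true restoration appears among the first k candidates."""
    if not targets:
        raise ValueError("targets must not be empty")
    hits = sum(
        1
        for candidates, target in zip(ranked_predictions, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


def years_from_range(predicted_year, range_start, range_end):
    """Distance in years from a predicted date to a date range.

    Years are signed integers (negative for BCE). A prediction that falls
    inside the range counts as zero error (an assumed convention).
    """
    if range_start <= predicted_year <= range_end:
        return 0
    return min(abs(predicted_year - range_start), abs(predicted_year - range_end))


# Toy ranked candidates for three damaged passages (made-up strings).
ranked = [["uir", "uis", "uix"], ["deo", "dis"], ["ann", "anm"]]
truth = ["uix", "dis", "aug"]

acc_at_2 = top_k_accuracy(ranked, truth, k=2)
acc_at_3 = top_k_accuracy(ranked, truth, k=3)
inside = years_from_range(120, 100, 150)
outside = years_from_range(163, 100, 150)
print(acc_at_2, acc_at_3, inside, outside)
```

Averaging `years_from_range` over a test set gives a single "years from the range" figure of the kind quoted above.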
New lens on historical debate
To test Aeneas’ capabilities on an ongoing research debate, we gave it one of the most famous Roman inscriptions: the Res Gestae Divi Augusti.
Historians have long debated the dating of this inscription. Rather than predicting a single fixed date, Aeneas produced a detailed distribution of possible dates with two distinct peaks: a smaller one around 10-1 BCE and a larger, more confident one between 10 and 20 CE. These results captured both prevailing dating hypotheses quantitatively.
A histogram showing Aeneas’ chronological attribution predictions for the Res Gestae, which model the scholarly debate about dating this famous inscription.
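Reading a two-peaked date distribution like the one described above can be sketched in a few lines. The example assumes the model’s output is available as a probability per decade; the dictionary layout, the `find_peaks` helper and every number in it are made up for illustration and are not Aeneas’ actual predictions.

```python
def find_peaks(distribution):
    """Return (decade, probability) pairs that exceed both neighbours, highest first.

    `distribution` maps the start year of a decade (negative for BCE) to a
    probability. Missing neighbours at either end are treated as 0.0.
    """
    decades = sorted(distribution)
    peaks = []
    for i, decade in enumerate(decades):
        p = distribution[decade]
        left = distribution[decades[i - 1]] if i > 0 else 0.0
        right = distribution[decades[i + 1]] if i < len(decades) - 1 else 0.0
        if p > left and p > right:
            peaks.append((decade, p))
    peaks.sort(key=lambda item: item[1], reverse=True)
    return peaks


# Illustrative, made-up probabilities per decade.
distribution = {
    -30: 0.03,
    -20: 0.07,
    -10: 0.18,
    0: 0.08,
    10: 0.40,
    20: 0.15,
    30: 0.09,
}

peaks = find_peaks(distribution)
for decade, probability in peaks:
    print(f"decade starting {decade}: p = {probability:.2f}")
```

Keeping the whole distribution, rather than only its most likely value, is what lets competing dating hypotheses show up as separate peaks with different weights.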
Aeneas based its predictions on subtle linguistic features and historical markers, such as official titles and monuments mentioned in the text. By turning the dating question into a probabilistic estimate grounded in linguistic and contextual data, our model offers a new, quantitative way of engaging with long-standing historical debates.
Crucially, Aeneas also retrieved many relevant parallels from imperial legal texts linked to Augustus’ legacy, highlighting how imperial ideology was reproduced across media and geography.
Advancing historical research collaboratively
To assess the impact of Aeneas as a research aid, we conducted a large historian-AI collaborative study. We invited 23 historians who regularly work with inscriptions to restore, date and place a set of texts using Aeneas.
The evaluation, summarized in the table below, shows that the most effective results were achieved when historians used Aeneas’ contextual information along with its predictions to restore and attribute Roman inscriptions.
The historians’ performance on three epigraphic tasks (restoration, geographical attribution and dating), using 60 inscriptions from the database test set. The tasks were first performed unaided, then with either Aeneas’ parallels alone, or with both parallels and predictions.
In our study, Aeneas helped historians identify new parallels and increased their confidence as they tackled complex epigraphic tasks. Historians consistently emphasized the value of Aeneas in accelerating their work and expanding the range of relevant parallel inscriptions.
“The parallels retrieved by Aeneas completely changed my perception of the inscription. I noticed details that made all the difference for restoring the text and attributing it chronologically.”
Anonymous Historian from our study
Sharing tools and shaping the future
Aeneas is designed to integrate into historians’ existing research workflows. By combining expertise and machine learning, it opens up collaborative processes and provides interpretable suggestions that serve as a valuable starting point for historical research.
As part of today’s release, we are also upgrading Ithaca, our model for ancient Greek, to be powered by Aeneas, adding contextualization functions and restoration of gaps of unknown length, and improving its overall performance.
We also co-designed a new teaching syllabus to bridge technical skills with historical thinking in the classroom. This syllabus aligns with AI literacy initiatives, including the European Commission’s Digital Competence Framework for Citizens (DigComp 2.2), UNESCO’s AI Competency Framework for Students, and the AILit Framework of the European Commission and the Organisation for Economic Co-operation and Development (OECD).
The Aeneas team continues to partner with experts across diverse disciplines, using Aeneas to help shed light on our ancient past.
Acknowledgments
This study was co-led by Yannis Assael and Thea Sommerschield.
Contributors include Alison Cooley, Brendan Shillingford, John Pavlopoulos, Priyanka Suresh, Bailey Herms, Jonathan Prag, Alex Mullen and Shakir Mohamed. The Aeneas web interface was developed by Justin Grayston, Benjamin Maynard, and Nicholas Dietrich and is powered by Google Cloud.
The syllabus was developed by Robbe Wulgaert of Sint-Lievenscollege in Ghent, Belgium.