The world remains divided over whether to fear or celebrate artificial intelligence, but AI still seems to struggle with certain tasks that humans find incredibly easy. New research shows that AI has trouble reading analog clocks and calendars.
That is despite the fact that these systems can write code, generate realistic images, produce human-sounding text and even pass exams (with varying degrees of success).
The findings were presented by researchers in Scotland at the 2025 International Conference on Learning Representations (ICLR). The team also shared its results on the preprint server arXiv on March 18.
Lead author Rohit Saxena, a researcher at the University of Edinburgh, explained the significance of the findings in a statement.
“Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people,” he said.
“If AI systems are to be successfully integrated into time-sensitive, real-world applications such as scheduling, automation and assistive technologies, these shortcomings need to be addressed.”
The researchers investigated AI’s timekeeping abilities by feeding a custom dataset of clock and calendar images into various multimodal large language models (MLLMs). The models tested were Meta’s Llama 3.2-Vision, Anthropic’s Claude-3.5 Sonnet, Google’s Gemini 2.0 and OpenAI’s GPT-4o.
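The paper’s exact evaluation harness is not reproduced here, but the basic setup, showing a model a clock image and asking it a time question, can be sketched in a few lines of Python. The file name, prompt wording and choice of OpenAI’s client library below are illustrative assumptions rather than details taken from the study.

```python
# Minimal sketch of querying a multimodal model with a clock image.
# The prompt, file name and model choice are illustrative assumptions,
# not the benchmark code used in the paper.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY to be set in the environment

with open("clock.png", "rb") as f:  # hypothetical analog-clock image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What time does this clock show? Answer in HH:MM format."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # compare against the ground-truth label
```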
The results were poor: the models frequently failed to identify the correct time from an image of a clock or the day of the week for a sample date.
The researchers have an explanation for why AI is so bad at reading the time.
Saxena explained, “Early AI systems were trained on labeled examples. Reading a clock requires something different: spatial reasoning.”
“The model has to detect overlapping hands, measure angles and handle diverse designs, such as Roman numerals and stylized dials. For an AI, recognizing that ‘this is a clock’ is far easier than actually reading it.”
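To make that spatial-reasoning step concrete, here is a small sketch (not taken from the study) of the arithmetic a clock reader has to get right once the hands have been located: converting the measured hand angles into a time.

```python
# Sketch of the angle arithmetic involved in reading an analog clock,
# assuming the hand angles (degrees clockwise from 12 o'clock) have
# already been detected in the image. Not code from the study.

def angles_to_time(hour_angle: float, minute_angle: float) -> str:
    """Convert detected hand angles into an HH:MM reading."""
    minutes = round(minute_angle / 6) % 60    # the minute hand moves 6 degrees per minute
    hours = int(hour_angle // 30) % 12 or 12  # the hour hand moves 30 degrees per hour
    return f"{hours:02d}:{minutes:02d}"

# Example: hour hand at 95 degrees and minute hand at 60 degrees is roughly 3:10.
print(angles_to_time(95.0, 60.0))  # prints "03:10"
```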
Clock reading was not the only problem; calendar dates proved just as difficult. When the AI was asked, for example, which day of the week the 153rd day of the year falls on, the failure rate was similarly high. Overall, the systems read the calendar correctly only 26.3% of the time, and clock reading was similarly unreliable.
“Arithmetic is trivial for traditional computers but not for large language models. AI doesn’t run mathematical algorithms; it predicts output based on patterns seen in its training data,” Saxena said.
“So while it may answer arithmetic questions correctly, the reasoning is inconsistent and not rule-based, and our work highlights that gap.”
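The rule-based calculation Saxena contrasts this with is straightforward to write down. The sketch below (again, not the benchmark code from the study) shows how a conventional program answers the 153rd-day question deterministically using Python’s standard datetime module.

```python
# Deterministic calendar arithmetic of the kind a traditional computer runs:
# which day of the week is the 153rd day of a given year?
# Illustrative sketch, not code from the study.
from datetime import date, timedelta

def weekday_of_nth_day(year: int, n: int) -> str:
    """Return the weekday name of the n-th day of the year (1-indexed)."""
    return (date(year, 1, 1) + timedelta(days=n - 1)).strftime("%A")

print(weekday_of_nth_day(2025, 153))  # the 153rd day of 2025 is June 2, a Monday
```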
The project is the latest in a line of research highlighting the differences between the way AI and humans process information.
AI models excel when the answer can be drawn from familiar patterns backed by plenty of examples in their training data, but they often fail when asked to reason abstractly.
“Things that are easy for us, like reading a clock, can be very difficult for them, and vice versa,” Saxena said.
“AI is powerful, but when tasks combine perception with precise reasoning, we still need rigorous testing, fallback logic and, often, a human in the loop.”
Clearly, much more research is needed to truly unlock the potential of artificial intelligence.