Versa AI hub

Study finds NYT Connections game beats best AI models

November 21, 2024

A study by Tuhin Chakrabarty, assistant professor of computer science at Stony Brook University, and a team of researchers at Columbia University suggests that the New York Times word game “Connections” can serve as a challenging benchmark for testing abstract reasoning in large language models (LLMs).

AI and machine learning systems regularly beat the world’s best chess players, but when it comes to “Connections,” the research found that even the best-performing LLM, Claude 3.5 Sonnet, fully solves the game only 18% of the time. The study examined AI responses to more than 400 Connections games and found that both novice and expert human players outperformed the AI at solving the puzzles.

In the game, players are presented with a 4×4 grid containing 16 words. The task is to sort these words into four groups of four according to what they have in common. For example, the words “followers,” “sheep,” “doll,” and “lemming” form a group because they can all be classified as “conformists.”
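To make the rules concrete, here is a minimal Python sketch of how a proposed answer could be checked against a puzzle's answer key. It is not the researchers' evaluation code: the function names (`validate_grid`, `count_correct_groups`, `is_fully_solved`) are hypothetical, the word groups are invented for illustration, and treating a game as "fully solved" only when all four groups match exactly is an assumption.

```python
from typing import FrozenSet, List, Set


def validate_grid(groups: List[Set[str]]) -> Set[str]:
    """Check for 16 distinct words split into four groups of four; return the word set."""
    if len(groups) != 4 or any(len(g) != 4 for g in groups):
        raise ValueError("expected four groups of four words")
    words: Set[str] = set().union(*groups)
    if len(words) != 16:
        raise ValueError("expected 16 distinct words")
    return words


def count_correct_groups(answer: List[Set[str]], guess: List[Set[str]]) -> int:
    """Count guessed groups that exactly match a group in the answer key."""
    if validate_grid(answer) != validate_grid(guess):
        raise ValueError("guess must use exactly the puzzle's 16 words")
    # Group order does not matter, so compare groups as hashable frozensets.
    answer_groups: Set[FrozenSet[str]] = {frozenset(g) for g in answer}
    return sum(1 for g in guess if frozenset(g) in answer_groups)


def is_fully_solved(answer: List[Set[str]], guess: List[Set[str]]) -> bool:
    """Assumed definition: a game counts as fully solved only if all four groups match."""
    return count_correct_groups(answer, guess) == 4


# Hypothetical toy puzzle (not from the study). "fig" is a red herring:
# it belongs with the fruits in the answer key, but it could pass for a tree.
answer = [
    {"red", "green", "blue", "yellow"},  # colors
    {"apple", "pear", "plum", "fig"},    # fruits
    {"oak", "elm", "ash", "pine"},       # trees
    {"cat", "dog", "cow", "pig"},        # animals
]

# A guess that falls for the red herring by swapping "fig" and "ash".
guess = [
    {"red", "green", "blue", "yellow"},
    {"apple", "pear", "plum", "ash"},
    {"oak", "elm", "fig", "pine"},
    {"cat", "dog", "cow", "pig"},
]

print(count_correct_groups(answer, guess))  # 2
print(is_fully_solved(answer, guess))       # False
```

A single misplaced word breaks two groups at once, which is why overlapping candidate categories make the puzzle hard: under this sketch the guess earns partial credit (two of four groups) but does not fully solve the game.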

To sort the words into the right categories, players must reason with several kinds of knowledge, ranging from semantic knowledge (what words mean) to encyclopedic knowledge (facts about the world).

“This may seem easy to some, but many of these words can easily be placed into several other categories,” Chakrabarty says. “For example, ‘likes,’ ‘followers,’ ‘shares,’ and ‘insults’ might at first glance be classified as ‘social media interactions.’” These plausible but wrong groupings act as red herrings. The game is deliberately designed this way, which makes it even more interesting.

The study found that LLMs are relatively good at reasoning about semantic relations (for example, “happy,” “joyful,” and “enjoyable”), but they struggle with multi-word expressions (“kick the bucket” means “die”) and with groupings that require combining knowledge of word form and word meaning (adding the prefix “un-” to the verb “do” creates “undo,” which has the opposite meaning).

The researchers tested five LLMs (Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, OpenAI’s GPT-4 Omni, Meta’s Llama 3.1 405B, and Mistral Large 2 (Mistral-AI, 2024)) on 438 NYT Connections games and compared the results with human performance on a subset of those games. The results showed that while all of the LLMs could partially solve some games, “performance was far from ideal.”

Read the full article on the AI Innovation Institute website.
