Study finds NYT Connections game beats best AI models

November 21, 2024

A study conducted by Tuhin Chakrabarty, assistant professor of computer science at Stony Brook University, and a team of researchers at Columbia University shows that the New York Times word game “Connections” can serve as a challenging benchmark for the abstract reasoning of large language models (LLMs).

AI and machine learning systems regularly beat the world’s best chess players, but the study found that even the best-performing LLM, Claude 3.5 Sonnet, can fully solve a Connections game only 18% of the time. The researchers examined AI responses to more than 400 Connections games and found that both novice and expert human players outperformed the AI at solving the puzzles.

In the game, players are presented with a 4×4 grid containing 16 words. The task is to group these words into four clusters of four words according to their common characteristics. For example, the words “believer,” “sheep,” “doll,” and “lemming” form a group because they can be classified as “conformists.”
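The grouping rule described above can be sketched in a few lines of Python. This is only an illustration, not the study's evaluation code: the function names are hypothetical, and apart from the “conformists” group quoted above, the three filler groups are invented placeholders rather than words from a real puzzle.

```python
from typing import Dict, List, Set


def count_correct_groups(solution: Dict[str, Set[str]], guess: List[Set[str]]) -> int:
    """Count guessed groups that exactly match one of the solution's groups."""
    answer_groups = {frozenset(words) for words in solution.values()}
    guessed_groups = {frozenset(words) for words in guess}  # ignore repeated guesses
    return len(answer_groups & guessed_groups)


def is_fully_solved(solution: Dict[str, Set[str]], guess: List[Set[str]]) -> bool:
    """A puzzle is fully solved only when every group is matched exactly."""
    return count_correct_groups(solution, guess) == len(solution)


# Toy 16-word puzzle. Only "conformists" comes from the article;
# the other three groups are made-up filler for illustration.
puzzle = {
    "conformists": {"believer", "sheep", "doll", "lemming"},
    "colors": {"red", "blue", "green", "teal"},
    "fruits": {"apple", "pear", "plum", "fig"},
    "metals": {"iron", "zinc", "lead", "tin"},
}

perfect_guess = [set(words) for words in puzzle.values()]

# Swapping one word between two groups breaks both of them.
near_miss = [
    {"believer", "sheep", "red", "lemming"},
    {"doll", "blue", "green", "teal"},
    {"apple", "pear", "plum", "fig"},
    {"iron", "zinc", "lead", "tin"},
]

print(count_correct_groups(puzzle, perfect_guess), is_fully_solved(puzzle, perfect_guess))
print(count_correct_groups(puzzle, near_miss), is_fully_solved(puzzle, near_miss))
```

Because one misplaced word spoils two groups at once, a solver can get most words right and still fall well short of a full solve.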

To sort words into the right categories, players must reason with various forms of knowledge, from semantic knowledge (for example, what “conformists” means) to encyclopedic knowledge.


“This may seem easy to some, but many of these words can easily be placed into several other categories,” Chakrabarty says. “For example, ‘likes,’ ‘followers,’ ‘shares,’ and ‘insults’ may at first glance be classified as ‘social media interactions.’” Such tempting groupings are misleading, and the game is designed with them in mind, which makes it even more interesting.

The study found that LLMs are relatively good at reasoning that involves semantic relations (“happy,” “joyful,” “enjoyable”) but struggle with other kinds of knowledge, such as multi-word expressions (“kick the bucket” means “die”) and combined knowledge of word form and word meaning (adding the prefix “un-” to the verb “do” creates the word “undo,” which has the opposite meaning).

The researchers tested five LLMs (Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, OpenAI’s GPT-4 Omni, Meta’s Llama 3.1 405B, and Mistral AI’s Mistral Large 2) on 438 NYT Connections games and compared the results to human performance on a subset of these games. The results showed that while all LLMs were able to partially solve some games, “performance was far from ideal.”
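The difference between partially and fully solving a game can be made concrete with a short sketch. The article does not give the study's exact scoring procedure, so the function name and the per-game counts below are hypothetical illustration data, not results from the paper.

```python
from typing import Sequence


def full_solve_rate(correct_groups_per_game: Sequence[int], groups_per_game: int = 4) -> float:
    """Fraction of games in which every group was identified correctly."""
    if not correct_groups_per_game:
        raise ValueError("need at least one game to compute a rate")
    solved = sum(1 for n in correct_groups_per_game if n == groups_per_game)
    return solved / len(correct_groups_per_game)


# Hypothetical counts of correct groups in five games (not data from the study).
toy_results = [4, 2, 0, 4, 1]

rate = full_solve_rate(toy_results)
print(f"fully solved: {rate:.0%}")  # two of the five toy games have all 4 groups
```

Under this kind of all-or-nothing measure, games with two or three correct groups count as partial progress but not as solves.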

Read the full article on the AI Innovation Institute website.
