To understand AI capabilities across these cognitive abilities, we propose a three-step evaluation protocol that benchmarks system performance in relation to human capabilities.
1. Evaluate AI systems across a wide range of cognitive tasks covering each ability, using test sets maintained to prevent data contamination.
2. Collect human baselines for the same tasks from a demographically representative sample of adults.
3. Map the performance of each AI system relative to the distribution of human performance in each ability.
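The final step, placing an AI system within the human performance distribution, can be sketched as a simple percentile-rank computation. This is an illustrative example only: the function name and the sample scores below are hypothetical, not part of the actual protocol or any real benchmark data.

```python
# Hypothetical sketch of step 3: locating an AI system's score within the
# distribution of human baseline scores for one cognitive ability.
from bisect import bisect_right

def percentile_rank(ai_score, human_scores):
    """Return the percentage of human baseline scores at or below ai_score."""
    ordered = sorted(human_scores)
    return 100.0 * bisect_right(ordered, ai_score) / len(ordered)

# Illustrative numbers only, not real measurements.
human_baseline = [52, 61, 63, 70, 74, 75, 80, 82, 88, 91]
print(percentile_rank(85, human_baseline))  # 80.0 — at or above 80% of the sample
```

In practice each ability would aggregate many tasks, and the human sample would need to be large and demographically representative for the percentile to be meaningful.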
From theory to practice
Defining these cognitive abilities is an important first step, but measuring progress requires more than a framework. To put this theory into practice, we’re launching a new Kaggle hackathon: Measuring Progress to AGI: Cognitive Capabilities. This hackathon encourages the community to design assessments for the five cognitive abilities with the largest assessment gaps: learning, metacognition, attention, executive function, and social cognition.
Participants can build and test their evaluations against a lineup of frontier models using Kaggle’s newly launched community benchmarking platform.
We have a total of $200,000 in prizes up for grabs. The top two entries in each of the five tracks will receive a $10,000 prize, and the four best overall entries will each receive a $25,000 grand prize. Submissions will be accepted from March 17th to April 16th, and results will be announced on June 1st. Visit the Kaggle website to start building.