To understand AI capabilities across these cognitive abilities, we propose a three-step evaluation protocol that benchmarks system performance in relation to human capabilities.
1. Evaluate AI systems across a wide range of cognitive tasks covering each ability, using test sets maintained to prevent data contamination.
2. Collect human baselines for the same tasks from a demographically representative sample of adults.
3. Map the performance of each AI system relative to the distribution of human performance in each ability.
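The final step, placing an AI system within the human performance distribution, can be sketched as a simple percentile-rank computation. This is an illustrative example only: the function name and the sample scores below are hypothetical, not part of the actual protocol or any real benchmark data.

```python
# Hypothetical sketch of step 3: locating an AI system's score within the
# distribution of human baseline scores for one cognitive ability.
from bisect import bisect_right

def percentile_rank(ai_score, human_scores):
    """Return the percentage of human baseline scores at or below ai_score."""
    ordered = sorted(human_scores)
    return 100.0 * bisect_right(ordered, ai_score) / len(ordered)

# Illustrative numbers only, not real measurements.
human_baseline = [52, 61, 63, 70, 74, 75, 80, 82, 88, 91]
print(percentile_rank(85, human_baseline))  # 80.0 — at or above 80% of the sample
```

In practice each ability would aggregate many tasks, and the human sample would need to be large and demographically representative for the percentile to be meaningful.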
From theory to practice
Defining these cognitive abilities is an important first step, but measuring progress requires more than a framework. To put this theory into practice, we’re launching a new Kaggle hackathon: Measuring Progress to AGI: Cognitive Capabilities. This hackathon encourages the community to design assessments for the five cognitive abilities with the largest assessment gaps: learning, metacognition, attention, executive function, and social cognition.
Participants can build and test their evaluations against a lineup of frontier models using Kaggle’s newly launched community benchmarking platform.
We have a total of $200,000 in prizes up for grabs. The top two entries in each of the five tracks will receive a $10,000 prize, and the four best overall entries will each receive a $25,000 grand prize. Submissions will be accepted from March 17th to April 16th, and results will be announced on June 1st. Visit the Kaggle website to start building.