The ARC-AGI benchmark, considered a key indicator for Artificial General Intelligence (AGI), saw a significant performance jump in 2024. The best performance on the private evaluation dataset rose from 33 to 55.5 percent, but remains far from the target of 85 percent. The ARC-AGI benchmark, which measures the ability to solve entirely new tasks – as opposed to tasks a system can be prepared for – is considered particularly relevant for AGI research. Classical deep learning approaches, based on retrieving stored patterns, fail here.
In 2024, the ARC Prize was launched, a global competition designed to inspire new ideas and drive open progress toward AGI. However, the grand prize of $600,000 for reaching 85 percent was not awarded. The team MindsAI achieved the highest score of 55.5 percent but did not publish its code and was therefore ineligible for the prize. A total of 1,430 teams submitted 17,789 entries for the ARC Prize 2024. The competition organizers report at least seven well-funded startups that shifted their priorities to work on ARC-AGI and several large corporate labs that launched internal efforts to tackle ARC-AGI.
According to the ARC Prize 2024 technical report, three main approaches emerged: AI-powered program synthesis, test-time training (TTT), and combinations of both methods.
In program synthesis, systems use large language models to generate program code or to guide the program search. Ryan Greenblatt achieved 42 percent by having GPT-4o generate and debug thousands of Python programs per task.
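The core loop of program synthesis, generating candidate programs and keeping one that reproduces every demonstration pair, can be illustrated with a minimal sketch. It is not Greenblatt's system: instead of asking a language model for Python code, it enumerates short sequences of four hypothetical grid primitives (`identity`, `flip_h`, `flip_v`, `transpose`), and the example grids are made up for illustration.

```python
from itertools import product

# Hypothetical primitive operations on grids (lists of rows).
# A real system would let a language model write arbitrary Python instead.
def identity(g):
    return [row[:] for row in g]

def flip_h(g):
    return [row[::-1] for row in g]

def flip_v(g):
    return [row[:] for row in g[::-1]]

def transpose(g):
    return [list(col) for col in zip(*g)]

PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}

def run(program, grid):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def synthesize(train_pairs, max_len=2):
    """Return the first program that reproduces every demonstration pair."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in train_pairs):
                return program
    return None  # no program of this length explains the demonstrations

# Made-up demonstration pairs: each output is the input rotated 90 degrees clockwise.
train_pairs = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[5, 0, 0], [0, 6, 0]], [[0, 5], [6, 0], [0, 0]]),
]

program = synthesize(train_pairs)
prediction = run(program, [[7, 8], [9, 0]])
print(program, prediction)
```

The search space here holds only 20 programs. The point of using a language model is to propose plausible candidates from a space far too large to enumerate.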
Test-time training adapts a pre-trained language model to the specific task at runtime. This approach, introduced by MindsAI, was adopted by many teams. The winning team "the ARChitects" achieved 53.5 percent with this method.
Teams that combined both approaches achieved the best results. Pure program synthesis and pure test-time training each reach only about 40 percent. A team from the Massachusetts Institute of Technology recently demonstrated in a paper how a language model combining both approaches achieved an accuracy of 61.9 percent on the public evaluation set of the ARC Prize. Because the team exceeded the computational limits of the ARC Prize, this approach has not yet been tested on the private leaderboard, which contains 100 unpublished tasks. Results for OpenAI's full o1 model, including the Pro mode, are also still pending, with no dramatic performance increase expected.
The organizers of the ARC Prize also announced plans to develop a new benchmark, ARC-AGI-2, for 2025. The current dataset from 2019 has several weaknesses: with only 100 tasks, the private evaluation dataset is too small, and thousands of evaluations against it carry the risk of overfitting. "We strive to evolve the ARC Prize from its experimental origins into an enduring North Star for AGI," the report states.
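Why 100 tasks is a small sample can be made concrete with a rough calculation. The sketch below treats each task as an independent pass/fail trial and uses the normal approximation to the binomial distribution; both are simplifying assumptions, and the function name `binomial_standard_error` is chosen here for illustration.

```python
import math

def binomial_standard_error(accuracy, n_tasks):
    """Standard error of an accuracy estimated from n independent tasks."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_tasks)

# Top private-set score from the article, measured on 100 tasks.
accuracy, n_tasks = 0.555, 100
se = binomial_standard_error(accuracy, n_tasks)

# Approximate 95 percent interval under the normal approximation.
low, high = accuracy - 1.96 * se, accuracy + 1.96 * se
print(f"standard error: {se:.3f}, 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the standard error is roughly 5 percentage points, so a 95 percent interval around 55.5 percent spans roughly 46 to 65 percent. A larger private set would narrow that range.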
The ARC Prize will continue annually until the benchmark is solved and a public reference solution is available. The organizers believe that the team that eventually develops AGI is already working on ARC-AGI today.
The progress in 2024 shows that algorithmic improvements can have a large impact and massive computing power is not necessarily required. Nevertheless: "New ideas are still needed to develop AGI. The fact that ARC-AGI has withstood five months of intense scrutiny with an outstanding grand prize of $600,000 and hundreds of thousands of dollars in additional prizes is strong evidence that the solution does not yet exist."
Bibliography:
https://arxiv.org/html/2412.04604v1
https://news.ycombinator.com/item?id=42343215
https://www.reddit.com/r/singularity/comments/1h8cp69/arc_prize_capitulates_agi_progress_is_no_longer/
https://arcprize.org/blog/arc-prize-2024-winners-technical-report
https://arxiv.org/abs/2412.04604
https://twitter.com/0xWUT/status/1865160169702785155
https://www.linkedin.com/posts/james-bentley-1b329214_agi-progress-is-no-longer-stalled-the-activity-7271295033100664834-o4l9
https://twitter.com/arcprize
https://news.ycombinator.com/item?id=40711484
https://www.podcastworld.io/episodes/shane-legg-deepmind-founder-2028-agi-new-architectures-align-oqwmbepj