Who can predict the evolution of artificial intelligence?

Hypermind invites you to participate in a grand forecasting contest on the evolution of artificial intelligence with a $30,000 prize over four years 

The last decade has shown what artificial intelligence can do. Whether driving cars, analyzing X-rays, recognizing faces, playing Go, translating text, or composing a melody, machines have become world champions. The brain still outperforms its digital rival in only a few domains, and for how long? Yet today’s “AIs” remain confined to specific tasks. The quest for the grail of so-called “general” intelligence, that is, intelligence similar to ours, has only just begun.

The technical challenge is immense, and as great as the geopolitical and industrial stakes. Whoever first masters the secrets of intelligence will dominate the world, and beyond. Which of China or the USA, of dictatorships or the free world, will be quicker to arm itself with the genius of machines? And can we still imagine a newcomer challenging AI’s established players in the future, as Steve Jobs’ Macintosh once led the insurrection against the all-powerful IBM?

These are the complex and exciting topics that the “Arising Intelligence” forecasting contest invites you to ponder. The questions are posed by Professor Jacob Steinhardt of UC Berkeley’s Artificial Intelligence Research lab (BAIR). The contest is supported by Open Philanthropy.

Hypermind interviews

Professor Jacob Steinhardt

Designer of the forecasting contest; member of UC Berkeley’s Artificial Intelligence Research Laboratory


What motivated you to design this forecasting contest?

Jacob Steinhardt: We found the results of the AI-2030 forecasting contests exciting and informative, but there were certain questions of interest to us as researchers that these previous contests didn’t fully address. These mostly concerned benchmarks for capabilities that many people currently believe to be difficult, as well as industry and government investment and its geopolitical implications.

How did you select which questions to ask?

Some questions were picked to highlight difficult AI tasks where there is not yet a clear trend for progress. Thus, compared to some forecasting exercises where there is a clear trend line to start from, these might be more difficult and require more ingenuity, but will also be more informative. The more geopolitical questions are also interesting, as they ask about established players vs. newcomers, and whether China will overtake the U.S. Thus, in some sense many questions are forecasting the degree of “surprise” we should expect in the future.

Who might find the forecasts useful, and why?

The forecasts will inform my own estimates of the future pace of progress in AI. In addition, I am interested in using machine learning combined with humans to help build better forecasting systems. This contest will help establish a track record for human performance, especially on longer time horizons.

Is it weird to ask collective intelligence to forecast progress in AI?

I don’t think so. In fact, I think collective intelligence and AI are naturally synergistic. But even without that synergy, forecasting techniques should apply to AI just as they do to many other domains.

Read Prof. Steinhardt’s own analysis of the results on his blog.

Prize money timetable:


The contest features the six questions below, each at four time horizons: mid-’22, mid-’23, mid-’24 and mid-’25. The questions will be added to the contest one at a time, one per week.

By mid-202X, which of China or the United States will have conducted the largest machine learning experiment, as measured by the amount of computing power brought to bear? 

By mid-202X, what is the most computing power that will have been used by a machine learning experiment not conducted by China or by Google, Facebook, Microsoft, DeepMind or OpenAI?

By mid-202X, how well will an AI perform on the MATH problems test?

By mid-202X, how well will an AI perform on a massive multitask language understanding test?

By mid-202X, how well will an AI perform on the Something-Something V2 video action recognition test?

By mid-202X, how well will an AI perform at recognizing images that have been deliberately altered to be misleading?