Contra ACT On The Extinction Tournament

In my last blog post, I discussed participating in a forecasting tournament run through the University of Pennsylvania. One of the top rationalist blogs, Astral Codex Ten (ACT), wrote a post discussing findings from the tournament, focusing specifically on artificial intelligence (AI). The tournament was about existential risks and asked about the chance that AI causes a global catastrophe severe enough that the world population falls below five thousand before the year 2100. The study's final estimates put the probability of AI extinction at 0.38% for super-forecasters and 3% for domain experts. This shocked ACT, whose own current estimate is 33%; existential-risk researcher Toby Ord's is 16%. Participants in the study roughly bracketed the extinction risk for AI that comes out of surveys, around 2%.

The incentives in the study seem to have been well aligned. Participants were rewarded for their own forecasts, but also for reciprocal scoring: estimating what domain experts and super-forecasters would guess. Even so, participants did not convince each other very much. On reciprocal scoring, I matched the general pattern of domain experts rating their own catastrophe higher than super-forecasters did. I attribute this to a bias: as you become an expert, you simply know more about all the ways things can go wrong. In arms control circles, for instance, you spend a lot of time thinking about the ways conflicts can escalate to nuclear war, through asymmetric information, new technology, unintended signalling, or even implausible views of an adversary's strategy (e.g. 'escalate to de-escalate'). I participated in this study as a subject matter expert in nuclear weapons risks, so I tried to be very critical of any increases I made to my percentages in that area. Super-forecasters, in contrast, tend to work at a higher strategic level, weighing risks against each other and aggregating across risks to arrive at an overall extinction probability that reasonably conforms to the historical record.

This was the general pattern in every major forecasting area except AI. There, experts had the highest risk estimates of all, but the variance was so high that the group average could be swayed drastically by including a few more pessimists or optimists (the toy example below illustrates this).

Were the participants in this study, including me, just stupid? Not likely: we got the easily quantifiable questions right, such as the number of pandemics the WHO will declare this century, and there was very little variance in those estimates across all participants. On each question we were asked to leave a written record of our reasoning, and the breadth of knowledge participants showed across multiple domains was extremely impressive. Another possible explanation is that participants were out of their depth on AI. But it is unclear why AI should be fundamentally harder to understand than topics like pathogens or climate change. One participant is quoted as saying that he did not notice experts with substantial machine learning (ML) experience, and that those with little ML experience had very low estimates of AI danger.
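As an aside on the variance point above, here is the toy example: with made-up numbers (none of these are actual tournament forecasts), a handful of very pessimistic forecasters can drag the group mean a long way while barely moving the median.

```python
# Toy illustration of how a few extreme AI-risk forecasts sway a group average.
# All numbers are invented for illustration, not actual tournament data.
from statistics import mean, median

# Twenty forecasters clustered around a fraction of a percent.
baseline = [0.002, 0.003, 0.004, 0.005] * 5

# Add three pessimists at 20%, 30%, and 40%.
with_pessimists = baseline + [0.20, 0.30, 0.40]

print(f"mean without pessimists:   {mean(baseline):.4f}")          # 0.0035
print(f"median without pessimists: {median(baseline):.4f}")        # 0.0035
print(f"mean with pessimists:      {mean(with_pessimists):.4f}")   # ~0.0422
print(f"median with pessimists:    {median(with_pessimists):.4f}") # 0.0040
```

Which summary statistic gets reported matters a great deal when the underlying distribution is this skewed.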
I was one of the low estimators of AI risk, but I have also built deep learning applications, so it isn't clear to me that AI pessimism is correlated with more AI experience. Of course, for most AI experts, ML programming is not what much of this debate is about; it is primarily thought experiments and arguments over whether AI development should be paused. At both ends of the risk spectrum, it is difficult to quantify or explain in any depth where the divide between a safe and an unsafe system might lie. I am aware of many of the big AI safety concepts, like the stop button problem and sandbox escape. In this study it did not seem that ML experience or familiarity with AI safety caused any more mis-forecasting than would be possible in other areas; the differences here seem more fundamental.

We should remember the high bar for an extinction-level event: reducing the global population below five thousand. ACT acknowledges the difficulty of meeting this criterion, but even my own mind does not dwell on it enough. The lesser criterion, a global catastrophe in which more than 10% of the world's population is killed within a five-year period, has never been met in recorded history. Back to the extinction criterion: perhaps AI could stack the nuclear and pathogenic risks, but that still falls far short of ACT's 33% (I sketch the arithmetic at the end of this post).

I think the most likely path to AI extinction is an AI convincing various state actors to do its bidding. It is worth asking whether there could be an AI sophisticated enough to convince enough people to embark on an extinction campaign, yet not smart enough to cover its tracks in a way that would ultimately make the question unresolvable. That is an impractical quibble, but again, persuasion is to my mind the greatest near-term pathway to extinction risk. ACT rightly points out that the tournament predated ChatGPT, but consider how convincing ChatGPT really is. Most people have had the experience of asking a large language model about something in their own area of expertise: the answers sound confident, but they are often slightly wrong in a way that should decrease your faith in its answers outside your area of expertise. When talking to other people, there is a general assumption that the other party is striving for a degree of correctness, and those who transgress this norm regularly are relegated to low social status. ChatGPT is striving to give an answer that you would like; the reward function is a sycophantic mirror to our own vanity.

Again, I circle back to the mechanics of extinction. During the tournament, I asked those with very high probabilities of AI extinction what the means might be. I received quite flippant responses; to paraphrase one: 'AI can do anything, maybe it will build a prion factory and distribute them globally.' But here, as always, the science-fiction imagination this tournament requires must run up against the practical realities of these possibilities. What conditions would need to be met to construct a prion factory? Robotics is not sufficiently developed to build such a factory without human intervention. It should also be noted that progress in robotics is not limited in the short term by intelligence, but by our ability to manufacture the high-frequency hardware that will ultimately improve robots. So the factory would most likely need to be built by humans, and the AI would need to convince people with some expertise in prions to help.
I don't personally know a statistically significant sample of these experts, but those I do know actively advocate for livestock in suspected outbreaks to be encased in concrete and treated much like nuclear waste. Moving these people into the frame of 'let's build a factory with the capacity to poison the world' has a probability so close to zero that it isn't worth quantifying. It is also worth thinking about the limits of persuasion: scammers are locked in a fully incentivized evolutionary battle with us, yet most of us will go through our lives without falling victim to social engineering. Even the most convincing possible incarnation of AI would face many people who simply will not participate, in the same way that there are people who still get by without an email address. [Will an AI be able to successfully navigate bribing the contractor who is pouring the foundation for the prion factory?]

Any extinction risk above 10% really demands your complete devotion if you think you can bring the risk down in a meaningful way. This goes beyond blog posts; the consequences are dire enough that you should be leading armed raids on TPU facilities. “Instead of a revolution, we write a manifesto and have a lovely evening” – Kierkegaard. What are the intermediate steps to AI extinction? Will we see a year of >15% global GDP growth? Or that growth rate in the US? Will there be a period in which we successfully deduce that AI is killing people, or having them killed? These intermediate steps would go a long way toward telling us whether we are in the 30% world or the 0.0003% world.

Ultimately, this wide variance in estimates tells us something we already know: there is a large amount of uncertainty around new technology. If you had run a similar forecasting tournament when nuclear weapons were first used, it would have been extremely unclear how many countries would develop them within the next ten years. The same question, asked in this tournament out to 2100, showed little variance in estimates, because the technological steps needed to build a nuclear weapon are now well understood. New technologies will always produce big disagreements about risks and rewards, because not only is the future direction of the technology uncertain, but when innovation is fastest, even the steps that produced the current level of technology are unclear. As a tangible example, when I was working on deep learning applications several years ago, the input space was relatively fixed: if you wanted a paragraph of (poor quality) text generated from a prompt, the prompt length was fairly constrained, to something like 15 tokens. The newest models are multi-modal and can take inputs of different lengths, and, even more impressively, inputs of different types: text and images, with short video surely coming soon as well.
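To put a rough number on the stacking point from earlier: even if an AI could independently trigger both the nuclear and the pathogenic extinction pathways, combining two small probabilities does not get anywhere near 33%. A minimal sketch with invented probabilities (purely illustrative, not forecasts from the tournament):

```python
# Back-of-the-envelope stacking of two independent extinction pathways.
# The probabilities below are invented for illustration, not tournament forecasts.
p_nuclear = 0.005    # hypothetical probability of AI-driven nuclear extinction
p_pathogen = 0.01    # hypothetical probability of AI-driven pathogen extinction

# Probability that at least one pathway leads to extinction,
# assuming the two pathways are independent.
p_stacked = 1 - (1 - p_nuclear) * (1 - p_pathogen)
print(f"stacked extinction risk: {p_stacked:.3f}")  # about 0.015, nowhere near 0.33
```

Unless the individual pathway probabilities are themselves large, stacking them cannot rescue an estimate in the tens of percent.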
