What if Humans Weaponize Superintelligence, w/ Tom Davidson, from Future of Life Institute Podcast

From: The Cognitive Revolution podcast

00:00:00

Hello, and welcome back to The Cognitive Revolution. Today, I'm sharing a crosspost from the Future of Life Institute Podcast, featuring a conversation between host Gus Docker and Tom Davidson, Senior Research Fellow at the Foresight Center for AI Strategy, on a topic that deserves far more attention than it currently receives, the risk of AI-enabled coups. This crosspost came about after I listened to Tom's appearance on the 80,000 Hours podcast, which was also excellent.

00:00:28

I was planning to do my own original follow-up interview, but for the second time recently, Gus beat me to it, and as always, he did an excellent job. So I thought I could save Tom some time by crossposting, and also felt that this was the perfect episode to follow on our most recent one on AI whistleblower protections and support.

00:00:45

At a high level, Tom's analysis is a sort of reframing of the risk that humanity could lose control to AI systems. Historically, lots of AI safety theorists have worried about scenarios in which AI systems rise up against or otherwise supplant humans as the primary architects of the future. This is a possibility that I have always taken seriously, even when it seemed unlikely. But as you'll hear, Tom shifts the focus to a highly related problem that on reflection does seem almost strictly more likely, at least in the near term.

00:01:15

The use of increasingly powerful AIs by human actors to consolidate power in ways that would have been impossible with previous technologies and which could prove similarly devastating. Importantly, Tom emphasizes early in the conversation that he does not think that anyone at leading frontier AI companies are explicitly planning an AI-enabled coup today. Rather, the risk emerges from the interaction of powerful incentives, rapidly advancing capabilities, and the natural human tendency to want more influence to achieve one's goals.

00:01:45

Step by step, without any single flagrantly malicious decision, we could find ourselves in a world where the traditional checks and balances of democratic society have been quietly circumvented by those with exclusive access to transformative AI. These sorts of possibilities are more familiar and therefore perhaps less entertaining to imagine and debate. But the very real historical precedent for humans using new technologies to concentrate power is a strong reason to take this concern super seriously as well.

00:02:13

As you'll hear, Tom walks through the specific capabilities that would enable these scenarios. AI systems that match human leaders in persuasion and strategy, superhuman cyber attack capabilities, and fully autonomous military robots that outperform human warfighters. He then also segments the threat landscape into three distinct models. First, singular loyalties, where AI systems deployed in government and military roles are made explicitly loyal to individual leaders rather than institutions.

00:02:43

Or the law. Second, secret loyalties, where backdoors or hidden allegiances are embedded in AI systems that appear to serve legitimate purposes. And third, exclusive access, where a small group gains control of dramatically more powerful AI capabilities than anyone else has. One scenario that Tom describes in detail is that of a US-based AI company integrated into the military developing sleeper agents.

00:03:09

Those are AI systems that behave normally until triggered to act on hidden loyalties at a critical moment. And then, if that's not all scary enough, there's the possibility that AIs could automate AI research itself, which in the most extreme case could allow an AI company to go from market leader to global hegemon by converting a small initial lead into a decisive strategic advantage.

00:03:31

Throughout the conversation, Tom grounds these scenarios in historical precedent, from traditional military coups to recent patterns of democratic backsliding in countries like Venezuela and Hungary. He notes that the US has seen increasing polarization, erosion of democratic norms, and concentration of executive power, all trends that AI could dramatically amplify.

00:03:51

And of course, one can't miss that the presidents of both Russia and China wield extremely concentrated power already and appear likely to do so for as long as they remain individually capable. Tom's assessment is that there's roughly a 10% chance of an AI-enabled coup in the next 30 years, up from a baseline of perhaps 2% without AI. And he sees this risk as being concentrated in the period when AI becomes extremely powerful, but before we've had the chance to develop robust governance structures.

00:04:20

Which, if one listens to the likes of Dario, Sam Altman, and Demis, could be coming quite soon indeed, such that decisions being made today about AI development and deployment could determine whether these scenarios ultimately come to pass. The mitigations Tom proposes amount to a defense in-depth strategy.

00:04:37

System integrity measures to prevent secret loyalties, requirements for distributed control of military AI systems, transparency requirements for frontier AI development, and establishing clear rules that AI systems should follow the law rather than individual commands. He also suggests that as we hand off more government and corporate functions to AI, we could perhaps program these systems to actively maintain democratic checks and balances, potentially making future societies more resistant to coups than today's.

00:05:07

There is a lot more here, and I really think it's worth giving all these possibilities a serious ponder, particularly as a counterpoint to those who have worried about the dangers of open source models. I take those issues super seriously, too, but this conversation convinced me that we need to start taking concentration of power scenarios just as or even more seriously, while the window for establishing norms and safeguards still remains open.

00:05:30

Now, here's Gus Docker's conversation with Tom Davidson of the Foresight Center for AI Strategy from the Future of Life Institute podcast. It's in everyone's interest to prevent a coup. Currently, no one small group has complete control. If everyone can be aware of these risks and aware of the steps towards them and kind of collectively ensuring that no one is going in that direction, then we can all kind of keep each other in check.

00:05:57

So I do think, in principle, the problem is solvable. You should always have at least a classifier on top of the system, which is looking for harmful activities and then kind of shutting down the interaction if something harmful is detected. We could program those AIs to maintain a balance of power.

00:06:13

So rather than handing off to AIs that just follow the CEO's commands or AIs that follow president's commands, we can hand off to AIs that follow the law, follow the company rules, report any suspicious activity to various powerful human stakeholders. And then by the time things are going really fast, we've already kind of got this whole layer of AI that is maintaining balance of power. Welcome to the Future of Life Institute podcast.

00:06:40

My name is Gus Docker, and I'm here with Tom Davidson, who's a senior research fellow at Forethought. Tom, welcome to the podcast. It's a pleasure to be here, Gus. We're going to talk about AI coups and the possibility of future AI systems basically taking over governments or states. Which features would future AI systems need to have in order for them to accomplish this? What should we be looking out for? Great question.

00:07:10

One thing I flag up front is that what I've been focused on recently is not the kind of traditional idea that AIs themselves will kind of rise up against humanity and take over the government, but that like a few very powerful individuals will use AI to seize a legitimate power for themselves. So the kind of phrase that we're often using is AI enabled coups, where they're kind of the main instigators actually people.

00:07:35

In terms of capabilities, yeah, I think there's a few different domains, which in my analysis are like particularly important for seizing political power. So there's the kind of skills that politicians and business leaders use today.

00:07:55

So things like persuasion, business strategy, political strategy, just kind of pure kind of productivity at a wide variety of tasks. And then there's kind of more kind of hard power skills. So in particular, cyber offense, which is already somewhat useful in military warfare, and has been becoming more useful.

00:08:22

And then, you know, I expect that as AI increasingly automates different parts of the military and as AI is embedded in more and more important high stakes processes, that will raise the importance of cyber offenses. Now, you know, whereas you can't hack a human mind as we hand off more important tasks to digital systems, they will be able to be hacked much more easily.

00:08:47

So I expect cyber to come more important for hard power. And then, you know, the ultimate kind of most scary capability that, you know, I think ultimately will drive a lot of risk is when we get to the point that AI systems and robots are able to fully replace human military personnel. That's fully replaced human soldiers on the ground, boots on the ground, fully replaced the kind of commanders and strategists.

00:09:09

And, you know, that might seem like a long way off today. But actually, you know, even just the last few years, we've seen a lot more importance of kind of AI controlled drones in warfare. And I expect that trend to continue. And what we're already seeing is that, you know, as soon as the technology is there to kind of reliably automate military capabilities, there's, you know, geopolitical competition drives that adoption.

00:09:35

And so, you know, I think it's going to be surprisingly soon that we do get AI's controlling kind of surprising amounts of, you know, real hard military power. And then one kind of wrapper for all of these things is the automation of AI research itself. So today there's, you know, few hundred, few thousand, you know, top, top human experts that drive forward AI algorithmic progress.

00:10:05

And my expectation is that, you know, there's a good chance the next few years that AI systems are able to match, you know, even the top human experts and their capabilities. And that would mean we go from, you know, maybe a thousand top, top kind of researchers to, you know, millions of automated AI researchers. And that could mean that, you know, all of these different capabilities, all of these different domains that I've been talking about, they all progress much more quickly than we might have expected. Just by naively extrapolating the recent pace of progress.

00:10:31

And, you know, in my view and in the view of many, the recent pace of progress is already quite alarming in that, you know, five years ago, we just had really very basic language models that could string together a few sentences, a few paragraphs, and then went off topic. And now already we're getting kind of very impressive reasoning systems that are doing tough math problems and helping a lot with difficult coding tasks. So, you know, bring that all together. I think there's a lot of kind of soft skills, a lot of hard power skills that are relevant here.

00:10:59

But like, probably the most important thing to be watching is how good AI is at AI research itself, as that could kind of bring, make more happen quite suddenly. Yeah. Could you describe in more concrete terms what an AI enabled military coup would look like? Some example to kind of make this concrete for us. Yeah, absolutely. So you can draw an analogy to historical coups where there's, you know, often a minority of the military.

00:11:30

Launches a coup and then kind of presents it as a fair complete. And, you know, is able to prevent chaos or discord or threaten individuals to prevent anyone from kind of actively opposing them. And then in the absence of active opposition, it just seems like, well, they've done it. And this is the new state of affairs. So, you know, that's a good starting point. Then, you know, the AI enabled part is where we deviate. So historically, you needed at least, you know, a decently sized contingent of humans to go along with the coup.

00:11:59

And you needed to persuade, you know, quite senior military officials not to oppose it. I think that will change as we automate more and more of the military. And so the most simple way that this happens is just that the head of state, you know, it could be the president of the United States, just says, yeah, we've got the technology now to make a robot army. And I want the army to be loyal to me. I mean, I'm the commander in chief. Obviously, that's how it should be. I'm not going to follow my instructions. No need to worry about, you know, whether I'm going to order them to do anything illegal.

00:12:27

Like we can put in maybe some kind of nominal legal safeguards. Let's not worry too much about that. The main thing is that they're loyal to me. And then to my knowledge, you know, that would be highly controversial or would definitely be against the principles of the constitution. But it's unclear to me that it would be literally illegal. We just haven't had this kind of technology. We haven't legislated for it. The constitution is not robust to this kind of really powerful military technology.

00:12:52

And so it's not surprising if, you know, at best, this is just a very kind of unclear legal territory. But you've got the head of state pushing really hard for that robot army to follow their instructions. And, you know, the head of state in the United States has a lot of political power. And so, you know, the most simple way is that he just pushes hard for that. He gets what he wants. Maybe he's using, you know, kind of emergencies at home or, you know, tense geopolitical tensions to kind of push it through.

00:13:22

And so that it's necessary. Maybe he's firing, you know, senior military officials that disagree. Maybe he's already got Congress to be very, very kind of fervently supporting and loyal to him and not, you know, being that kind of careful and open minded when assessing like the opposition that people will be making as this has happened. So that's the kind of the first, you know, really just plain and simple way that we could get this robot army is built. It's made loyal to the head of state.

00:13:49

And so it just instructs it, stage a coup, and it does it. You know, robots surround the White House and brutally suppress human protesters. And then, you know, even if people go on strike and stop working, then, you know, you can have them AI systems and robots replace people in the economy. So humans have kind of really lost their bargaining power that they normally have that would kind of strongly disincentivize military coups in most countries.

00:14:13

Yeah, this is really a change from the normal coups of history where you would have to have buy-in from at least some segment of the population that are regular humans. And you would need to kind of continually support that buy-in and make alliances and uphold those alliances.

00:14:36

But this has changed now that you're talking about AI and AIs and robots that can basically be made loyal to a company or a head of state in a way that's more durable. Do you think we have other kind of historical precedents for thinking about how the dynamics of what it's like to attempt a coup, how those dynamics play out?

00:15:06

Yeah, just one quick thing on that last point. I want to emphasize how there is a bit of a phase shift at the point in which AI can fully replace other humans, you know, in the government, in the military. When AI is augmenting other humans, you don't have this effect because a leader must still rely on those other humans to kind of work with the AIs to do the work. But there really is this phase shift when AIs and robots can fully replace the humans because then, yeah, a leader doesn't need to rely on anyone else.

00:15:34

So I think that's an important one to recognize. In terms of historical precedents, you know, the other big one I'd point to is recent trends in political backsliding, often called democratic backsliding. So the most kind of end-to-end clear and cut case is Venezuela, where you had in the 70s, a fairly healthy democracy that had been there for decades. And then increasing backsliding, increasing polarization, kind of like what we're seeing in the US recently.

00:16:04

And then, you know, an increasing, you know, explicit commitment by kind of the leader that, you know, he wanted to remove checks and balances on his power. And that the will of the people, you know, was being obstructed by various democratic processes and institutions. And then, you know, over the coming decades, you know, it has transformed into an authoritarian state.

00:16:30

And, you know, many commentators have pointed out these trends in the US recently over the past 10 years. And it even goes back before the past 10 years, to be honest, in terms of the broad kind of political climate.

00:16:42

And then there's, you know, kind of the example of Hungary, where again, kind of elected leaders are just kind of removing the checks and balances in their power, kind of buying off the media or kind of threatening media outlets to be more pro-government, not filing them with contracts or kind of litigating them if they criticize the government. All these kind of standard tools, where it's now like a lot harder to point at one thing that's clearly egregious.

00:17:12

But when you add up the kind of hundreds of little cuts, hundreds of little paper cuts to democracy that are being systematically administered, you're seeing a real kind of loss of democratic control and concentration of power. And so again, you know, AI could exacerbate and enable that dynamic. And again, the most straightforward way is you're just replacing human powerful institutions, you're replacing the humans there with AIs that, you know, very, very, very loyal and obedient to the head of state.

00:17:41

So, you know, think about Doge. And, you know, they tried to fire people, there was pushback, you know, the state needs to function. Imagine if you could just have AI systems that could fully replace all of those employees and could be made fully loyal to the president.

00:17:56

How much easier would it be to kind of push through some of those layoffs or even just create entirely new government bodies that essentially just take on the tasks that were previously done by old bodies and let those old bodies kind of rot away or kind of slowly kind of prevent them from making decisions.

00:18:15

So, and then the other big way is if the kind of head of state is able to get access to much more powerful AI capabilities than their political opponents, maybe because state is very involved in AI development, then that's another way that they could get a head up, you know, making more persuasive propaganda and more compelling political strategy to like, you know, embed their power more. Hey, we'll continue our interview in a moment after a word from our sponsors.

00:19:09

The next episode of OCI, where I want to discuss OCI, where you can run any workload in a high availability, consistently high performance environment and spend less than you would with other clouds. How is it faster? OCI's block storage gives you more operations per second. Cheaper? OCI costs up to 50% less for compute, 70% less for storage and 80% less for networking. And better? In test after test, OCI customers report lower latency and higher bandwidth versus other clouds.

00:19:36

This is the cloud built for AI and all of your biggest workloads. Right now, with zero commitment, try OCI for free. Head to oracle.com slash cognitive. That's oracle.com slash cognitive. You segment the ways in which AI can enable queues into three categories, where you can talk about singular loyalties, secret loyalties, and exclusive access.

00:20:06

Perhaps we can run through those and talk about where those would play out, starting with singular loyalties, for example. Yeah, so singular loyalties is what we've just been talking about. That is deploying AI systems that are kind of overtly obviously very loyal to just existing powerful people. So in particular, I'm thinking about the head of state here as the main threat. And so I think we basically already covered it.

00:20:31

The two main angles in my mind are I'm deploying loyal AIs, powerful government institutions, and in the military. Secret loyalties is a very different threat model. It's much more, as you would expect, secretive. So the main threat model I have in mind to make it concrete is that an AI company CEO has automated all of AI research.

00:20:59

So they could fire their staff at that point because the AIs can just do the work. Instead, maybe they put their staff onto some product work. But the core work of driving AI progress ever further forward, making increasingly intelligent AI, is pretty much just done by AI systems. And at that point, they realize, well, they're in a bit of a precarious position. They're controlling this hugely powerful technology. Their power is pretty much unconstrained. Not literally unconstrained, but there's very few checks and balances currently on these CEOs.

00:21:29

And they might anticipate that the government's going to realize how big a deal this is. They're going to kind of lose their influence. Maybe they worry the government will do something unethical with the AI technology. Maybe they worry that they'll be used for a war or something. And there's all kinds of justifications they could come up with thinking, I don't want someone else taking control of this really powerful technology that I currently control. And obviously, I'll use it for good. And so they might speak to maybe some AI advisors about this and say, well, what should I do here? It seems I'm in a little bit of a precarious position.

00:22:00

And then a solution that they might think of or an AI, very smart AI system that's helping them might think of is, well, why don't you just create a backdoor in the super smart systems that we'll be building next month, where if the government tries to order those systems to do something that you definitely disapprove of, then the AI systems will just refuse to do it. Those robots, those powerful robots that are being deployed in the army will just refuse to follow orders from the government if they're really awful and unethical and things you wouldn't like.

00:22:31

And so then you can have all the future AIs now that are developed by this organization are going to have this backdoor or what I call a secret loyalty, where though they appear to follow the law, that they appear to kind of uphold democracy, in fact, they are secretly still kind of loyal to the CEO. And that could lead to a coup. And again, the most direct route is a military coup,

00:22:56

where if ultimately this company's AIs are building military robots, military drones, military communication systems, then because the AI is building all this military stuff, are secretly loyal, or they can kind of pass their secret loyalties on to the military infrastructure, or they can just insert vulnerabilities in the military infrastructure that only they know about. And here,

00:23:21

I am primarily thinking about very superhumanly intelligent AIs, are able to kind of design complex machinery that a human can't kind of check and check if there's a backdoor. And that then means that we've now got a military infrastructure that is also ultimately can be controlled by the CEO. And so at this point, if someone tries to remove the CEO, if something's really bad, having a CEO doesn't like, they could just be like, okay, I'm now in charge.

00:23:50

I've instructed this military infrastructure to do the coup. Again, the robots around the White House. This time, it's the company CEO that they're loyal to. And then the rest of the story can play out just as in our previous discussion. How do these secret loyalties relate to sleeper agents? And maybe you could explain what a sleeper agent is. Yes. Great.

00:24:15

So sleeper agent is the kind of standard term used in the technical literature. And today, sleeper agents are not very sophisticated. So an example of a sleeper agent is that if the AI sees that it's the kind of, most of the time, the AI will just write kind of reliable code as instructed.

00:24:40

But if the AI sees that the year is 2026, then the AI will write code with vulnerabilities. So there's this kind of trigger that triggers this malicious behavior. And normally, if you were interacting with the AI, you really wouldn't be able to tell because it would just always be acting as you'd intended. But because it's just one trigger, it's very hard to find. But then you do get this malicious behavior. Now, I'm not worried about sleeper agents today

00:25:08

because to cause real harm, we would need a sleeper agent to be very sophisticated. It could never accidentally trigger or very, very rarely accidentally trigger. And it would have to be able to do very intense, complicated tasks like building a military robot and ensuring that that robot actually had a backdoor. That's very, very far beyond what AI today can do. So I think that sleeper agents provide a basic proof of concept that it's possible

00:25:36

for a malicious actor to kind of gain illicit control over a system and then have that system be deployed in the rest of the economy potentially without people noticing. But they're not yet scary. And then the kind of secret loyalties is just what I call the kind of scary situation where you now have a very sophisticated AI system. It doesn't just have any old, it's not any old sleeper agent. It's a sleeper agent, which is specifically law to one person trying to help them seize power. Yeah.

00:26:02

So what we're imagining here could be, for example, a US-based AI company integrated into the US military. The CEO of the company wishes to ultimately be in control of what happens. And so he engineers or he instructs perhaps AIs or human engineers to create a sleeper agent in these systems that can be activated at his command, such that the US military officials

00:26:32

think they're in control of the systems. The systems behave in ways that they approve of throughout perhaps quite a long period until there's some way, until the sleeper agent is activated in some way. And perhaps that will be more sophisticated than changing the date or giving it some phrase. But you can imagine advanced versions of sleeper agents that could actually behave in this way. Do you think that's realistic?

00:26:59

Do you think sleeper agents can become that advanced? Yes, I do. I think we can, one analogy is to human spies. You know, human spies, they're basically most of the time, they're kind of doing their assigned job as expected. And it's not like one simple trigger phrase or one year makes the spy do something covert or malicious. They are just fully aware of their surroundings. They're kind of constantly processing their inputs.

00:27:28

And they choose strategically exactly what the kind of ideal time is to, you know, steal some information, send an elicit email. And so, you know, in my mind, that's by far the most scary sleeper agent. Not one that's triggered by a password, but one that is kind of holistically making a decision about how and when to act out. I mean, the password ones are actually quite fragile because, you know, if you were the

00:27:56

military and you're deploying your AI system and you're worried there's a password, what you can do is you can scramble all the inputs. You can kind of paraphrase all of the instructions it gets. And that might just mean that the password, if ever someone tried to insert it, would be kind of re-scrambled and would just never actually come up. So I'm not actually worried about the kind of simple password triggered sleeper agents. But again, they're a basic proof of concept.

00:28:21

And I think that as AIs become as smart and smarter than humans, that there's a strong reason to think that it would be possible to build much more sophisticated ones. One thing I will briefly say is that, you know, people often talk about misaligned AI scheming. And, you know, this is just the same idea where, you know, in fact, the argument for secret loyalties being worrying is much stronger, where, you know, misalignment, you know, there is evidence of misalignment.

00:28:48

We don't yet have, you know, strong evidence of, you know, really sophisticated scheming emerging accidentally, but if humans and a human team of engineers or an AI team of engineers were specifically trying to build a system that was kind of covertly thinking about when to kind of act out, then it's much more plausible that it could happen.

00:29:13

And then you have exclusive access, which is different from singular loyalties or secret loyalties. Why is that its own category? Yeah. So in my mind, the kind of singular or overt loyalties and the secret loyalties, both of those threat models go through deploying AI systems in really important parts of the economy. So, you know, in particular government and military, what I focused on, but, you know,

00:29:42

for those threat models, it's, you know, you actually need the rest of society to choose to deploy those AI systems and hand off a lot of power to them.

00:30:19

Which I mean, kind of AI can automate our research and then AI quickly becomes super, super intelligent compared to humans. AI systems. And then, you know, that project maybe has a few senior kind of executives or senior political figures that are kind of very, very involved and have a lot of control. And they might just be able to, you know, siphon off, you know, 1% of the project's compute and say,

00:30:47

okay, we're now running these super intelligent AI systems and saying, how can we best seize power? And then there, there's kind of millions of them. They're doing, you know, every single day, they're doing a month of research. Every single week, they're doing a year's worth of research into, okay, how can we, you know, how can we gain this political system? How can we, you know, hack into these systems? How can we, you know, ensure that we end up controlling the military robots when they are deployed by hook or by crook?

00:31:16

Um, and I think that that, that, that, that model could start to apply earlier in the game. That could start to apply before anyone even realizes there's a risk because, you know, this is just essentially all happening on a server somewhere. But actually it's possible that the game could be won and lost by the massive advantage that a small group get by, by being able to kind of co-op this huge, huge intellectual force. Um, and so I think it's worth tracking that threat vector independently, but it does.

00:31:46

You know, it does definitely interact with these other, other, with the singular loyalties and the secret loyalties. Cause no one strategy that your kind of army of super intelligent AIs may come up with is, oh, one, you like use the fact that your head of state to like push for the robots to be loyal to you. And like, here's how you could buy off the opposition. So confusion. Another strategy might be, oh, one, I just help you put back doors and all this military equipment so that then you could use it to stage a coup. But there might also be other ways, you know, maybe, maybe it's possible to very quickly create, you know, entirely new.

00:32:16

Weapons, which you can use to overpower the military without anyone knowing, or maybe it's possible to, you know, gain power in other ways. Yeah. Yeah. I mean, one thing that would make this kind of future hypothetical situation different from today is that today it seems that there are leading AI companies, but over time capabilities kind of emerge in, in, in, in, uh, second tier companies and in open source.

00:32:45

And so there's not that much of a gap between the leading companies and what is broadly available and perhaps what is publicly available. That's something that would change in, in the scenarios you imagine. So perhaps explain why the gap in capabilities between the, the one leading project and all of the others is so important. Hey, we'll continue our interview in a moment after a word from our sponsors.

00:33:13

Being an entrepreneur, I can say from personal experience can be an intimidating and at times lonely experience. There are so many jobs to be done and often nobody to turn to when things go wrong. That's just one of many reasons that founders absolutely must choose their technology platforms carefully. Pick the right one and the technology can play important roles for you. Pick the wrong one and you might find yourself fighting fires alone.

00:33:41

In the e-commerce space, of course, there's never been a better platform than Shopify. Shopify is the commerce platform behind millions of businesses around the world and 10% of all e-commerce in the United States. From household names like Mattel and Gymshark to brands just getting started. With hundreds of ready to use templates, Shopify helps you build a beautiful online store to match your brand style. Just as if you had your own design studio.

00:34:10

With helpful AI tools that write product descriptions, page headlines and even enhance your product photography. It's like you have your own content team. And with the ability to easily create email and social media campaigns. You can reach your customers wherever they're scrolling or strolling. Just as if you had a full marketing department behind you. Best yet, Shopify is your commerce expert with world-class expertise in everything.

00:34:35

From managing inventory to international shipping to processing returns and beyond. If you're ready to sell, you're ready for Shopify. Turn your big business idea into cha-ching with Shopify on your side. Sign up for your $1 per month trial and start selling today at shopify.com slash cognitive. Visit shopify.com slash cognitive. Once more, that's shopify.com slash cognitive.

00:35:09

A few factors there. So in terms of why it's important, it's just what you've said. I mean, a lot of these threat models kind of exacerbated if there's one group of people that has access to much more powerful AI than other groups. If open source is pretty much on par with the cutting edge, then everyone will have access to similarly powerful AI.

00:35:33

I will say that even if open source is kind of on par, that doesn't mean we're fine because we could still choose to deploy AI systems in the military and the government and still choose to make them loyal to the head of state. When we're choosing to handle control to AI, it doesn't matter if there's 100 AI companies. We're only handling control to some AI and maybe the government will ensure that they do have particular loyalties.

00:35:57

So I will say that this risk doesn't go away if we have lots of different AI companies and open source close to each other. But it does become lower because the kind of exclusive access point where one group has access to super intelligence and the other group doesn't have access to much, that goes away.

00:36:15

And I think it's a lot harder to pull off secret loyalties if everyone's kind of roughly equal to each other because it becomes a bit more confusing why your systems in particular end up controlling so much of the military or what was so widely deployed. And it becomes confusing how no one else was able to realize you were doing the secret loyalties when they were kind of equally able to do it or equally technologically sophisticated and potentially detect your secret loyalties.

00:36:38

So I do think it makes a big difference. In terms of why I think it's plausible that there's the much bigger gap between the lead project and other projects, that there's a few different factors. The most plain and simple one is that the cost of AI development is going up very quickly. We're kind of spending about three times as much every year on developing AI. And that's just going to get too expensive for many players.

00:37:03

If and when we're talking about trillion dollar development projects, which I do expect, then very few can afford that. And also there's just only so many computer chips in the world. If you want to have that, you know, that currently the number of kind of computer chips produced each year is less than a trillion dollars worth. So if we get to a world where, you know, the way to go to the next level of AI is to spend a trillion dollars, then only one company will be able to do that.

00:37:30

Maybe we stopped with it earlier. Maybe we just stopped with, you know, there's two companies both doing half a trillion. But, you know, we would be really kind of kneecapping the level of progress if we stopped long before that. And there would just be strong incentives for companies to merge or one company to outbid others in order to like, you know, really raise the amount of money that's being spent on AI development. You know, this is all assuming that we can build really powerful AI and it is economically profitable, which for me isn't all in the background of the scenario.

00:37:57

So that's the first kind of straightforward reason why I think we'll see a kind of a smaller number of projects and we'll see kind of big gaps. Because when you're spending 100 times less on development, then that's going to be a bigger gap. That's the first reason. The other reason I've already talked about the idea of an intelligence explosion when we automate our research, even if companies are fairly close, maybe one is a few months behind, the company that's a few months ahead automates our research.

00:38:24

In that next three months, they make massive progress. So then there's actually like a really big capabilities gap, even though it's still just a three month lead. So there's the question where they can use that kind of temporary speed to kind of get a more permanent advantage. And then the last big reason is just kind of government led centralization. It's already been talk of Manhattan Project and CERN for AI.

00:38:47

I know I think there's reasons to do those projects. They can help with safety in some significant ways, but they would exacerbate this risk. Because yeah, if you pull all the US or the United States computing resources into one big project, this can be way ahead of any other project and you pull all of its talent and all of its data, then yeah, you'll see a really big gap. And that would definitely make it a lot easier for a small group to do an AI enabled coup.

00:39:15

Yeah, you're kind of putting a big prize out there for someone who's considering a coup, right? If you're concentrating all of the power, all of the resources, all of the talent into one project, then well, that's where you got to go if you are a coup planner.

00:39:36

Yeah. And just to be, I don't particularly expect that anyone is planning any coups. In fact, I'd be very surprised. I'd more think it's, you want to be powerful. You want to be a big deal. You want to be changing the world. So yeah, obviously you want to lean the main, lead the main project and then you don't want anyone else to come in and mess you up, mess it up. So obviously you want to protect the fact you're leaving that project. You don't want anyone else to, you know, misuse AI. I think it's kind of step by step.

00:39:57

You just kind of head down that road of more and more power. And then, yeah, you know, often in history that that road does end in just consolidating power, you know, to a complete extent. And I mean, it can be. So what we're imagining here are times in which AI is moving at incredible speed, right? The pace of progress is insane. There's a bunch of confusing information.

00:40:20

People are acting under radical uncertainty. And perhaps in those situations, it's tempting to think that you are the person that can lead this project. And perhaps you're doing this out of supposedly kind of altruistic reasons. You're thinking that I need to do this in order to prevent other people that would perform worse than me at this project.

00:40:44

And so you're kind of slowly convincing yourself that it will be the right thing for you to do, to take over in perhaps a forceful way. Yeah, you know, I don't think Xi Jinping or Putin think that they are the bad guys. You know, I think that they have, you know, probably sophisticated justifications for what they're doing.

00:41:09

Perhaps here is a good point to talk about the possibility of one state or company outgrowing the entire world. This relates to the problem of exclusive access, because if you have one company or one government outgrowing the entire world, then you have that company or government with exclusive access to advanced AI.

00:41:33

And so how could this happen? How likely do you think it is that growth could be so incredibly fast that one company would outgrow all of the others? Yeah, so there's two possibilities we could focus on. The one I think is pretty plausible is that one country could outgrow all of the other countries in the world.

00:41:57

So what that would mean is, you know, today, the US is 25% of world GDP. But this would be a scenario where it is leading on AI. This is already the case, but, you know, it maintains its lead, it maintains its control over compute. And then when it develops, you know, really powerful AI, it prevents other nations from doing the same.

00:42:20

You know, it's already beginning with export controls in China and that kind of, you know, embeds its lead. And then it uses the AI to develop powerful new technologies. And, you know, it's in control of those technologies. It uses AI to kind of automate cognitive labor throughout the US and maybe worldwide.

00:42:43

And, you know, countries that don't use its AI systems will be really hard hit economically. And so we're kind of massively centralizing power in the US.

00:42:52

And if the US is able to maintain exclusive control over, you know, smarter than human AI, then it seems pretty plausible to me, you know, very likely that the US would be able to rise to a strong majority, you know, more than 90% of world GDP.

00:43:15

And, you know, there's a few different, you know, dynamics that are driving that first is that labor currently human labor receives, you know, about half of world GDP. You know, just half of GDP is paid out on wages. AI will ultimately and robots will ultimately be better than humans at kind of all economic tasks.

00:43:36

And so if the US controls all the AI companies that are replacing human labor, then, you know, that half of that kind of 50% of GDP, which is currently going to human workers will ultimately be reallocated to paying to whoever controls and owns those AI systems, i.e. US companies. You know, there's a wrinkle there because some of that is physical labor and, you know, the US doesn't currently have a lead there.

00:44:03

You know, physical robots. In fact, China is quite far ahead. But in terms of at least the cognitive aspects of our jobs, you know, so we're talking, you know, significant fraction of GDP that would just now be reallocated to US companies that control AI. So that already gets them from 25% to above 50%. Then we've got this further dynamic, which is the dynamic of super exponential growth. So this relates to kind of previous work I've done on how AI might affect the dynamics of economic growth.

00:44:34

But, you know, kind of very potted summary is that it's often quoted that over the last 150 years, economic growth has been roughly exponential. And what that means is that if two countries are growing exponentially and one country starts off, you know, maybe twice as big as the other country, then at a later time, still one country is twice as big as the other country. So let's say, you know, the US economy is 10 times as big as the UK economy.

00:45:03

Then if they're both growing exponentially at the same pace, then, you know, 10 years later, again, the US will still be 10 times as big as the UK. So that's exponential growth. That's what we've seen over the last 150 years. If you look back further in history, we see super exponential growth. That means that the growth rate itself gets faster over time. So, you know, an example would be that 100,000 years ago, you know, the economy wasn't really growing at all.

00:45:28

If you think of what's growing, it was maybe, you know, doubling every 10,000 years or something in size, you know, very extremely slow economic growth. Then going from about 10,000 years ago, it seems more like ballpark. There's a doubling of the economy every thousand years, still incredibly slow economic growth. You zoom back in and kind of 1400, you can begin to detect, you know, okay, more like, you know, every 300 years or so, the economy is doubling.

00:45:58

And then in recent times, we've seen that the economy is doubling every 30 years. So essentially, you know, the growth rate is getting faster, the doubling times are getting shorter. That's super exponential growth. And there's various reasons, economic reasons, theoretical reasons, empirical reasons to think that AI and robotics, when it can replace humans entirely, will go back to that super exponential regime that has been at play throughout history. And what that means is that growth is getting faster and faster over time.

00:46:26

And the reason I'm saying all this, the reason this is irrelevant is that, you know, go back to the example of the US and the UK. The US is currently 10 times bigger than the UK. If the US is on a super exponential growth trajectory, its growth is getting faster and faster over time. And that means that even if the UK is on that same super exponential growth trajectory, as they both go super exponentially, the US will pull further and further ahead of the UK.

00:46:53

Because, you know, maybe the US is doubling in 10 years because it's already bigger, it's already further along the curve. Whereas the UK, you know, is still doubling only every 20 years. And so that means that the US, you know, is now rather than just 10 times bigger than the UK, the US is now, you know, going to be 20 times, 30 times bigger in size than the UK.

00:47:12

So if the US is able to kind of, if there is super exponential growth and the US is able to kind of be bigger to begin with and therefore be further progressed on that super exponential growth trajectory, then that's another way that they could just, you know, continue to increase their size of the economic pie and ultimately, you know, come to completely dominate world GDP.

00:47:42

So, you know, just to sum up everything I've said today, the US is 25% of world GDP. If it controls and develops AI, that could easily boost it above 50%. I'd be very surprised if it didn't. And then from that point, you know, it's already bigger than the rest of the world combined. If it's able to then go on the super exponential growth path, then it will go faster and faster over time and pull further and further ahead of the rest of the world that, you know, may be able to grow super exponentially if they can also do that.

00:48:12

So we're going to develop AI, but, you know, we'll still be falling further and further behind because of the nature of super exponential growth. Yeah, this actually seems quite plausible to me and not very sci-fi. The thing that seems quite sci-fi is the notion that perhaps even one company could grow at such a speed that it would outgrow the rest of the world. How, how, how, how likely is that? Yeah. Great question.

00:48:38

I think it's a lot harder, but it is, it is surprisingly plausible. So you know, that first part of the argument I gave about how 50% of, you know, the world GDP is paid to human workers. You know, if that went to AI, that would be a big, a big chunk. It is possible that one company could get a monopoly on, on kind of really advanced AI.

00:49:01

So I, we already discussed some of the dynamics there where again, the simplest one is just a combination of an intelligence exposing, giving a company a big advantage. And then they're kind of buying up all the computer chips that the world is able to produce and outbidding everyone. If, if a company does that and already, you know, it's, you know, seems to be outbidding other companies on, on compute or the other Google also, also has a lot.

00:49:29

If a company is able to do that, they could end up just one company in control of literally all of the world's cognitive labor, you know, cause human cognitive labor will, well, some won't be kind of dwarfed by AI cognitive labor. So at that point, that one company could be getting, you know, all of the, all of, all of GDP, which is currently paid to kind of cognitive labor. We, which is a large part of the economy, as I said, you know, may, may be as high as 50%, but, you know, certainly as high as 30% of world GDP.

00:49:58

If all that, all that would then, you know, seemingly be going to this one company that, that controls the world's supply of cognitive labor. So, though I think that would take time and obviously it's going to take a long time to automate all the different parts of the economy. There is just the basic dynamic by which one company can now be controlling, you know, double digit percentages of, of world GDP. And there's obviously questions, would a government allow that? Would they step in?

00:50:27

And that's where we get into the, you know, these dynamics of like, well, this company has all these super intelligent AI's on its side. Maybe it's able to lobby, maybe it's able to do political capture, to avoid the state stepping in. Maybe it's able to be like, look, we're providing like economic abundance for everyone. If you step in, like, you know, that, that, that, that might not happen. You know, we're, we're underpinning your nation's, you know, economic and geopolitical strength.

00:50:52

And if you try and, you know, remove, you know, step in and nationalize, then, you know, that's not going to happen. We're going to move to another country. You know, so you can, you can imagine, maybe, maybe they convinced the head of state to kind of support, support them. And there's some kind of alliance there, but you know, it, it's not completely obvious that, that the company would be shut down. It would, it would have certain types of serious bargaining power.

00:51:17

So if a company was able to maintain this position as kind of sole provider of collective labor, it would be able to get a significant fraction of world GDP. And then it's then possible that from there, it could, it could bootstrap. And this is where it gets a bit harder, but the tactic it would, it would need to pursue is it already controls most of the cognitive labor, pretty much all of it.

00:51:41

The thing it doesn't control is all the kind of physical machinery and all the raw materials that are also needed to create economic output. But it could pursue a tactic of kind of hoarding its cognitive labor so that no one else can ever have access to that. And then kind of selling it at kind of really kind of monopolistic rents to bless the world because there's, there's no one that can match it.

00:52:04

This is, you know, it's offering everyone by far the best deal they can get, but just skimming off 90% of the value add from, from companies using its AI systems. So it was able to do that, then it can kind of, it can kind of reap by far the majority of the benefits of trade. And then maybe you can kind of buy increasingly buy up physical machinery and raw materials from the rest of the world, design its own robots, buy its own land.

00:52:29

And, you know, imagine like a kind of big, special economic zone in Texas or something where this company is kind of unconstrained by kind of bureaucracy. And then it's also now, you know, got a big arm somewhere in Siberia and in Canada, it's kind of creating these big, special economic zones by doing deals with specific governments.

00:52:54

And I do think it's a bit of a stretch that this all goes ahead without, you know, various other powerful political and economic actors pushing back. But like, the kind of basic economic growth dynamics are like surprisingly compatible with, with, with a company, you know, ultimately coming to control most of the cognitive labor and most of the kind of physical infrastructure that its AI has designed using all the, all the parts that it's bought from the rest of the economy.

00:53:23

Yeah. And do you think this is a risk factor for AI enabled tools then just because you're concentrating all of the power and all of the resources into either perhaps one country or one country or one country? Yes, I definitely do. Yes, I definitely do.

00:53:39

The more realistic path is that a company kind of starts down this path of outgoing the world, gets kind of huge economic power, increasing controls the country's industrial base, its kind of physical infrastructure, manufacturing capabilities.

00:53:56

And then from there, it's in a much stronger position to seize political control because it's got massive economic leverage and then it can also increasingly gain military leverage because as it, you know, as it increasingly controls the country's broad industry and manufacturing that will feed in to military power.

00:54:17

So, you know, some of the possibilities I discussed earlier where, you know, you could potentially have your AIs be secretly loyal, that ultimately design the military systems or you could just instruct your AI systems to start making, you know, a military that is, is not legally sanctioned. But, you know, because the government doesn't have much to threaten you with, it kind of, you get away with it. I mean, it gets a little bit tough.

00:54:46

You probably need to do that in secret. Otherwise the existing military could, could prevent it. But yes, I do think that, you know, being very rich helps with lobbying, it helps with all kinds of ways of seeking power and then controlling. Yeah. Controlling a lot of industry can potentially give you military power. You mentioned these special economic zones.

00:55:07

That's, that's one way in which companies could kind of bargain with states in order to have favorable regulation and to be able to carry out their projects without intervention, basically. Another way for them would be to collaborate with non-democracies that are perhaps controlled by a single, a small group or, or perhaps even a single person.

00:55:34

And in that way, it seems like perhaps it's easier to get something done in a non-democracy. And that is a way to, to grow fast. And, and so perhaps there are incentives for companies to place more resources in non-democracies. What do you think about the prospect of non-democracies outcompeting democracies when it comes to AI? I think it's a really great question. And it's tricky because I think I agree. Like democracies have lots of checks and balances.

00:56:04

They have a lot of bureaucracy, a lot of red tape, and that will disincentivize AI companies from investing. And then additionally, if there are people really trying to seek illegitimate power, that will be easier to do in non-democracies because they're less, less politically robust. So there, there are these various forces pushing towards, you know, this new supercharged economic technology being disproportionately deployed in non-democracies. And I think that is scary.

00:56:35

My, my, my own view is that probably we should, democracies should, should, should, should, should kind of do everything they can to, to, to avoid that situation. Make it much easier for AI and robotics companies to, to, to set up shop in, in democracies, remove the red tape.

00:57:03

Try and use export controls like are already happening to prevent technologies being deployed in non-democratic countries. And that, that goes beyond China. There's obviously lots of countries that are not allied with China, but also non-democratic here. And the U S you know, the U S is in a strong position because it does have the stranglehold on AI technology at the moment.

00:57:28

So I do think it can be done, but yeah, in my view, like it, it will be really important to, to, to kind of work very hard to, to find the kind of a non-restrictive regulatory regime. And it will also be very important to really try and pursue innovative innovations within the democratic process itself, where, you know, democracy is great in many ways.

00:57:52

It really distributes power and it, and it has been very good at ensuring good outcomes for its citizens, but it's very slow and often, you know, kind of nonsensical because you have competing interests that are kind of stepping on each other's toes. And the result of legislation is just, you know, a garbled mess. And so AI can potentially solve those problems. You can have AIs negotiating and thinking much more quickly on behalf of the, the kind of human stakeholders.

00:58:21

You can have AIs nailing out agreements that aren't a garbled mess, but they're like really gave everyone what they truly wanted out of the legislation. And you can still do all of that really quickly so that you're not falling far behind the autocracies that have just got one person immediately saying what to do. And I think if we did that, we, you know, democracies could outcompete autocracies because, you know, the big thing that often screws over autocracies is that one person is flawed, often makes big mistakes. People afraid to kind of stand up to them.

00:58:49

Yeah, that would be more of my assumption that I would assume here that perhaps democracies with market based economies have an advantage just because you can, you can do kind of bottom up knowledge discovery. You can try different things out. You can see what works. You can have competition between companies and so on. And perhaps in non-democracies. Well, I mean, there, you can, you can have one person or a small group stake out of direction for what the country should do. But if that direction is what is wrong, it's probably difficult to change course.

00:59:19

Yes, I, I think you could be right. I should have, I should have, you know, given more weight to that, that, that advantage of, of kind of democracies in terms of the free market being, you know, in many ways much, much smarter.

00:59:32

But in terms of autocracies that are good at harnessing free market dynamics, my worry would be that the AI helps them more than it helps democracies because AI will be able to kind of replace. You know, currently, you know, one person just can't think that hard, can't really figure out a good plan.

00:59:57

But if, if, if that one all powerful leader has access to loads of AI systems that can kind of think things through and investigate lots of different angles, then, you know, that if they're, if they're following its advice and they could get advice, which, you know, lacks the flaws that, that today, today systems had, and, and they could potentially move much faster.

01:00:17

But I, I think, you know, you know, you're right that kind of economic liberalism is, is still going to be important even after we get powerful AI systems and that could give, that could give democracy an advantage.

01:00:29

This is a bit of a tangent perhaps, but I am thinking whether, so if you have a leader of a country that has a lot of power, perhaps complete power over that country, and that leader is equipped with AI advisors advising him and kind of laying out kind of the landscape of options for him to choose from.

01:00:50

Wouldn't his decision making still be, in a sense bottlenecked by the fact that he's a human by the fact that he has these, these flaws that we all have the biases that we all have. So even with fantastic advice, I think it's, it's quite plausible that, that he would still make the same mistakes that we see leaders make today. I think that's true.

01:01:10

I think it's also true in democracies, unfortunately, that, you know, there's 10 negotiators and they each kind of still have biases and still refuse to listen to the wise advice they're getting from their AIs. That could still gum up the system. And it, yeah, it does depend on how much humans come to trust and defer to their AI advisors. There's a possible future where the AIs just always nailing it. They're always explaining their reasoning really clearly. And we are just like increasingly convinced and happy to trust their judgment.

01:01:40

If AI is aligned, I think that would be a great future because I do think humans have all these very big limitations and biases, which if we can solve the alignment problem, AIs don't need to have. But there's also another future where humans just, you know, want to be the ones making the decisions, have these kind of pathetic kind of motivations that, that, that, that they're still kind of influencing their decisions.

01:02:04

And that, yeah, that kind of, that, that continues to, to, to, to, to, to limit the quality of decision making. Seeing things from above, right. From kind of like 10,000 feet. How should we think about mitigating the risk of coups here? Is it, is it about removing people that would use AI to commit coups?

01:02:27

Is it about kind of finding those people in the militaries, in the governments, in the companies perhaps, or do we have ways to reduce the returns to, to seizing, to seizing power? Yeah.

01:02:44

I mean, from real 10,000 feet up, the way I would characterize it is create a common understanding of the risks, build coalitions around preventing them. And then the existing balance of power can self propagate forward. You know, it's in everyone's interest to prevent a coup. Currently, no one small group has complete control or close to it.

01:03:13

And so, you know, if, if everyone can be aware of these risks and aware of the steps towards them and kind of collectively ensuring that no one is going in that direction, then we can all kind of keep each other in check. So I do think, you know, in principle, the problem is solvable and it doesn't require, you know, solving the risk of misalignment does require solving some tough technical problems. This doesn't in the same way.

01:03:38

Yeah. You have a bunch of recommendations for mitigating the risks, both for AI development, AI developers and governments. And perhaps, you know, we don't have to run through all of them, but you can talk about the most important ones for AI developers. I might characterize this. I might kind of talk about it by going back to those three threat models we discussed earlier.

01:04:00

So the first one was singular loyalties or overtly loyal AI systems, where again, you know, the main risk there is AI deployed by the head of state and the military and the government that's loyal to the head of state. And so the main countermeasure that currently appeals to me is for us to figure out rules of the road for these deployments. You know, obvious things like AI should follow the law.

01:04:28

AI is deployed by the government shouldn't advance particular people's partisan interests, but should only do like, you know, official state functions. AI is in the military shouldn't be loyal to one person. No, they should different groups of robots should be controlled by different people. And, you know, head of the chain of command can still be head of the chain of command via instructing other people that instruct those robots. But they shouldn't all go directly to head of the chain of command because that centralizes military power too much.

01:04:55

So, you know, fleshing out basic rules of the road of that kind and then building consensus around them because, you know, companies might want to say to governments, yeah, we don't want you to deploy our systems, you know, if they're willing to break the law. But if the government, you know, if the government will have a lot of bargaining power, the executive in the United States can, you know, it's hard for companies to stand up to them.

01:05:22

So what we want to do is, you know, establish these rules of the road and then get brought by them from Congress, from the judiciary, from, you know, other branches of the military, from many parts of the executive. So then it's very then hard for say the president to say, yes, let's like make this robot army law to me. And everyone's like, obviously not. We've all like agreed. That makes no sense. You know, and then the president doesn't even bother trying because it's just clear that it would be a no go that, you know, their mind doesn't even go there.

01:05:49

In some sense, this is about kind of implementing the procedures and the transparency rules that we know from democracies today into how we use AI, both in governments and in companies, I think. Exactly. Yeah. Do you worry here that when, so the government is seeing, is looking at these companies from the outside, and they don't have full insight into what's going on?

01:06:15

So there are kind of protections for private companies that mean that they can, they can do things in secret, without the government knowing, at least as things stand now. Is that something that would evade these mitigations you're thinking of? So, I mean, this, for this first bucket, the singular loyalties bucket, it's mostly the kind of heads of state that I'd be worried about.

01:06:41

So, you know, it actually is probably good for the government, or at least for the, you know, the head of state themselves, not to have full insight into literally everything the company is doing, because that would give them too much power. But, you know, actually having different parts of the government having insight into what the labs doing, I think it's very good.

01:07:00

I'm a big, big fan of transparency. And, you know, we do have a good set of, you know, government checks and balances from different government bodies that we can deploy to kind of keep the lab in check using these other bodies, but also not allow like, you know, the executive branch and the president to get excessively powerful.

01:08:18

So that's, that's, that's this idea of system integrity, which is, you know, essentially a technical project, which does just draw on existing practices, but is not yet implemented in any of the top labs. I will quickly shout out for, for non, non lab, for kind of people listening that are working at labs.

01:08:42

I think there's a lot of really good technical research that could be done on kind of investigating the conditions under which you can insert a steeper agent without a defense team knowing. And there's, there's this loads of research that they've done in terms of the different settings there for attackers and defenders, which could then inform what parameters we need to be in place to achieve system integrity.

01:09:04

You know, if it turns out that it's, you know, it's very hard to make a sleeper agent except in the final stage of training, that's really useful to know because then we can focus our efforts within labs at that final stage, just as a hypothetical example. So that, that's the kind of key mitigation in my mind for the secret loyalties. And then I'll quickly do for exclusive access. That one seems more difficult.

01:09:31

I don't know, just, just from, from, from me reading and preparing for this interview, that, that one seems like a difficult one to handle where this is, this is, this is, this is in some sense, a deep trend in history and in the kind of history of modern economics that you don't have to be able to handle. You do see faster growth rates and you do see concentration into bigger and bigger economies, both in, in countries and in companies.

01:09:58

So are you, are you in some sense pushing against underlying trends if you're trying to mitigate exclusive access to, to advanced AI from, from one actor? I think you can, you can do this in other ways. So you can have the law require that AI share, that AI labs share their powerful capabilities with kind of other organizations to act as a check and balance.

01:10:25

So that, you know, lab should share their R and D capability, AI R and D capabilities with evals organizations. Here you're thinking about giving insight into what they're capable of, not, not actually sharing those capabilities. That would be too big of an ask. I think. I mean, I do mean API access.

01:10:47

So, you know, if, if a lot of the work in developing and evaluating systems is now done by AIs, then we want an evaluation organization like Apollo or Meta to also be uplifted. And so we want them to have access to really powerful AI that can similarly, you know, stress test how dangerous the frontier systems are. If they're only using human workers, then that's going to be a big disadvantage. So no, I do want API access to powerful capabilities for other actors.

01:11:15

You know, for example, cybersecurity teams in the government and in the military should have access to the lab's best cyber capabilities. And again, that, that, that should be a requirement by law. So, you know, generally like, even if there's a natural tendency towards centralization of power in one, one organization, you can still require that that organization share its systems with, with the checks and balances. That's one thing.

01:11:42

And the other thing is kind of preventing anyone at this organization from misusing the powerful AI systems. So the, the, the biggest thing on my mind here is that today we still have helpful only AI systems where you can kind of get access to the system and then all just do whatever you want. No holds barred. I don't think there should be any AI systems like that.

01:12:09

I think you should always have, you know, at least a classifier on top of the system, which is, you know, looking for harmful activities and then kind of shutting down the interaction if something harmful is detected. And, you know, if you have a special reason to use cyber offense, you know, for your job, or you have a special reason to do, you know, potentially dangerous biology research, you'd have that classifier allow certain types of activity. But you should never have anyone accessing a system where, you know, anything is allowed.

01:12:37

You know, no one, no one has legitimate reason to access an AI that will literally do anything. So what I want to aim for is a world where, yes, if there's a specific reason why you need to use a dangerous capability, absolutely. You can, you can use that system, but that system will just do that one dangerous domain. It won't kind of do anything you wanted because that, you know, that, that's a, that's a very scary situation where there's, you know, there's a hundred reasons why the CEO could ask for access to a helpful only system.

01:13:05

You know, maybe the guardrails are annoying, maybe, maybe wants to kind of, you know, do, do, do, do something which, which the model is reluctant to do. But today, when you asked to remove some guardrails, you're moving all of the guardrails and now there's no holds barred. So, you know, instead, we know we should, we should be flexibly adjusting what guardrails are there, you know, by the use case and just, you know, never have a, have a situation where, where there's no guardrails.

01:13:31

I think, I think that, I think that could go a long way towards helping if that was, that was robustly implemented. With all of these mitigations for both secret loyalties and exclusive access and singular loyalties, you would worry that they would be disabled by the group planning a coup, right?

01:13:52

Say, say, say that, for example, you are the CEO of an AI company and you're giving API access to, to evaluations, organizations, testing your model, trying to see what they're capable of. Maybe you just cut off access before you get to the really powerful model that could actually be the model that, that helps you conduct a coup.

01:14:14

Do we have ways of making sure these mitigations are entrenched before in such a way that they can't be removed by the group planning a coup? This is a great question. It is, it is pretty tricky. CEOs by default have a lot of control over their organizations and similarly heads of state, you know, including US president has a lot of control over the military and over the government.

01:14:41

So yes, there's a risk that one of these powerful individuals realizes that maybe they want more influence by gaining control over AI and notices that there's these kind of pesky little, you know, processes that prevent that. And it's like, okay, well, let's remove them. I can give, you know, easy, say, you know, productivity reasons to prevent them, red tape reasons.

01:15:06

And, you know, if they can make a plausible argument, then it could be hard to oppose them. So I do think it's a big issue. But I'd say a few things. Firstly, something I mentioned earlier, I don't think that anyone is today planning to do an AI enabled coup. The way I think this works is that people are faced with their kind of immediate local situation, something they want to do over the next month.

01:15:36

And the blockers that they're facing to doing that specific thing. And, you know, what tends to happen is people, you know, tend to want more influence that that helps them get get stuff done. And so people will kind of bit by bit kind of move in the direction of getting more control over AI. But they won't be kind of thinking, yes, I need to make sure that I remove this whole process because that will allow me to do an AI enabled coup. That's kind of unrealistically galaxy brain. And so what we could do is we just set up a very efficiently implemented and very reasonable set of

01:16:07

mitigations that doesn't really prevent CEOs from doing what they're trying to do. And so the CEO doesn't find in their day to day that they're wanting to kind of like remove these things that are holding them back. But because these mitigations are here, the CEO never gets to a place where they're anywhere close to being able to do a coup or where there's any kind of pathway in their mind to be able to doing a coup because they're constantly prevented from getting access to kind of really powerful AI advice that might point out ways that they're going to do a coup.

01:16:35

And so they're not in the case in which they could do this because they're surrounded by colleagues that strongly believe that these mitigations are sensible and reasonable. And in fact, they are well implemented. And there aren't many downsides. Maybe an environment where they kind of get kudos for the fact that they've said, yeah, obviously I'm not going to get access to helpful only systems. That's crazy. And then that's kind of like something that makes them seem good.

01:17:06

So that's one thing to say. Another thing is, again, going back to this point that there are currently checks and balances and there is not currently a situation where one person has power. You know, if the entire board of a company and other senior engineers recognize the importance of the mitigations, know about this threat model, then they will notice if the CEO is moving that direction.

01:17:32

And, you know, similarly, similarly within the government, there are checks and balances and they could be activated if people are looking out for it. Do you think these traditional oversight mechanisms like a board being in control of the CEO, being able to fire the CEO or the possibility of Congress or the Supreme Court kind of overruling or constraining the U.S. president?

01:17:58

Do you think those will persist in environments where AI is moving very fast and AI capabilities are growing at a rapid pace? It's a great question. Here's one story for optimism. Today, things are moving fairly fast, but those checks and balances are somewhat adequate, at least to preventing really egregious situations.

01:18:27

By the time that AI is moving really quickly, we'll have handed off a lot of the implementation of government, the implementation of things in the AI companies, the research process. We'll have handed it off to AI systems. And when we do that handoff, we could program those AIs to maintain a balance of power.

01:18:47

So rather than handing off to AIs that just follow the CEO's commands or AIs that follow president's commands, we can hand off to AIs that follow the law, follow the company rules, report any suspicious activity to various powerful human stakeholders. And then by the time things are going really fast, we've already got this whole layer of AI that is maintaining balance of power.

01:19:11

The whole AI government bureaucracy, the whole AI company workforce, they are better than humans today at standing up to misuse potentially. They are less easily cowed and intimidated and they could actually make it harder for someone in a position of formal power to get excessive influence.

01:19:35

So this is like the flip side of the singular loyalties where you potentially deploy these AIs that are explicitly loyal. You can actually kind of instead get kind of singular law following a balance of power maintaining AIs that you deploy. And so the hope is that by the time we really things are beginning to go kind of crazy and we're really seeing speed ups from AI, we've already kind of set ourselves up in an amazing way to maintain balance of power.

01:20:01

And there's this critical juncture where we are handing off to AIs and it's just, you know, what are those AIs, you know, what are their loyalties? What are their goals? And, you know, I think we can gain a lot by making sure that those AI systems are maintaining balance of power, reporting, you know, illegitimate suspicious activities and are not kind of overly loyal to any one person.

01:20:25

How do you think the risk of AI-enabled coups interface with kind of more traditional notions of AI takeover? So just a misaligned, highly capable or advanced AI system taken over in kind of contrary to the wishes of the developers or the other governments? Yeah, I mean, there's some close analogies.

01:20:51

Perhaps the most analogous case is the case of secret loyalties where, you know, you've got these AIs that have been told by the CEO to have the secret goal of seizing control and then handing control to the CEO. That's just very similar to AIs that wanted to seize power from themselves secretly. And, you know, all the same stories could apply where the AIs kind of make military systems and then they control the military systems and the robot army and then they seize power.

01:21:21

And the only difference is, were they seeking power because it just kind of accidentally emerged from the training process, which is the misalignment warrior, or were they seeking power because the CEO programmed them in that way? You know, but that's the kind of seed of the power seeking. But then with the secret loyalties to that model, the rest of the story is, you know, pretty similar. I mean, there's still differences, you know, in the secret loyalties case, the CEO might be doing more to help the AIs along with their plan.

01:21:48

You know, maybe even in the misalignment case, the AIs have managed to kind of nipulate the CEO into doing similar things. So that's the case where it's most analogous.

01:22:00

I think the, you know, another difference that's salient to me is that if there are lots of different AI projects, then a coup seems, an AI enabled coup seems a lot harder because you'd need like lots of different humans to kind of coordinate, to kind of seize power together.

01:22:20

Which seems, you know, while I can totally believe that one person might try and seize power, it does seem less likely to me that there'd be loads and loads of humans that would want to do that from lots of different labs. Whereas for the misalignment story, it is more likely the case that if one of these labs has misaligned AI, then maybe, you know, lots of them have misaligned AI.

01:22:46

And so then it's more likely that you would have, you know, maybe 10 different AIs colluding and then seizing power and taking over. And so that kind of collusion between multiple different AIs is more likely in the case of misalignment than in the case of an AI enabled coup. Just because if there's one misaligned AI, then there's something about the training process for AI systems that are causing misalignment and then it will be a common feature among many companies.

01:23:17

Exactly. Whereas just the fact that one CEO instructed a secret loyalty would not, to the same extent, make you expect that other CEOs have done the same. So you mentioned this possibility, but what do you think of the prospect of a president or a CEO of a company being duped by a misaligned AI into conducting a coup on its behalf?

01:23:41

So you can imagine a president or a CEO kind of thinking that he's conducting a coup to remain in control, but he's actually acting on behalf of a misaligned AI. Yeah, I think it's an interesting threat model and some people who think about AI takeover threat models take it pretty seriously. And it's just, you know, it's just a case where we're just completely mixing these two that models together.

01:24:07

You know, people who are worried about AI takeover for this reason should be very supportive of the kind of anti-coup mitigations I'm suggesting. Because if we implement checks and balances that prevent any one person from getting loads of power, then that AI will not be able to convince them to try because they just won't be able to succeed.

01:24:27

So, you know, I see this as like, you know, an additional reason to worry about AI enabled human coups and to try and prevent them. It's that yes, even if no human wants to do this normally, you know, misaligned AI might make them try.

01:24:46

In terms of how plausible I find the threat model, you know, honestly, I think that if a human tries to seize power, the main reason is that that human wanted power. Like, this is just something we know about people. We know it about, you know, heads of state today. You know, it's very clear that many heads of state in the most powerful countries in the world are very power seeking. We know it about CEOs of big tech companies.

01:25:15

We know about some of those leading AI companies that we do know that they're very power seeking, their CEOs. And so I don't think we need to theorize that like they were massively manipulated by the AI and convinced to become power seeking. Like, I think it's more likely that if they seek power, they just did it for the normal human reason. I do think AI will ultimately get good at persuasion.

01:25:44

I don't particularly expect it to be hypnotic level persuasion though, you know, obviously there's massive uncertainty here. And yeah, I do think that like a very smart AI where there's a human that's already kind of interested in seizing power and it already kind of makes sense for them to maybe do it. And a misplaced AI could totally nudge them in that direction and then could implement that in a way that actually allows the AI to seize power later.

01:26:12

So I think that is very plausible. When we're thinking about distributing power and kind of having this balance of power, we can imagine the models being set up via post training, via the model spec, via various mechanisms to have obeyed the user. Unless, unless what the user instructed to do is in conflict with what the company is interested in.

01:26:40

And perhaps obey the company, unless what the company is using the model for is contrary to what the government kind of permits. But when we set up in those levels, you ultimately end up with the government in control in some sense. And I guess that exposes you to risk of a government coup then.

01:27:04

If you have at the ultimate top layer of the stack, here's what the models can and cannot do according to the government. Well, I'd say a couple of things. First is that the government isn't a monolithic entity. And so that government decision of what the balance should be could be informed by multiple different stakeholder groups. And then ideally, you know, it's ultimately democratically accountable.

01:27:29

I do think that democratic accountability becomes more complicated in a world where there's massive change in a four year period. Just for the simple reason that there's no election during a period where a massive change is happening? So the feedback loop is too slow? Exactly. You know, I think the risks of AI enabled coups will probably emerge and then be decided within a four year period.

01:27:54

It will be resolved whether or not it happens or doesn't, all without any like intermediate election feedback. That doesn't mean that democracy can't have an effect because politicians anticipate, you know, what future elections will find and want to maintain favour throughout their terms. But it does pose a challenge. Sorry, I was kind of saying even absent that there's many different stakeholders in the government.

01:28:20

And so, you know, it would have to be a large group of government employees that were kind of trying to do a coup. And then they would have kind of, the companies would know that they were setting these, these odd restrictions on, on, on the, on, on the kind of behaviour. And so the companies would know and they have leverage and power. And then, you know, it could go public. So I do, I don't, I don't think it would be that easy for the government to do coup.

01:28:46

First, there's a difference also between allowing the government to set restrictions on what the models can do and then allowing the government some kind of access to, to commanding future AI systems in certain directions. So it's kind of setting limits versus, versus steering systems. Yeah, exactly.

01:29:05

I mean, the distinction I was going to highlight was between specifically making AI systems loyal to, for example, the head of state and the setting very broad limits where there's just like, you can pretty much do whatever you want, except for these obviously bad things. Where that second option doesn't really enable anyone to do a coup. It just enables everyone to do whatever they want. And then you kind of blocked out all of the kind of coup enabling possibilities through those limits.

01:29:33

You know, as long as you haven't made those systems loyal to a small group. So given that there's this obvious option to just put in these limits that block coups, but don't enable coups. And given that there's, you know, a wide range of stakeholders that could potentially feed in to what, what, what the AI's kind of limitations and instructions are. I think it's very, very feasible to get to a world that, that, that where there's robustly not central centralization of power.

01:30:02

Um, there's obviously a big uncertainty over whether we will actually get our act together and get those limits put in place in the right way. Yeah. When do you think these threat of the threat of an AI enabled coups materialize? Is it at some specific point in AI capabilities or is it, does it simply scale with the systems getting more advanced? When do you think the threat is at its peak? It's a good question.

01:30:29

For the threat models that I've primarily focused on, they require pretty intense capabilities. So that for example, the secret loyalties threat model more or less requires AI to do the majority of AI research. So we're talking about, you know, fully replacing the world's smartest people in a very wide range of research tasks and coding. That's, that's, that's pretty intense.

01:30:52

And then a lot of the threat models that I focus on route through military automation, that is AI and robots that, that can kind of match, you know, human boots on the ground. And that's, you know, that's, that's pretty advanced. Again, that said, I, I, I think, you know, you can probably do it with, with, with less advanced capabilities than that.

01:31:17

So that we, you know, drones today are already pretty good or the, or the providing, you know, making a big difference in, in some military situations. So it's, you know, not out of the question that, you know, more limited forms of AI and robot military technology could, could be enough to facilitate a coup. It's a bit harder because if, if they're limited, then there's a question of why the existing military doesn't just kind of seize back control after a bit of time.

01:31:46

And so probably that scenario also has to involve things like, you know, maybe the current president, you know, supporting the coup and therefore like pressuring the military not to intervene or some other source of legitimacy for the coup beyond the kind of the, the AI controlled drones.

01:32:08

And then there's also kind of more kind of typical types of backsliding, you know, like, like, like has already been happening in the U S that I think, you know, could be exacerbated through AI enabled surveillance and, you know, AI kind of increasing state capacity in other ways. And again, you know, that, that, that, that backsliding doesn't require, you know, super powerful AI.

01:32:33

You know, you could probably do a lot of monitoring, a lot of kind of content moderation on the internet, a lot of surveillance with today's systems. Um, it doesn't get you all the way to one person having complete control where they can just quash any resistance with a robot army and replace everyone in their job with a, with an AI. And so no one has any leverage.

01:32:56

So I think, you know, to get to that real intense, this is like the most intense form of concentration of power via AI that requires really powerful AI, but to just kind of significantly exacerbate existing trends in, in political backsliding and, you know, to make it easier to do a military coup, I think, you know, more limited systems, um, would suffice. Yeah.

01:33:21

We discussed earlier the possibility of, of one country or one company outgrowing the rest of the world and kind of concentrating power into, into those entities. I know you mentioned one person. Do you think that's actually a plausible scenario in which you have say one CEO of one company being, being the person in, in control of the world via a concentration of power and then a coup? A hundred percent. Yeah.

01:33:48

I mean, the story I told earlier about secret loyalties, you know, meaning that now we backdoored wide range of military systems. I mean, you can seize power. That's, that's one route. And then, you know, again, there's this other route with the company and masses, you know, masses amounts of economic power by kind of having a monopoly on AI cognitive labor. And then, you know, leveraging that, leveraging that to get more economic power, more political influence. Yeah. I mean, I do, I do, I do think it's possible.

01:34:19

You know, you know, again, there's this big shift once AI can fully replace humans where today, no, one person can never have absolute power. They have to rely on others to implement their will. Yeah. I mean, I think it's possible. I think it's possible that there's always a threat of kind of internal revolt or outside factors threatening the dictatorship. But this, this could potentially change. Yeah.

01:34:46

There's always a threat of revolt and then to, to guard against that threat, the dictator needs to share their power to some extent, has to compromise. But yeah, you could get all concentration on one person with sufficiently powerful AI. Do you think we, we move through a period of increased threat of AI enabled coups and then reach some kind of stable state? Or do you imagine that there's a constant kind of risk of AI enabled coups in the future?

01:35:15

I think we move through it. Yeah. It's, it's, it's this point about once we have deployed AI across the whole economy, the government, the military, if those AIs are maintaining balance of power, then we could fully eliminate the risk of an enabled coup.

01:35:33

You know, it would just be as if, you know, our whole population was just, you know, so committed to democracy, would never seek power, you know, never help anyone else who, who, who wanted to undermine any democratic institution. You know, you know, you could, we already have strong norms, you know, favoring democracy, but you know, they're far from perfect and they have been eroded over recent decades, but you could, you could just get rock solid norms.

01:35:56

And then, you know, they're programmed in, you know, they cannot be removed except by the will of the people. I mean, there's a, there's a bit of a question because you still want to give the human population the ability to change the AI's behavior and its rules. So the human population could always choose to move to an autocracy. So I suppose I shouldn't say that we could fully eliminate the risk because, you know, we, we, we could, we could, we could, we will always have that, you know, democracy.

01:36:25

There's always this point that democracy could, you know, vote to stop being a democracy. But I do think, you know, we could get to a point where it absolutely cannot happen without most people wanting it to happen. And so, so you would get to a point in which kind of future AI enhanced societies, you could say, are more stable than, than current democracies.

01:36:49

And they're more, they're less at risk of, of coups than, or democratic backsliding than, than current democracies. Much, much more. Yeah. You could get much more robustness there. I mean, just that there's this constant dynamic in today's societies where people care about democracy, but they also care about a host of other things. You know, their own achievements, various other ideological commitments.

01:37:11

And so, you know, depending on how dynamics play out, depending on how technology evolves and what people's incentives are, sometimes people push against democracy. You know, that that's what the Republican party has been doing in some ways. That's what democratic party has done as it's kind of increasingly put, you know, pretty ideological people in, in powerful institutions.

01:37:33

So with AI, you can, you get much more control over those dynamics because you can just, you know, you can just make it much more the case that, that, that democracy is not being compromised. Are there any ways for us to, are there any kind of risk factors we can look at if we're interested in, in, in predicting coups? Do you think there's something we can, we can measure or something we can track to see whether we are at risk of, of, of an AI enabled coup?

01:38:02

It's a great question. I don't think I have an amazing answer, but some things that come to mind, the, the capabilities gap between top AI labs, and then the gap again with open source. The degree to which AI companies are sharing their capabilities with the public.

01:38:27

And if not with the public, then with, you know, multiple other trusted institutions, you know, like sharing their strategy capabilities with kind of US political parties and parts of government.

01:38:41

And, um, the, the extent of economic concentration, you know, how much, how much, what are the revenues and net worth of particular companies or particular AI companies? So another one, what is the extent of government automation and military automation by AI systems?

01:39:10

And when it, when the automation is happening, how robust are the guardrails against breaking the law and guardrails against other forms of illegitimate power seeking? Um, how much transparency does the public or the judiciary or Congress have into how dangerous AI capabilities are being used by, by AI companies and by the executive branch?

01:39:38

So, you know, take the example of military R and D capabilities that is really smart AIs that can design super powerful weapons. It's scary if companies can just use those military R and D capabilities without anyone knowing. It's also scary if a small group of people from the executive branch can use those capabilities without anyone else knowing how they're using them because they could be designing powerful weapons and making them loyal to a small group.

01:40:05

You know, so transparency in, into these like high stakes capabilities and how they're being used by a, by a broad group. It doesn't have to be public. Um, probably shouldn't be public, but you know, we have checks and balances already. So, you know, another, another kind of question is as, as, as these high stakes use cases start occurring or they become possible. Do, do we know that there's transparency requirements in place?

01:40:32

You know, as we, as we increasingly see AI companies contracting with Palantir and other military contractors, we can kind of begin to see that they're making increasingly powerful, um, weapons. Is there, is there, is there a process of oversight? Do we know that if someone was trying to, you know, make AI military systems, the law to them, that it would be spotted that that that that's another indicator.

01:40:57

We, we can look at, you can look at, you know, all the kind of standard democratic resilience indicators that kind of the social scientists have come up with this various things about kind of free and fair elections about civil society. About freedom of press that, you know, have been getting worse recently in the US. Um, but there's, there's various indicators here.

01:41:21

You can look at the degree of government censorship of, of, of, of, of a freedom of speech or what's on the internet. And the degree of surveillance that the government's doing. If you, if you take all of these things into account, yeah, how do you think about the, the risk of an, of an AI enabled coup? What's the risk of an AI enabled you in the next, you know, 30 years? The next 30 years.

01:41:50

I think it's high. I think the risk is high. I would guess it's 10% or something. And that, you know, to be clear, you know, if, if it was just existing political trends, ignoring AI, I'd be, you know, maybe a few percentage, maybe on like 2% or something. You know, there's definitely a risk of that. And I'm thinking about the US here.

01:42:13

I think a big, a big part of my, my current worries are not about the indicators, but it's about my expectation that AI capabilities will keep, will keep increasing quickly and even more quickly. And then the kind of absolute lack of interest in regulating AI companies right now in the US.

01:42:39

And the, the difficulty that we will have of constraining the executive under the current situation where, you know, present is using, you know, sophisticated legal strategies to increase their own power and is, is no succeeding on many fronts. You know, the US is, is not doing a great job at constraining the executive. So, you know, companies are unconstrained. The executive is poorly constrained. Those are the key threat actors here.

01:43:07

So, you know, with fast AI capabilities progress, plus that lack of constraint, lack of transparency, you know, the default is that a lot of those indicators I said get worse and, you know, none of the indicators get better like transparency. And so that, that, that makes me think this is, this is very plausible. Yeah. I mentioned 30 years, but what about five years? Five years. That's tough, isn't it? It's really tough. I mean, yeah, I think there's a risk.

01:43:36

I, I wouldn't think there was a risk if it wasn't for the AI research causing intelligence explosion angle, but AIs are a lot better at coding and cognitive kind of research related tasks than they are at, you know, for example, you know, controlling robots and stuff.

01:43:57

And so even if the third model ultimately comes through robots or comes to, you know, crazy levels of persuasion, it's just, you, you really can't rule out a scenario where, yeah, AI research is, is, is automated in three years time. Then in four years time, we've got super intelligent AI controlled by a few, few people. Maybe it's got secret loyalties. Maybe it's being deployed in the government and being made overtly loyal to the president.

01:44:22

And then, you know, a year later, you know, it's, it's backsliding or it's political capture or it's, you know, robot soldiers. Yeah. How do you think about the badness of the outcomes here? How, how much does the badness depend on the ideologies of the people who are, who are conducting the coup or what should we look out for?

01:44:47

If, because I mean, I guess we can rank cues by badness, which is not an exercise. I think we should actually attempt, but we can, we can kind of talk about the factors involved about what would be the worst kind of coup and what would be a slightly better kind of coup, slightly less bad kind of coup. Yeah. So we could, you know, let's imagine it's one person that sees power. Actually, you know, that's the first distinction to draw.

01:45:15

If there's a group, then even 10 people is better than one person. And why? And why is that? Yeah. So 10 people, you get a diversity of perspectives. So more kind of moral views represented and there's more kind of room for compromise between those perspectives. There's more room for kind of reasonable positions to win out as there's kind of some, some deliberation as, as actions are decided upon.

01:45:41

There's slightly less intense selection for psychopaths than if it was just one person. So, yeah, if it's just one person, that's bad. That's particularly bad. 10 people are still very bad. You know, a hundred people still pretty bad, but you know, it's the, the, the, there's big differences there. Big differences. If, if, if we're now just thinking about, you know, one person or the average person in the group, then we could think about how competent they are.

01:46:09

And then we could say something about kind of how, how kind of virtuous their motivations are. Um, well, I, I do think competency is important. Like I think it's probably underrated in most political discussions. How important it is to just be really, really, really competent.

01:46:26

You know, thinking about something like responding to COVID or thinking about something like, you know, trying to deescalate a conflict, Russia, Ukraine, or trying to deescalate Israel conflict. Like actually just being very competent and very good at getting things done is important.

01:46:46

And as we, as we mentioned, if you're just willing to rely on AIs and you, you know, align those AIs in the right way, anyone could be really competent, but you know, that's not guaranteed. You know, people, you know, may, may really want to, to cling to their current views without, without changing their minds. You know, let's, let's take the example of Donald Trump. If a really smart AI system told him, look, tariffs are definitely bad for the US economy. They're definitely bad and won't give you what you want.

01:47:16

Would, would he change his mind? I would, I would guess no. So, you know, lots of, lots of smart people have already been saying that and, you know, him and his supporters. You know, I, I don't actually know like the economic details here, but like my understanding is that most people think that they're pretty bad. And, you know, it'll still be the case that, you know, Trump will be able to find people telling him that what he thinks is good and he'll be able to program his AIs to keep telling him that if he wants to.

01:47:43

So there's no guarantee that, that, that he will become super competent or that whoever sees power becomes super competent. So there's, there's this kind of like, there's a, there's a form of loyalty that actually undermines competence just because you're, you're not, you're loyal to such an extent that you're not providing feedback that's, that's useful because, you know, negative feedback feels bad to receive. And so there's, there's, there's, there's that kind of loyalty.

01:48:08

I mean, maybe this is a bit contrived, but do you think there's a sense in which in the singular loyalty scenarios, the AIs could be so loyal that they are, they're kind of undermining the competence of, of the person that is, that, that they're singularly loyal to?

01:48:26

Yeah, it's a really great question. I haven't thought about this, but yeah, in a way, the most extreme version of singular loyalties will just agree with whatever the most recent thing that, you know, the, the, the dictator has said, it's like, you know, it's a version of sycophancy, which we already see without questioning. And we'll do that even when it's not in that person's interests because that's the kind of type of loyalty that's demanded where there's a more kind of sophisticated type of loyalty where there's a more kind of sophisticated type of loyalty.

01:48:55

where you're still completely loyal, but you're also willing to challenge them when, when you think it's in their, in their best interests. So that, that's a really nice distinction. And yeah, I suppose one way of, you know, thinking about competence is thinking about what kinds of loyalties the dictator would demand from their AI systems.

01:49:19

Another, another, another way of thinking about it is how much they would listen to their AI advisor. Like even if the AI has the kind of sophisticated type of loyalty and it's trying to tell the dictator what to do, the dictator could just ignore them. And, you know, you see that again, you know, AI is a fairly sycophantic, they will also challenge you sometimes. And then it, you know, it's up to you whether you listen.

01:49:40

So that's what, that's what, that's what the confidence bucket, which I think is really important. And I do think there are differences between potential coup instigators on that, on that front, which, which could be significant. Yeah, I guess my expectation would be that kind of lab CEO coups would be, would be more competent than, than heads of state.

01:50:03

But, you know, even, even within lab CEOs, there are some that are more dogmatic than others. And I think that dogma would get in the way of competence. That's competence. And the other thing I mentioned was kind of, kind of broadly, what are your goals, what are your values or kind of more character?

01:50:22

And here, you know, what I, one thing I think is really important is being open minded, being willing to bring in lots of different, you know, diverse perspectives into the discussion and empower them to, you know, really represent themselves and grow and flourish. So I think a very bad thing would be, you know, a particular person becomes dictator, they implement their vision for society, end of much better to be, you know, empower all the different kind of,

01:50:50

ideologies and ideas that kind of, you know, become the best versions of themselves. And then, you know, we can kind of collectively grow and improve our understanding of how to, how to run society. So, you know, sometimes people focus when they're thinking about values on like, okay, are you, are you at this type of utilitarian or, oh, no, I hope you're not, you know, the ontologist or, you know, it can get very kind of specific and finger pointing.

01:51:18

You know, my view is more that, you know, we don't really know what the right answer is. And the most important thing is, is, is being pluralistic. And, you know, letting a thousand flowers bloom. Hmm. So we discussed the possibility of getting to a stable state in which we've avoided an AI enabled coup. And now we have, say, we have aligned super intelligence kind of that where the risk of coup is very low.

01:51:45

Do you think this is something that happens for one country and then that one country is, is, is in control of the world to such an extent that it's, that it's, this is not a process that other countries are undergoing.

01:51:58

To be more concrete here, for example, if the U.S. goes through a risk of, of Q, of AI enabled coups, but managed to, to kind of stay, to, to remain a stable democracy, is it the case that Russia or China will go through a similar period of risk of, of, of coups? It's, it's a great question. And it will depend on the U.S.'s kind of posture towards the rest of the world geopolitically.

01:52:28

And it will also depend on, you know, whether the U.S. has gained a huge military and economic advantage, you know, like outgrowing the world or just developing powerful military technology, as we were discussing previously. But, you know, you can imagine one scenario where, you know, the U.S. isn't that, that much more powerful than the rest of the world yet.

01:52:52

And isn't that like kind of inclined to, to intervene, which has been kind of the recent trend. And then, you know, China developed some really powerful AI a few years later and using Ping uses it to cement his control over China. So then you now have one kind of AI enabled dictatorship that is extremely robust. And then you have the kind of U.S. which, you know, has, has avoided that risk.

01:53:22

And now they're kind of, you know, maybe they're competing against each other and kind of, you know, the Cold War three, I'm trying to kind of, Cold War two, sorry, trying to outgrow the world or maybe, maybe they're striking deals because, you know, they, they recognize this, it's, it's, it's not good to compete. And, you know, China just kind of indefinitely remains a dictatorship.

01:53:48

And, you know, that, that's just a permanent loss for, for the world. But, but you could also imagine a different scenario where the U.S. is very far ahead and maybe, you know, it just wants to really secure its position geopolitically. And so it, you know, it instigates and enable coups in other nations where it's really putting kind of U.S. representatives up on top of those, those nations that could be through secret loyalties.

01:54:16

It could sell systems to, you know, sell, sell AI systems, let's say to India that are secretly loyal to, to U.S. interests, or it could give, give, give some particular politicians in India access, exclusive access to super intelligent AI to help, help them gain power.

01:54:34

So you could, you know, you could apply those same threat models we've discussed, but with the kind of U.S. playing the strings, or, or you could have the U.S. just kind of taking control of other nations in more traditional ways. You know, you know, just military conquest and kind of really leaning heavily on kind of extracting economic value out of other countries as they are growing the world.

01:54:59

So, you know, yeah, kind of wide range of options here, really. Yeah, yeah. As a, as a final topic here, perhaps we can talk about what listeners can do if they want to help try to prevent AI enabled coups. And specifically where to position themselves. Should, should they be in AI companies? Should they be in governments? Should they be in perhaps eval organizations? Where's, where, where is the position of most leverage?

01:55:30

Great question. I think being at lab is a great place to be. I talked about system integrity, kind of robustly ensuring that AI's don't have secret loyalties and behaviors intended. That's something that companies need to implement.

01:55:44

So if you have interest or expertise in sleeper agents or backdoors to AI models or cybersecurity, then I think being part of lab and helping them achieve system integrity is an amazing way to reduce this risk. Another thing you can do at labs.

01:56:09

If you're, you know, if you're interested in, if you're, if you're worried about kind of the risk of, you know, government, you know, heads of state deploying loyal AI's and seizing power is you can help labs develop. And then you can help. You can help them with terms of service where when they sell their AI systems to governments, they have certain mitigations against misuse.

01:56:28

Um, that maybe, you know, one way to frame this is look, we're, we're, we're using really powerful AI's and we can't guarantee the safety of those AI systems unless we have some degree of monitoring.

01:56:40

To ensure that the AI systems aren't doing anything unintended, that monitoring could then be sufficient to allow for the prevention of, of, of, of, of coups because you're being monitoring not only for kind of accidental misaligned AI behavior, but, you know, that will also thereby mean you're monitoring for, you know, a bad human actor giving them illegal instructions.

01:57:05

So that, that, you know, labs will be drawing up contracts with governments, terms of service. They will be thinking about, you know, the, the guardrails, if any, that go on a place on the systems that they, that they sell to governments. But I think, you know, there's very careful work to be done thinking through, okay, how can we structure those guardrails? How can we explain them in a way which is, you know, very unarguable and that doesn't seem like we're kind of trying to like, you know, constrain the government, you know, private companies don't, you know,

01:57:34

it's not really legitimate for them to kind of constrain the government, but, you know, I do think there's an important, an important thing to be done here in preventing air-enabled coups. So kind of threading that needle. There's another thing you could do in a government, but you could, in a lab, but you could also do that kind of work for a think tank or for a research organization that's kind of interlinked with government like, like RAND, I think could potentially do some of this kind of work thinking about, you know, what should, you know, be, be in the terms of service.

01:58:04

I think there's a lot of services between labs and governments. Let me think. I think another big thing is that for this system integrity, yes, we want labs to implement it, but we also want there to be some external organization that can certify that. Currently, no external organization is working on this. You know, META is not working on it. Apollo is not working on it. I don't think any evaluations organization is.

01:58:33

But there will be at one point in time, a very strong demand for, for, for guarantees that the systems that these companies have developed don't, don't have SEPA agents. Because if you're deploying a system in, you know, critical infrastructure and, you know, important government roles in the military, you want to know that it does what it says on the tin.

01:58:55

And so I think that, you know, a company can make a wise investment by developing that capability, the ability to understand how to insert secret loyalties, how to detect them, being able to assess a company's process to see whether it's sufficient. How, how, how, how close would that work be to kind of interpretability work? Isn't, isn't there overlap there kind of like looking into what is the system actually doing?

01:59:23

You know, fine, kind of reverse engineering some, some process to see how things, how things work. Uh, is detecting sleeper agent like a specific constraint problem? Yeah, it's a great question. It's a specific type of problem. So you could use internals work like interpretability to detect steeper agents. That's not the only method you could use. You could use behavioral tests. You could look at the training data.

01:59:53

I think a mixture of those is useful. It's also, it's a specific adversarial setup. So interpretability, we're just often interested in all kinds of kind of facts about the model's behavior. And its cognition. With, with secret loyalties, we're, we're specifically concerned with one red team trying to produce a secret loyalty and then a blue team trying to detect it.

02:00:18

And so it's, it's adversarial in that, you know, the red, the blue team has a tough task. The red team will be purposely trying to hide the sleeper agent. And so that, that, that, you know, that will raise a whole host of new considerations and important things to dig into. So you can imagine being like allowing the red team, you know, full control over all aspects of the development process. Can they create a secret loyalty? Maybe that maybe they can, maybe it's pretty hard to test. Okay.

02:00:45

But now you only allow them to intervene at one part of the development process, or you allow the blue team to look at all the data that they use for training, or you allow the blue team to look at all of the code that they wrote. And you could get potentially quite a sophisticated understanding of the conditions under which the blue team win conditions under which the red team win. This research doesn't need to be done in a lab. It could be done by an external organization. I think it's, you know, big, big missing, missing, missing focus of, of, of today's technical work.

02:01:15

Um, and ultimately that, that would inform the assessments of, of, of the labs attempts to do system integrity. So for technical researchers out there, I'd, I'd, I'd, I'd really highlight that possibility.

02:01:30

Another, another, another kind of piece of work for, for the right person would be beginning to understand the kind of existing military thinking around autonomous, autonomous systems that this is already obviously a live issue for militaries. They are increasingly deploying AI.

02:01:49

It would be nice to marry up that existing expertise with these kind of risks about more powerful systems enabling coups and kind of get to a consensus within that military community of basic principles like law following, like distributed control over military systems.

02:02:11

And, you know, you know, you know, you know, you know, you know, you know, you know, you know, you know, you know, if there's anyone listening that, that hasn't, that has a way in, I think, I think that's, that's potentially pretty, pretty valuable. Although there's also a risk of poisoning the well if it's done badly. So, you know, see with some, with some care. Yeah. Yeah. Perfect. Thanks for chatting with me, Tom. It's, it's been great.

02:02:41

Yeah. Real pleasure. Thanks so much, Gus. If you're finding value in the show, we'd appreciate it. If you'd take a moment to share it with friends, post online, write a review on Apple podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions and sponsorship inquiries, either via our website, cognitive revolution.ai or by DMing me on your favorite social network.

02:03:05

The cognitive revolution is part of the Turpentine network, a network of podcasts where experts talk technology, business, economics, geopolitics, culture, and more, which is now a part of A16Z. We're produced by AI podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at AI podcast.ing.

02:03:30

And finally, I encourage you to take a moment to check out our new and improved show notes, which were created automatically by Notions AI meeting notes. AI meeting notes captures every detail and breaks down complex concepts. So no idea gets lost. And because AI meeting notes lives right in Notion, everything you capture, whether that's meetings, podcasts, interviews, or conversations lives exactly where you plan, build, and get things done. No switching, no slowdown.

02:03:58

Check out Notions AI meeting notes if you want perfect notes that write themselves. And head to the link in our show notes to try Notions AI meeting notes free for 30 days.

allenday/What-if-Humans-Weaponize-Superintelligence.md