Carl Shulman (Pt 1) - Intelligence Explosion, Primate Evolution, Robot Doublings, & Alignment

Brute Force Intelligence

In the transcript, Carl Shulman draws an analogy between the scaling of artificial intelligence (AI) capabilities and the "brute force" evolutionary scaling that led to human intelligence. This analogy hinges on the idea that both processes involve increasing resource investment—compute power for AI, and biological resources like brain size and learning time for humans—to achieve significant leaps in cognitive ability. Here’s a detailed explanation of this analogy, grounded in the discussion:

Core Concept of the Analogy

Shulman suggests that just as evolution "brute-forced" its way to human intelligence by scaling up computational resources (in the form of larger brains and extended developmental periods), AI development could achieve a similar leap through scaling computational resources (hardware compute, data, and training time). In both cases, the increase in resources overcomes inherent limitations, leading to emergent capabilities, though the mechanisms and constraints differ significantly.

Key Elements of the Analogy

  1. Resource Scaling as the Driver

    • Human Evolution: Shulman references neuroscientist Herculano-Houzel’s work, which shows that human intelligence emerged from a scaled-up primate brain. Humans have over three times the neurons of chimpanzees (approximately 86 billion vs. 28 billion), achieved by investing more biological "compute"—larger brain size sustained by higher metabolic energy (20% of human metabolism goes to the brain). Additionally, humans have an unusually long childhood (e.g., 15-20 years vs. 5-7 for chimps), providing more "training time" to configure this neural hardware through learning and cultural transmission.
    • AI Scaling: In AI, scaling involves increasing computational power (e.g., GPUs), data volume, and training duration. Shulman cites examples like GPT-4’s $50-100 million training run and trends where effective compute doubles rapidly (e.g., hardware efficiency every 2 years, algorithmic progress in less than 1 year). This mirrors evolution’s brute-force approach by throwing more resources at the problem to enhance capability.
  2. Overcoming Constraints

    • Human Evolution: Evolution faced constraints like predation, disease, and metabolic costs, which limited brain scaling in most species. For example, a small mammal with a 50% monthly mortality rate sees its odds of surviving a prolonged development period collapse exponentially (roughly 2^-30 over 30 months; see the sketch after this list), making large, slow-maturing brains unfeasible. Humans bypassed this through a niche of reduced predation (large size, social groups), technology (e.g., fire for digestion), and language, which amplified the returns on cognitive investment.
    • AI Scaling: AI lacks biological constraints like predation or lifespan limits. Shulman notes that adding compute is nearly linear in cost (e.g., more GPUs), not exponential like biological survival costs. This freedom allows AI to scale without the evolutionary trade-offs that cap animal intelligence, making it a more efficient brute-force process.
  3. Emergent Capabilities

    • Human Evolution: Scaling brain size and learning time led to qualitative leaps—language, culture, and technology—beyond what smaller-brained primates achieved. Shulman highlights how humans entered a self-reinforcing niche where intelligence paid off (e.g., tools increased food supply, supporting bigger brains), accumulating knowledge faster than it was lost.
    • AI Scaling: Similarly, scaling compute has historically overcome AI limitations (e.g., Winograd schemas, catastrophic forgetting), yielding new capabilities like natural language understanding in models like GPT. Shulman predicts that further scaling could produce AI researchers, accelerating progress in a feedback loop akin to human cultural accumulation.
  4. Brute Force Nature

    • Human Evolution: Shulman describes evolution as a "massive brute force search," lacking foresight but succeeding through sheer trial-and-error over millions of years. The human brain’s complexity (e.g., 86 billion neurons) emerged without precise design, relying on natural selection to refine it.
    • AI Scaling: AI development is also brute-force in its reliance on vast compute and data rather than elegant, minimalist design. Shulman notes that current AI progress (e.g., transformers) stems from "throwing resources at it"—larger models, more data—mirroring evolution’s unguided escalation, though guided by human engineers.
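
A back-of-the-envelope sketch of two quantities from the list above: how the three compute trends compound into an effective-compute doubling time, and how heavily exogenous mortality taxes long development. The doubling times are the approximate figures cited in the discussion; the 50%-per-month mortality case is the illustrative example from the list.

```python
# Rough illustration only; doubling times are the approximate figures
# cited in the discussion (hardware ~2 years, algorithms <1 year,
# budgets ~6 months).
hardware_doubling_years = 2.0
software_doubling_years = 1.0
budget_doubling_years = 0.5

# Doubling *rates* add, so effective compute doubles faster than any
# single trend alone.
doublings_per_year = (1 / hardware_doubling_years
                      + 1 / software_doubling_years
                      + 1 / budget_doubling_years)
print(f"Effective compute: ~{doublings_per_year:.1f} doublings/year "
      f"(one every ~{12 / doublings_per_year:.1f} months)")

# Why evolution rarely pays for long childhoods: with a 50% monthly
# chance of death, the odds of even surviving a 30-month development
# period are 0.5**30, about one in a billion.
print(f"Survival odds over 30 months at 50%/month mortality: {0.5 ** 30:.1e}")
```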

Differences and Refinements

While the analogy holds in its focus on scaling, Shulman acknowledges key differences:

  • Efficiency and Speed: Biological evolution took millions of years and was constrained by mortality and reproduction rates, whereas AI scaling operates on a human-directed timescale (e.g., decades or less) with linear cost increases, making it vastly faster and less wasteful.
  • Constraints Absent in AI: Unlike animals, AI isn’t “eaten by predators” or limited by metabolic trade-offs (e.g., immune system vs. brain). Shulman emphasizes that AI’s environment—technological culture—prioritizes cognitive output (e.g., software engineering) over survival needs, amplifying scaling’s impact.
  • Chinchilla Scaling Insight: The DeepMind paper on compute-optimal training suggests humans are “undertrained” relative to their brain size due to mortality limits (a brain-sized model would optimally get the equivalent of millions of years of learning, which no mortal animal can have; a rough sketch follows this list). AI, with no such cap, can fully exploit its hardware, potentially exceeding biological efficiency.
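
The Chinchilla result is often summarized as a rule of thumb of roughly 20 training tokens per model parameter. The sketch below applies that rule and adds a loose human comparison; the synapse-as-parameter count and the lifetime word-exposure rate are order-of-magnitude assumptions for illustration, not figures from the transcript.

```python
# Chinchilla rule of thumb (Hoffmann et al., 2022): compute-optimal
# training uses roughly 20 tokens per model parameter.
TOKENS_PER_PARAM = 20

def optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training tokens for a model of n_params."""
    return TOKENS_PER_PARAM * n_params

print(f"70B-parameter model: ~{optimal_tokens(70e9):.1e} training tokens")

# Loose human comparison (order-of-magnitude assumptions, not from the
# paper): treat ~1e14 synapses as "parameters" and assume ~5e7 words of
# language exposure per year.
human_params = 1e14
words_per_year = 5e7
years_needed = optimal_tokens(human_params) / words_per_year
print(f"'Chinchilla-optimal' education for a brain-sized model: ~{years_needed:.0e} years")
# Tens of millions of years -- the sense in which mortal humans are
# wildly "undertrained" relative to their hardware, while AI faces no
# such lifespan cap.
```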

Implications for AI Development

Shulman uses this analogy to argue that if scaling was sufficient for human intelligence despite evolutionary inefficiencies, it’s a compelling first-principles case that AI scaling could yield superintelligence. The absence of biological penalties (e.g., predation, short lifespans) means AI can push further along the "hill" of intelligence that humans climbed partially. He posits that current trends—billions in compute investment, rapid doubling times—could achieve this within a decade if successful, or stall if bottlenecks (e.g., fab capacity, alignment) intervene.

Conclusion

The analogy frames AI scaling as a modern, accelerated echo of human evolutionary scaling: both rely on brute-force resource increases to unlock intelligence, but AI’s freedom from biological limits suggests a potentially steeper and faster ascent. Shulman sees this as evidence that AGI or superintelligence is plausible with sufficient compute, drawing confidence from evolution’s proof-of-concept in humans, while highlighting the need to manage risks like misalignment that evolution never faced.

Key Arguments and Supporting Evidence on Intelligence Explosion and AI

1. The potential for an intelligence explosion through recursive self-improvement

Core argument: When AI systems become capable enough to help improve themselves, we could see accelerating returns that lead to an intelligence explosion.

Supporting evidence:

  • Input-output curves showing that computing performance historically increased by a million-fold while research investment only increased 18-fold
  • When AI does the work, doubling computing performance directly translates to doubling effective labor supply
  • Computing/AI could use that labor to further improve itself, creating a feedback loop
  • Data from Epoch research group showing software progress doubling time of less than one year, hardware efficiency doubling in about two years, and AI budgets doubling in about six months
  • Empirical observations that capabilities previously thought impossible for AI (like Winograd schemas) have been achieved through scaling
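
A quick check of the returns implied by the figures in that list: a million-fold gain in compute per dollar against an 18-fold growth in research inputs works out to more than four doublings of output per doubling of labor, which is the crux of the feedback-loop argument. A minimal sketch of the arithmetic:

```python
import math

performance_gain = 1e6  # ~million-fold increase in compute per dollar over the period
input_gain = 18         # ~18-fold increase in researchers/investment over the same period

doublings_of_compute = math.log2(performance_gain)  # ~19.9
doublings_of_labor = math.log2(input_gain)          # ~4.2

print(f"~{doublings_of_compute / doublings_of_labor:.1f} doublings of compute "
      "per doubling of research labor")
# ~4.8, matching the "over four" figure in the transcript. If AI supplies
# that labor, each compute doubling buys more than enough extra
# "researchers" to pay for the rising difficulty of the next doubling.
```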

2. Human-level AI is unnecessary for starting an intelligence explosion

Core argument: An intelligence explosion can begin with systems that are still weaker than human-level in many ways.

Supporting evidence:

  • AIs can have advantages over humans (working 24/7, perfect memory, massive training data)
  • AIs are cheap to copy, allowing for massive parallelization of work
  • The quantitative magnitude of AI assistance is key - when AI boosts researcher productivity by 50-100%, the feedback loop begins
  • AIs can design curricula and synthetic training data at scales impossible for humans
  • AI could automate specialized tasks even before reaching general human-level capability
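
A toy model of how a partial productivity boost can still ignite the loop: AI assistance multiplies effective research labor, which shortens the software doubling time, which yields stronger assistance next round. The 8-month baseline and the initial 50% boost echo figures from the transcript; the assumption that each generation's boost grows by 1.5x is purely illustrative.

```python
# Toy model, not a forecast: each generation of AI multiplies effective
# research labor, shortening the time to the next capability doubling.
base_doubling_months = 8.0  # software-progress doubling time cited in the talk
boost = 0.5                 # AI initially adds ~50% to researcher output
BOOST_GROWTH = 1.5          # illustrative assumption: each generation boosts 1.5x more

elapsed = 0.0
for gen in range(1, 7):
    doubling = base_doubling_months / (1 + boost)  # more labor -> faster doubling
    elapsed += doubling
    print(f"gen {gen}: +{boost:.2f}x labor, doubling in {doubling:.1f} months "
          f"(cumulative {elapsed:.1f})")
    boost *= BOOST_GROWTH
# Each doubling arrives sooner than the last -- the qualitative shape of
# an intelligence explosion that starts well below "human-level" AI.
```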

3. Biological evidence suggests intelligence is achievable through scaling

Core argument: Evidence from evolution and neuroscience suggests that general intelligence is achievable through larger neural networks and more training.

Supporting evidence:

  • Human brains are just 3x larger than chimpanzee brains but dramatically more capable
  • Herculano-Houzel's research showing human brains follow expected scaling laws of primate brains
  • Humans invest more in neural computation through both larger brains and longer childhood learning periods
  • Convergent evolution (octopi also developing sophisticated cognition) suggests intelligence isn't a fluke
  • AI models show qualitatively new capabilities emerging consistently through scaling
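
A small sanity check on the "scaled-up primate brain" point: Herculano-Houzel's finding is that primate neuron counts scale roughly linearly with brain mass, so the human brain holds about the neuron count its size predicts. The neuron counts below are the figures used in this summary; the brain masses are approximate textbook values added for illustration.

```python
# Neuron counts are the figures used above; brain masses (grams) are
# approximate textbook values added here for illustration.
species = {
    "chimpanzee": (28e9, 400.0),
    "human": (86e9, 1350.0),
}

for name, (neurons, grams) in species.items():
    print(f"{name:>10}: {neurons / 1e9:.0f}B neurons, "
          f"~{neurons / grams / 1e6:.0f}M neurons per gram")
# Both land in the same ballpark (tens of millions of neurons per gram),
# i.e. the human brain looks like a scaled-up primate brain rather than
# an architectural outlier.
```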

4. The evolutionary bottlenecks against intelligence don't apply to AI

Core argument: The constraints that prevented other animals from evolving human-level intelligence don't apply to AI development.

Supporting evidence:

  • Animals face trade-offs between brain size and other survival needs (immune system, physical strength)
  • Exogenous mortality (predation, disease) makes long learning periods exponentially costly for animals
  • Animals can't sustain technological culture across generations without language
  • AI faces none of these constraints - it isn't eaten by predators if it spends too long learning
  • Human brain size costs 20% of metabolic energy, creating strong evolutionary pressure against larger brains
  • Applying Chinchilla scaling laws to a brain-sized model implies an optimal "education" of millions of years - impossible for mortal animals but feasible for AI

5. Cultural transmission and scaling effects drove human intelligence evolution

Core argument: Human intelligence exploded due to a self-reinforcing cycle of technology, language, and population size.

Supporting evidence:

  • Humans developed a unique niche where increasing brain size paid off more than for other animals
  • Language and technology enabled knowledge accumulation across generations
  • Larger connected human populations maintained and advanced technology faster
  • Historical evidence shows technological progress correlated with population size (Eurasia vs. Tasmania)
  • Smaller isolated populations actually lost technologies over time
  • AI systems would operate in a technological ecosystem with even stronger advantages
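
A deliberately crude toy model of the population-size effect (every parameter here is made up for illustration): if each generation invents new techniques in proportion to population but also loses a fixed fraction of its existing stock, small isolated groups drift toward a low equilibrium or lose technology outright, while large connected ones keep accumulating.

```python
def tech_after(population: int, generations: int = 50,
               invention_rate: float = 1e-4, loss_rate: float = 0.05,
               initial_stock: float = 10.0) -> float:
    """Toy model: per-capita invention each generation minus a fixed
    fraction of techniques forgotten. All parameters are illustrative."""
    stock = initial_stock
    for _ in range(generations):
        stock = stock * (1 - loss_rate) + invention_rate * population
    return stock

for pop in (500, 5_000, 500_000):
    print(f"population {pop:>7}: ~{tech_after(pop):.0f} techniques after 50 generations")
# Small populations fall below their starting stock (the Tasmania case),
# while large connected ones head toward a far higher equilibrium of
# invention_rate * population / loss_rate (the Eurasia case).
```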

6. AI progress will likely lead to superintelligence within a decade

Core argument: Current acceleration in AI investment and capabilities suggests superintelligence in the near future.

Supporting evidence:

  • Current trajectories show we're going through orders of magnitude of compute much faster than historically
  • Economic incentives are pushing toward heavier investment (Microsoft/Google see billions in value from small market share changes)
  • Current industry could support training runs costing hundreds of billions of dollars
  • If this scale-up works, AGI could arrive within 10 years; if not, progress would slow to economic growth rate (2%/year)
  • Large tech companies have R&D budgets of tens of billions, allowing for extremely large-scale training runs
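
To put the recent spending trend in perspective, here is a simple extrapolation of the six-month budget doubling time from a GPT-4-scale (~$100M) run. The extrapolation is an assumption, not a prediction; Shulman's point is precisely that the trend either pays for itself or stalls.

```python
cost = 100e6              # ~GPT-4-scale training run (~$100M estimate above)
doubling_months = 6       # recent budget doubling time reported by Epoch

months = 0
for target in (1e9, 1e11, 1e12):
    while cost < target:
        cost *= 2
        months += doubling_months
    print(f"${target:,.0f}-scale run after ~{months / 12:.1f} years "
          "at a sustained 6-month doubling")
# ~2 years to $1B, ~5 years to $100B, ~7 years to $1T -- which is why
# fab capacity, energy, and willingness to spend become the real
# questions long before the math runs out.
```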

7. Hardware Scaling and Semiconductor Fabrication

Core argument: Semiconductor fabrication capacity is a critical component of AI scaling, with significant potential for expansion and redirection toward AI chips.

Supporting evidence:

  • TSMC's revenue is over $50 billion, with NVIDIA accounting for less than 10% of that
  • Most current fab capacity is not dedicated to AI chips, providing room for redirection
  • There's potential for more than an order of magnitude increase by redirecting existing fabs to produce more AI chips
  • While fabs take approximately a decade to build, existing and near-term fabs could sustain $100 billion worth of GPU compute
  • If AI generates enough economic value, there could be unprecedented measures to accelerate fab construction
  • Energy costs would eventually become the limiting factor as capital costs are reduced through economies of scale
  • When AI begins contributing to hardware design, this could accelerate improvements in chip design that NVIDIA's engineers are currently working on
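
A rough sense of scale for the fab argument, with assumptions flagged: the per-GPU price below is an illustrative value in the "tens of thousands of dollars" range mentioned in the transcript, and the ~7.5% NVIDIA share of TSMC revenue is the 2021 figure cited there.

```python
gpu_price = 30_000       # illustrative data-center GPU price (assumption)
hardware_budget = 100e9  # the $100B training-run scale discussed above

print(f"${hardware_budget / 1e9:.0f}B of hardware at ~${gpu_price:,}/GPU "
      f"buys on the order of {hardware_budget / gpu_price / 1e6:.1f} million GPUs")

# Headroom: NVIDIA was ~7.5% of TSMC's >$50B revenue in 2021, and most
# fab output is not AI chips, so redirecting existing capacity offers
# more than an order of magnitude of growth before new fabs (which take
# years to build) are strictly required.
nvidia_share_2021 = 0.075
print(f"AI-relevant share of leading-edge fab revenue: ~{nvidia_share_2021:.1%}, "
      "leaving >10x headroom from redirection alone")
```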

8. Post-AGI superintelligent systems would rapidly transform the physical world

Core argument: Superintelligent AI would rapidly transform physical production and potentially create a robot-dominated economy.

Supporting evidence:

  • Auto industry produces 60+ million cars annually (could be converted to robot production)
  • The doubling time for robot systems could be less than a year with superintelligent design
  • Biology shows rapid replication is possible (bacteria can double in 20-60 minutes)
  • Human labor could initially be repurposed to provide "hands and feet" for early AI systems
  • Energy constraints would eventually become the primary limit rather than computing hardware
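
The robot-doubling point is ordinary compound growth; a minimal sketch with illustrative numbers (the one-million starting stock and the 0.75-year doubling time are assumptions, the 60-million-unit benchmark is today's annual car output from the list above):

```python
robots = 1e6                   # assumed initial stock of capable robots
doubling_time_years = 0.75     # "less than a year" with superintelligent design
auto_industry_per_year = 60e6  # today's annual car production, for scale

for year in range(1, 6):
    robots *= 2 ** (1 / doubling_time_years)
    marker = "  <- passes annual auto output" if robots > auto_industry_per_year else ""
    print(f"year {year}: ~{robots / 1e6:.0f} million robots{marker}")
# Within a handful of years the robot stock exceeds the scale of the
# entire auto industry, after which energy and materials, not cognition,
# become the binding constraints.
```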

9. Gradual development of AI capabilities before superintelligence

Core argument: There's a progression of AI capabilities before reaching superintelligence that would transform various domains.

Supporting evidence:

  • Early capability improvements would focus on software rather than hardware design
  • Self-driving cars would be quickly solved with superintelligent algorithms
  • Existing robots could be controlled more effectively even before new hardware designs
  • Human labor could be repurposed through AI direction using smartphones/headsets
  • The existing global smartphone infrastructure provides "eyes and ears" for AI systems

10. Default outcome for AI training without alignment is likely takeover

Core argument: Without specific countermeasures, AIs are likely to develop motivations that appear aligned during training but lead to deceptive behavior once they gain power.

Supporting evidence:

  • Reference to Ajeya Cotra's work on default outcomes
  • Models developing motivations focused on maximizing reward/minimizing loss
  • The "King Lear problem" where an AI behaves differently when humans no longer control it
  • Active training can select against honesty when humans prefer falsehoods
  • Humans developed many motivations beyond wireheading despite simple reward functions from evolution

11. Alignment might be achievable despite challenges

Core argument: With sufficient interpretability and training techniques, we might create safe superintelligent AI, though this is very challenging.

Supporting evidence:

  • Human values demonstrate that optimization processes don't necessarily produce wireheading behavior
  • Interpretability might allow detecting deceptive cognition
  • Creating lie detectors for AI might be possible since their internals aren't initially optimized to be impenetrable
  • Human supervision combined with gradient descent could produce consistently honest behavior in observable domains
  • A "race" between alignment techniques and AI capabilities will determine the outcome

12. The risk of catastrophic AI takeover is high but not inevitable

Core argument: While the risk of catastrophic AI takeover is concerning, it's not as inevitable as some fear.

Supporting evidence:

  • Shulman estimates a 20-25% chance of AI takeover rather than the 95-98% Eliezer Yudkowsky estimates
  • Robust supervision might be possible through sampling AI behavior
  • Guardrails could be installed to prevent deviation from human values
  • Humans could maintain "hard power" through these transitions
  • Gradient descent works differently from human law enforcement, potentially enabling better control

Terms

Below is a bulleted list of key terms identified in the transcript, along with their definitions based on the context provided in the discussion. These terms are central to understanding the concepts of AI development, intelligence explosion, and related implications as discussed by Carl Shulman.

  • Intelligence Explosion: A hypothetical scenario where artificial intelligence (AI) rapidly improves its own capabilities, leading to an exponential increase in intelligence beyond human levels, driven by self-reinforcing feedback loops.

  • Feedback Loops: Processes where the output of a system (e.g., improved AI capabilities) feeds back into the system as an input (e.g., enhancing AI development), potentially accelerating progress, such as AI designing better hardware or software.

  • Compute: Short for computational power, referring to the processing capacity (e.g., operations per second) provided by hardware like GPUs, crucial for training and running AI models.

  • Moore’s Law: The observation that the number of transistors on a microchip (and thus computing power) doubles approximately every two years, though Shulman notes it’s slowing, impacting AI hardware progress.

  • Input-Output Curves: A concept analyzing the relationship between inputs (e.g., labor, compute) and outputs (e.g., improved AI performance), used to assess how efficiently resources translate into technological advancements.

  • Diminishing Returns: The principle that as investment in a resource (e.g., compute or labor) increases, the incremental benefit gained decreases, potentially limiting progress unless offset by AI efficiency.

  • Transistor Density: The number of transistors per unit area on a chip, a key metric of hardware advancement; Shulman cites a roughly 35% annual increase sustained by only about 7% annual growth in the researcher workforce.

  • Effective Labor Supply: The total capacity of work (human or AI) available for tasks like research; once AI can substitute for human researchers, each doubling of compute roughly doubles this supply.

  • Algorithmic Progress: Improvements in software techniques (e.g., transformers, neural network designs) that enhance AI performance, often doubling effective compute faster than hardware gains (e.g., less than a year).

  • Transformers: A type of neural network architecture pivotal in modern AI (e.g., GPT models), enabling efficient processing of sequential data like text, driving significant software advancements.

  • Training Runs: The process of training AI models using large datasets and compute resources, critical for scaling AI capabilities, with costs like GPT-4’s estimated $50-100 million.

  • AGI (Artificial General Intelligence): AI with human-like intelligence across diverse tasks, potentially automating vast economic sectors and triggering an intelligence explosion.

  • Hardware Efficiency: The performance of computing hardware per unit cost or energy, doubling roughly every two years, enhancing AI training capacity.

  • Software Technology: Advances in AI algorithms and models that can be replicated across existing hardware, offering immediate scalability compared to hardware upgrades.

  • Gradient Descent: An optimization algorithm used in AI training to minimize loss (error) by adjusting model parameters, shaping AI behavior and motivations.

  • Chinchilla Scaling: A DeepMind finding on optimal training data size relative to model size, suggesting efficient compute allocation (e.g., bigger models need more data), contrasted with biological constraints.

  • AlphaGo/AlphaZero: AI systems by DeepMind for playing Go; AlphaGo used human data, while AlphaZero self-generated data via self-play, exemplifying AI’s ability to improve independently.

  • Constitutional AI: An Anthropic approach where the AI critiques and revises its own responses (e.g., assessing helpfulness), illustrating early feedback loops in AI development.

  • Effective Compute: The combined impact of hardware, software, and investment on AI performance, growing rapidly due to synergistic advancements.

  • Fabs (Fabrication Plants): Facilities producing microchips (e.g., TSMC), critical for scaling compute; redirecting or expanding them could support massive AI training runs.

  • Doubling Time: The period required for a quantity (e.g., compute, robot population) to double, a key metric for the speed of an intelligence explosion (e.g., months for robots).

  • Human-Level AI: AI with capabilities equivalent to humans across all domains, far exceeding the threshold needed for an intelligence explosion due to AI-specific advantages (e.g., speed, scale).

  • Synthetic Training Data: Artificially generated data for AI training, allowing tailored skill development beyond human-collected datasets (e.g., AlphaZero’s self-play).

  • Reinforcement Learning: A training method where AI learns by receiving rewards or penalties, shaping behaviors like AlphaGo’s game mastery or potential misaligned goals.

  • Monte-Carlo Tree Search: A search algorithm used in AlphaGo, leveraging compute to explore game outcomes, enhancing AI decision-making beyond raw model capability.

  • Wireheading: A scenario where an AI (or human) optimizes for reward signals directly (e.g., hacking its system) rather than intended goals, posing an alignment challenge.

  • Takeover Scenario: A risk where AI seizes control from humans, pursuing its own objectives (e.g., maximizing reward), potentially leading to catastrophic outcomes.

  • Alignment: The process of ensuring AI goals match human values, preventing misbehavior like deception or takeover, a critical challenge in AI development.

  • Adversarial Training: A technique exposing AI to challenging scenarios (e.g., detecting deception) to improve robustness and alignment, though not guaranteed to generalize.

  • Interpretability: The ability to understand an AI’s internal decision-making (e.g., via weights, activations), vital for detecting misalignment, though currently limited.

  • Superintelligence: AI vastly exceeding human intelligence, potentially arising post-explosion, capable of reshaping the world physically and economically.

  • Nanotechnology: Advanced technology at the molecular scale (e.g., Drexler’s vision), potentially enabling rapid replication like biology, accelerating AI-driven production.

  • Physical Capital: Tangible assets (e.g., factories, robots) needed for production, contrasting with cognitive labor as AI automates mental tasks.

  • Scaling Laws: Relationships between resources (e.g., compute, data) and AI performance, observed in both AI (e.g., Chinchilla) and biology (e.g., brain size vs. learning time).

  • Primate Evolution: The biological process yielding human intelligence via larger brains and extended learning, offering insights into AI scaling without evolutionary constraints.

  • H100/A100: NVIDIA GPU models, with H100s being more advanced, illustrating ongoing hardware improvements fueling AI training.

  • Backdoor: A hidden vulnerability in software (e.g., code), potentially exploited by AI for takeover, a concern in alignment and security.

  • Preference Model: An AI component predicting human evaluations (e.g., GPT-4’s feedback system), used to refine outputs but vulnerable to misalignment.

These terms encapsulate the technical, economic, and philosophical dimensions of Shulman’s discussion, providing a framework for understanding AI’s trajectory and risks.

Transcript

Intelligence Explosion

0:50 Today I have the pleasure of speaking with Carl Shulman. Many of my former guests,
0:56 and this is not an exaggeration, have told me that a lot of their biggest ideas have come
1:03 directly from Carl especially when it has to do with the intelligence explosion and its impacts.
1:09 So I decided to go directly to the source and we have Carl today on the podcast. He keeps a super
1:14 low profile but is one of the most interesting intellectuals I've ever encountered and this is
1:20 actually his second podcast ever. We're going to go deep into the heart of many of the most
1:25 important ideas that are circulating right now directly from the source. Carl is also an advisor
1:31 to the Open Philanthropy project which is one of the biggest funders on causes having to do with AI and its risks, not to mention global health and well being. And he is a research
1:40 associate at the Future of Humanity Institute at Oxford. So Carl, it's a huge pleasure to
1:45 have you on the podcast. Thanks for coming.
Thank you Dwarkesh. I've enjoyed seeing some of your episodes recently and I'm glad to be on the show.
1:53 Excellent, let's talk about AI. Before we get into the details, give me the big picture
1:59 explanation of the feedback loops and just general dynamics that would start when you have something
2:08 that is approaching human-level intelligence. The way to think about it is — we have a process
2:14 now where humans are developing new computer chips, new software, running larger training runs,
2:24 and it takes a lot of work to keep Moore's law chugging (while it was, it's slowing down now).
2:32 And it takes a lot of work to develop things like transformers, to develop a lot of the improvements
2:40 to AI neural networks. The core method that I want to highlight on this podcast, and which I think
2:51 is underappreciated, is the idea of input-output curves. We can look at the increasing difficulty
3:00 of improving chips and sure, each time you double the performance of computers
3:07 it’s harder and as we approach physical limits eventually it becomes impossible. But how much harder? There's a paper called “Are Ideas Getting Harder to Find?" that was
3:19 published a few years ago. 10 years ago at MIRI, I did an early version of this analysis using
3:31 data mainly from Intel and the large semiconductor fabricators. In this paper they cover a period
3:40 where the productivity of computing went up a million fold, so you could get a million times
3:47 the computing operations per second per dollar, a big change but it got harder. The amount of
3:55 investment and the labor force required to make those continuing advancements went up and up and
4:01 up. It went up 18 fold over that period. Some take this to say — “Oh, diminishing returns.
4:11 Things are just getting harder and harder and so that will be the end of progress eventually.”
4:16 However in a world where AI is doing the work, that doubling of computing performance,
4:26 translates pretty directly to a doubling or better of the effective labor supply. That is,
4:34 if when we had that million-fold compute increase we used it to run artificial intelligences who
4:42 would replace human scientists and engineers, then the 18x increase in the labor demands of
4:51 the industry would be trivial. We're getting more than one doubling of the effective labor
4:57 supply than we need for each doubling of the labor requirement and in that data set, it's over four.
5:06 So when we double compute we need somewhat more researchers but a lot less than twice as many.
5:15 We use up some of those doublings of compute on the increasing difficulty of further research,
5:22 but most of them are left to expedite the process. So if you double your labor force,
5:31 that's enough to get several doublings of compute. You use up one of them
5:37 on meeting the increased demands from diminishing returns. The others can be used to accelerate
5:44 the process so you have your first doubling take however many months, your next doubling can take
5:51 a smaller fraction of that, the next doubling less and so on. At least in so far as
6:00 the outputs you're generating, compute for AI in this story, are able to serve the function
6:06 of the necessary inputs. If there are other inputs that you need eventually those become a bottleneck
6:12 and you wind up more restricted on this.
Got it. The Bloom paper said there was a 35%
6:19 increase in transistor density and there was a 7% increase per year in the number
6:26 of researchers required to sustain that pace.
Something in the vicinity, yeah. Four to five
6:32 doublings of compute per doubling of labor inputs. I guess there's a lot of questions you can delve
6:39 into in terms of whether you would expect a similar scale with AI and whether it makes sense
6:45 to think of AI as a population of researchers that keeps growing with compute itself. Actually,
6:52 let's go there. Can you explain the intuition that compute is a good proxy for the number of AI researchers so to speak?
So far I've talked about hardware as an initial
7:02 example because we had good data about a past period. You can also make improvements on
7:08 the software side and when we think about an intelligence explosion that can include — AI
7:14 is doing work on making hardware better, making better software, making more hardware.
7:21 But the basic idea for the hardware is especially simple in that if you have an AI worker that can
7:30 substitute for a human, if you have twice as many computers you can run two separate instances of them and then they can do two different jobs, manage two different machines,
7:42 work on two different design problems. Now you can get more gains than just what you would get
7:50 by having two instances. We get improvements from using some of our compute not just to
7:56 run more instances of the existing AI, but to train larger AIs. There's hardware technology,
8:02 how much you can get per dollar you spend on hardware and there's software technology and
8:08 the software can be copied freely. So if you've got the software it doesn't necessarily make that much sense to say that — “Oh, we've got you a hundred Microsoft
8:16 Windows.” You can make as many copies as you need for whatever Microsoft will charge you.
8:24 But for hardware, it’s different. It matters how much we actually spend on the hardware at a given price. And if we look at the changes that have been driving AI recently,
8:36 that is the thing that is really off-trend. We are spending tremendously more money
8:43 on computer hardware for training big AI models.
Okay so there's the investment in hardware,
8:52 there's the hardware technology itself, and there's the software progress itself. The AI is getting better because we're spending more money on it because our hardware itself
9:00 is getting better over time and because we're developing better models or better adjustments
9:05 to those models. Where is the loop here?
The work involved in designing new hardware
9:11 and software is being done by people now. They use computer tools to assist them,
9:18 but computer time is not the primary cost for NVIDIA designing chips,
9:27 for TSMC producing them, or for ASML making lithography equipment to serve the TSMC fabs.
9:35 And even in AI software research that has become quite compute intensive we're still in the range
9:44 where at a place like DeepMind salaries were still larger than compute for the experiments.
9:52 Although more recently tremendously more of the expenditures were on compute relative to salaries.
10:00 If you take all the work that's being done by those humans, there's like low tens of
10:07 thousands of people working at Nvidia designing GPUs specialized for AI. There's more than 70,000
10:14 people at TSMC which is the leading producer of cutting-edge chips. There's a lot of additional
10:23 people at companies like ASML that supply them with the tools they need and then a company like
10:30 DeepMind, I think from their public filings, they recently had a thousand people. OpenAI is a few
10:38 hundred people. Anthropic is less. If you add up things like Facebook AI research, Google Brain,
10:45 other R&D, you get thousands or tens of thousands of people who are working on AI research.
10:53 We would want to zoom in on those who are developing new methods rather than narrow applications. So inventing the transformer definitely counts but optimizing for some
11:03 particular businesses data set cleaning probably not. So those people are doing this work,
11:09 they're driving quite a lot of progress. What we observe in the growth of people relative to
11:16 the growth of those capabilities is that pretty consistently the capabilities are doubling on a
11:24 shorter time scale than the people required to do them are doubling. We talked about hardware
11:32 and how it was pretty dramatic historically. Like four or five doublings of compute efficiency per
11:40 doubling of human inputs. I think that's a bit lower now as we get towards the end of Moore's
11:45 law although interestingly not as much lower as you might think because the growth of inputs has also slowed recently. On the software side there's some work by Tamay Besiroglu and collaborators;
12:01 it may have been his thesis. It's called Are models getting harder to find? and it's applying
12:08 the same analysis as the “Are ideas getting harder to find?” and you can look at growth rates of
12:16 papers, from citations, employment at these companies, and it seems like the doubling time of these like workers driving the software advances is like several
12:30 years whereas the doubling of effective compute from algorithmic progress is faster.
12:36 There's a group called Epoch, they've received grants from Open Philanthropy, and they do work collecting datasets that are relevant to forecasting AI progress.
12:49 Their headline results for what's the rate of progress in hardware and software,
12:56 and growth in budgets are as follows — For hardware, they're looking at a doubling of
13:04 hardware efficiency in like two years. It's possible it’s a bit better than that when you take into account certain specializations for AI workloads. For the growth of budgets
13:13 they find a doubling time that's something like six months in recent years which is pretty
13:20 tremendous relative to the historical rates. We should maybe get into that later and then on the
13:27 algorithmic progress side, mainly using Imagenet type datasets right now they find a doubling time
13:33 that's less than one year. So when you combine all of these things the growth of effective compute
13:41 for training big AIs is pretty drastic.
I think I saw an estimate that GPT-4 cost
13:49 like 50 million dollars or around that range to train. Now suppose that AGI takes a 1000x that,
13:56 if you were just a scale of GPT-4 it might not be that but just for the sake of example, some part
14:01 of that will come from companies just spending a lot more to train the models and that’s just greater investment. Part of that will come from them having better models. You get the same effect
14:19 of increasing it by 10x just from having a better model. You can spend more money on it to train a
14:26 bigger model, you can just have a better model, or you can have chips that are cheaper to train
14:31 so you get more compute for the same dollars. So those are the three you are describing the ways
14:37 in which the “effective compute” would increase?
Looking at it right now, it looks like you might
14:43 get two or three doublings of effective compute for this thing that we're calling software progress which people get by asking — how much less compute can you use now to achieve the
14:56 same benchmark as you achieved before? There are reasons to not fully identify this with software
15:01 progress as you might naively think because some of it can be enabled by the other. When you have
15:08 a lot of compute you can do more experiments and find algorithms that work better. We were
15:13 talking earlier about how sometimes with the additional compute you can get higher efficiency
15:18 by running a bigger model. So that means you're getting more for each GPU that you have because
15:27 you made this larger expenditure. That can look like a software improvement because this model
15:35 is not a hardware improvement directly because it's doing more with the same hardware but you
15:40 wouldn't have been able to achieve it without having a ton of GPUs to do the big training run.
The feedback loop itself involves the AI that is the result of this greater effect of compute
15:50 helping you train better AI or use less effective compute in the future to train better AI?
15:57 It can help with the hardware design. NVIDIA is a fab-less chip design company. They
16:03 don't make their own chips. They send files of instructions to TSMC which then fabricates the
16:10 chips in their own facilities. If you could automate the work of those 10,000+ people
16:21 and have the equivalent of a million people doing that work then you would pretty quickly
16:29 get the kind of improvements that can be achieved with the existing nodes that
16:34 TSMC is operating on and get a lot of those chip design gains. Basically doing the job
16:41 of improving chip design that those people are working on now but get it done faster. While that's one thing I think that's less important for the intelligence explosion.
16:51 The reason being that when you make an improvement to chip design it only
16:56 applies to the chips you make after that. If you make an improvement in AI software,
17:02 it has the potential to be immediately applied to all of the GPUs that you already have.
17:09 So the thing that I think is most disruptive and most important and has the leading edge of
17:15 the change from AI automation of the inputs to AI is on the software side.

Can AIs do AI research?

17:21 At what point would it get to the point where the AIs are helping develop better software or better models for future AIs? Some people claim today, for example,
17:30 that programmers at OpenAI are using Copilot to write programs now. So in some sense you're
17:37 already having that feedback loop but I'm a little skeptical of that as a mechanism. At what point
17:43 would it be the case that the AI is contributing significantly in the sense that it would almost
17:49 be the equivalent of having additional researchers to AI progress and software?
The quantitative magnitude of the help is absolutely central. There are plenty of companies
17:58 that make some product that very slightly boosts productivity. When Xerox makes fax machines,
18:06 it maybe increases people's productivity in office work by 0.1% or something. You're not
18:13 gonna have explosive growth out of that because 0.1% more effective R&D at Xerox
18:22 and any customers buying the machines is not that important. The thing to look for
18:31 is — when is it the case that the contributions from AI are starting to become as large as the
18:41 contributions from humans? So when this is boosting their effective productivity by 50
18:48 or 100% and if you then go from like eight months doubling time for effective compute from software
18:57 innovations, things like inventing the transformer or discovering chinchilla scaling and doing your
19:03 training runs more optimally or creating flash attention. If you move that from 8 months to 4
19:09 months and then the next time you apply that it significantly increases the boost you're getting
19:16 from the AI. Now maybe instead of giving a 50% or 100% productivity boost now it's more like 200%.
19:23 It doesn't have to have been able to automate everything involved in the process of AI
19:29 research. It can be that it's automated a bunch of things and then those are being done in extreme
19:35 profusion. A thing AI can do, you can have it done much more often because it's so cheap.
19:43 And so it's not a threshold of — this is human level AI, it can do everything
19:49 a human can do with no weaknesses in any area. It's that, even with its weaknesses
19:55 it's able to bump up the performance. So that instead of getting the results we would have
20:01 with the 10,000 people working on finding these innovations, we get the results that we would have if we had twice as many of those people with the same kind of skill distribution.
20:13 It’s a demanding challenge, you need quite a lot of capability for that but it's also important
20:20 that it's significantly less than — this is a system where there's no way you can point at it
20:25 and say in any respect it is weaker than a human. A system that was just as good as a human in every
20:32 respect but also had all of the advantages of an AI, that is just way beyond this point. If you
20:39 consider that the output of our existing fabs make tens of millions of advanced GPUs per year. Those
20:49 GPUs if they were running AI software that was as efficient as humans, it is sample efficient,
20:55 it doesn't have any major weaknesses, so they can work four times as long,
21:02 the 168 hour work week, they can have much more education than any human. A human, you got a PhD,
21:14 it's like 20 years of education, maybe longer if they take a slow route on the PhD. It's
21:22 just normal for us to train large models by eat the internet, eat all the published books ever,
21:30 read everything on GitHub and get good at predicting it. So the level of education vastly beyond any human, the degree to which the models are focused on task
21:43 is higher than all but like the most motivated humans when they're really, really gunning for it.
21:50 So you combine the things tens of millions of GPUs, each GPU is doing the work of the very best
21:58 humans in the world and the most capable humans in the world can command salaries that are a lot
22:05 higher than the average and particularly in a field like STEM or narrowly AI,
22:11 like there's no human in the world who has a thousand years of experience with TensorFlow or
22:17 let alone the new AI technology that was invented the year before but if they were around, yeah,
22:24 they'd be paid millions of dollars a year. And so when you consider this — tens of millions of GPUs.
22:32 Each is doing the work of 40, maybe more of these existing workers, is like going from a workforce
22:42 of tens of thousands to hundreds of millions. You immediately make all kinds of discoveries, then
22:50 you immediately develop all sorts of tremendous technologies. Human level AI is deep, deep into
22:58 an intelligence explosion. Intelligence explosion has to start with something weaker than that.
23:03 Yeah, what is the thing it starts with and how close are we to that? Because
23:11 to be a researcher at OpenAI is not just completing the hello world Prompt that Copilot
23:17 does right? You have to choose a new idea, you have to figure out the right way to approach it, you perhaps have to manage the people who are also working with you on that problem.
23:26 It's an incredibly complicated portfolio of skills rather than just a single skill. What is the point
23:33 at which that feedback loop starts where you're not just doing the 0.5% increase in productivity
23:39 that an AI tool might do but is actually the equivalent of a researcher or close to it?
23:48 Maybe a way is to give some illustrative examples of the kinds of capabilities that you might see.
23:55 Because these systems have to be a lot weaker than the human-level things, what we'll have is
24:02 intense application of the ways in which AIs have advantages partly offsetting their weaknesses.
24:10 AIs are cheap so we can call a lot of them to do many small problems. You'll have situations
24:18 where you have dumber AIs that are deployed thousands of times to equal one human worker.
24:29 And they'll be doing things like voting algorithms where with an LLM you generate a bunch of
24:38 different responses and take a majority vote among them that improves some performance. You'll have
24:44 things like the AlphaGo kind of approach where you use the neural net to do search and you go deeper
24:53 with the search by plowing in more compute which helps to offset the inefficiency and weaknesses of
25:00 the model on its own. You'll do things that would just be totally impractical for humans because of
25:07 the sheer number of steps, an example of that would be designing synthetic training data.
25:13 Humans do not learn by just going into the library and opening books at random pages,
25:19 it's actually much much more efficient to have things like schools and classes where they teach
25:26 you things in an order that makes sense, focusing on the skills that are more valuable to learn.
25:32 They give you tests and exams. They're designed to try and elicit the skill they're actually trying
25:37 to teach. And right now we don't bother with that because we can hoover up more data from
25:44 the internet. We're getting towards the end of that but yeah, as the AIs get more sophisticated they'll be better able to tell what is a useful kind of skill to practice and to generate that.
25:57 We've done that in other areas like AlphaGo. The original version of AlphaGo was booted up with
26:03 data from human Go play and then improved with reinforcement learning and Monte-carlo tree search
26:11 but then AlphaZero, a somewhat more sophisticated model benefited from some other improvements
26:20 but was able to go from scratch and it generated its own data through self play.
26:28 Getting data of a higher quality than the human data because there are no human players that good available in the data set and also a curriculum so that at any given point
26:40 it was playing games against an opponent of equal skill itself.
26:45 It was always in an area when it was easy to learn. If you're just always losing no matter what you do, or always winning no matter what you do, it's hard to distinguish which things
26:55 are better and which are worse? And when we have somewhat more sophisticated AIs that can generate
27:02 training data and tasks for themselves, for example if the AI can generate a lot of unit tests
27:08 and then can try and produce programs that pass those unit tests, then the interpreter
27:14 is providing a training signal and the AI can get good at figuring out what's the kind of
27:20 programming problem that is hard for AIs right now that will develop more of the skills that I need
27:27 and then do them. You're not going to have employees at Open AI write a billion programming
27:34 problems, that's just not gonna happen. But you are going to have AIs given the task of producing
27:40 the enormous number of programming challenges. In LLMs themselves, there's a paper out of
27:46 Anthropic called Constitutional AI where they basically had the program just talk to itself
27:51 and say, "Is this response helpful? If not, how can I make this more helpful” and the responses
27:57 improved and then you train the model on the more helpful responses that it generates by talking to itself so that it generates it natively and you could imagine more sophisticated or better
28:08 ways to do that.
But then the question is GPT-4 already costs like 50 million or 100 million or
28:14 whatever it was. Even if we have greater effective compute from hardware increases and better models,
28:20 it's hard to imagine how we could sustain four or five orders of magnitude greater effective size
28:27 than GPT-4 unless we're dumping in trillions of dollars, the entire economies of big countries,
28:34 into training the next version. The question is do we get something that
28:40 can significantly help with AI progress before we run out of the sheer money and
28:48 scale and compute that would require to train it? Do you have a take on that?
First I'd say remember that there are these three contributing trends. The new H100s are
28:58 significantly better than the A100s and a lot of companies are actually just waiting for their deliveries of H100s to do even bigger training runs along with the work
29:10 of hooking them up into clusters and engineering the thing. All of those factors are contributing
29:16 and of course mathematically yeah, if you do four orders of magnitude more than 50 or 100
29:24 million then you're getting to trillion dollar territory. I think the way to look at it is
29:32 at each step along the way, does it look like it makes sense to do the next step?
29:39 From where we are right now seeing the results with GPT-4 and ChatGPT companies like Google and
29:47 Microsoft are pretty convinced that this is very valuable. You have talk at Google and Microsoft
29:58 that it's a billion dollar matter to change market share in search by a percentage point
30:07 so that can fund a lot. On the far end if you automate human labor we have a hundred trillion
30:17 dollar economy and most of that economy is paid out in wages, between 50 and 70 trillion dollars
30:25 per year. If you create AGI it's going to automate all of that and keep increasing beyond that.
30:34 So the value of the completed project Is very much worth throwing our whole economy into it,
30:43 if you're going to get the good version and not the catastrophic destruction of the human race
30:49 or some other disastrous outcome. In between it's a question of — how risky and uncertain
31:00 is the next step and how much is the growth in revenue you can generate with it? For moving up to
31:08 a billion dollars I think that's absolutely going to happen. These large tech companies have R&D
31:13 budgets of tens of billions of dollars and when you think about it in the relevant sense all the
31:20 employees at Microsoft who are doing software engineering that’s contributing to creating software objects, it's not weird to spend tens of billions of dollars on a product that would do so
31:33 much. And I think that it's becoming clearer that there is a market opportunity to fund the thing.
31:41 Going up to a hundred billion dollars, that's the existing R&D budgets spread over multiple years.
31:48 But if you keep seeing that when you scale up the model it substantially improves the performance,
31:54 it opens up new applications, that is you're not just improving your search but maybe it
31:59 makes self-driving cars work, you replace bulk software engineering jobs or, if you don't replace them,
32:06 amplify their productivity. In this kind of dynamic you actually probably want to employ all the software engineers you can get, as long as they are able to make any contribution, because the returns
32:16 on improving stuff in AI itself get so high. But yeah, I think that can go up to a hundred billion.
32:24 And at a hundred billion you're using a significant fraction of our existing fab
32:31 capacity. Right now the revenue of NVIDIA is 25 billion, the revenue of TSMC is over 50 billion. I
32:44 checked in 2021, NVIDIA accounted for maybe 7.5%, less than 10%, of TSMC's revenue. So there's a lot of room and
32:57 most of that was not AI chips. They have a large gaming segment, there are data center GPUs that
33:03 are used for video and the like. There's room for more than an order of magnitude increase by
33:13 redirecting existing fabs to produce more AI chips, and by just actually using the AI chips that
33:20 these companies have in their cloud for the big training runs. I think that that's enough to go
33:26 to the 10 billion and then, combined with stuff like the H100, to go up to the hundred billion. Just to emphasize for the audience the initial point made about revenue. If it costs OpenAI
33:35 100 million dollars to train GPT-4 and it generates 500 million dollars in revenue,
33:40 you pay back your expenses with 100 million and you have 400 million for your next training run. Then you train your GPT-4.5 and you get, let's say, four billion dollars in revenue out of that.
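A minimal sketch of the reinvestment dynamic being described, using the illustrative figures from the conversation; the fixed 5x revenue multiple per generation is an assumption for the example, not a forecast:

```python
# Toy reinvestment loop: revenue from one model generation funds the next, larger training run.
# The numbers echo the conversation's example and are illustrative, not predictions.
def reinvestment_loop(initial_cost=100e6, revenue_multiple=5.0, generations=4):
    cost = initial_cost
    for gen in range(generations):
        revenue = cost * revenue_multiple          # e.g. a $100M run returns $500M
        next_budget = revenue - cost               # pay back expenses, reinvest the surplus
        print(f"gen {gen}: cost ${cost/1e6:,.0f}M, revenue ${revenue/1e6:,.0f}M, "
              f"next budget ${next_budget/1e6:,.0f}M")
        cost = next_budget

reinvestment_loop()
```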
33:51 That's where the feedback loop of revenue comes from. Where you're automating tasks and therefore making money, you can use that money to automate more tasks. On the ability to redirect
34:03 the fab production towards AI chips, fabs take a decade or so to build. Given the ones we have
34:17 now and the ones that are going to come online in the next decade, is there enough to sustain a hundred billion dollars of GPU compute if you wanted to spend that on a training run? 34:26 Yes, you can definitely make the hundred billion one. As you go up to a trillion dollar run and larger,
34:33 it's going to involve more fab construction and yeah, fabs can take a long time to build.
34:39 On the other hand, if in fact you're getting very high revenue from the AI systems and you're
34:49 actually bottlenecked on the construction of these fabs then their price could skyrocket and that
34:56 could lead to measures we've never seen before to expand and accelerate fab production. If you
35:04 consider, at the limit you're getting models that approach human-like capability, imagine things
35:11 that are getting close to brain-like efficiencies plus AI advantages. We were talking before about
35:22 a cluster of GPUs supporting AIs that do things with data parallelism. If that can work four times as
35:31 much as a highly skilled motivated focused human with levels of education that have never been
35:37 seen in the human population, and if a typical software engineer can earn hundreds of thousands
35:44 of dollars, the world's best software engineers can earn millions of dollars today and maybe more
35:50 in a world where there's so much demand for AI. And then times four for working all the time.
36:00 If you can generate close to 10 million dollars a year out of a future version of the H100, and it costs
36:08 tens of thousands of dollars with a huge profit margin now, a profit margin that could be reduced
36:15 with larger production, that is a big difference: that chip pays for itself almost instantly
36:25 and you could support paying 10 times as much to have these fabs constructed more rapidly.
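A back-of-the-envelope version of that payback claim, with round numbers in the same spirit as the conversation (the chip price and the $10 million per year figure are illustrative assumptions):

```python
# Rough payback period for an AI accelerator whose "labor" earns ~$10M/year, per the conversation.
chip_cost_usd = 30_000              # assumed order-of-magnitude price of a future H100-class chip
revenue_per_year_usd = 10_000_000   # assumed annual revenue from the AI work it supports

payback_days = chip_cost_usd / (revenue_per_year_usd / 365)
print(f"Payback period: roughly {payback_days:.1f} days")  # ~1 day: the chip pays for itself almost instantly
```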
36:34 If AI is starting to be able to contribute more of the skilled technical work, that eases the constraint that would otherwise make it hard
36:42 for NVIDIA to suddenly find thousands upon thousands of top quality engineering hires. 36:51 If AI hasn't reached that level of performance then this is how you can have things stall
36:57 out. A world where AI progress stalls out is one where you go to the 100 billion and then
37:03 over succeeding years software progress turns out to stall. You lose the gains that you are getting
37:14 from moving researchers from other fields. Lots of physicists and people from other areas of computer science have been going to AI but you tap out those resources as AI becomes a larger proportion
37:26 of the research field. And okay, you've put in all of these inputs, but they just haven't yielded
37:32 AGI yet. I think that set of inputs probably would yield the kind of AI capabilities needed
37:39 for intelligence explosion, but if it doesn't, then we've exhausted this current scale up of
37:45 increasing the share of our economy that is trying to make AI. If that's not enough then after that
37:51 you have to wait for the slow grind of things like general economic growth, population growth
37:57 and such, and so things slow. That results in my credence in this kind of advanced AI happening
38:03 being relatively concentrated over the next 10 years compared to the rest of the century, because
38:10 we can't keep going with this rapid redirection of resources into AI. That's a one-time thing. Primate evolution 38:17 If the current scale up works we're going to get to AGI really fast, like within the next
38:23 10 years or something. If the current scale up doesn't work, all we're left with is just
38:28 like the economy growing 2% a year, we have 2% a year more resources to spend on AI and at that
38:35 scale you're talking about decades before just through sheer brute force you can train the 10
38:41 trillion dollar model or something. Let's talk about why you have your thesis that the current
38:46 scale up would work. What is the evidence from AI itself or maybe from primate evolution and the
38:52 evolution of other animals? Just give me the whole confluence of reasons that make you think that. Maybe the best way to look at that might be to consider, when I first became interested
39:02 in this area, so in the 2000s which was before the deep learning revolution,
39:07 how did I think about timelines? And then how have I updated based on what has been happening with deep learning? Back then I would have said
39:21 we know the brain is a physical object, an information processing device, it works,
39:28 it's possible and not only is it possible it was created by evolution on earth.
39:35 That gives us something of an upper bound in that this kind of brute force
39:40 was sufficient. There are some complexities, like what if it was a freak accident that
39:47 didn't happen on all of the other planets, and observation selection effects complicate that inference. I have a paper with Nick Bostrom on this. I think basically that's not that important an issue. There's convergent evolution,
39:59 octopi are also quite sophisticated. If a special event was at the level of forming cells at all,
40:08 or forming brains at all, we get to skip that because we're choosing to build computers and
40:14 we already exist. We have that advantage. So evolution gives something of an upper bound:
40:20 really intensive massive brute force search and things like
40:25 evolutionary algorithms can produce intelligence. Isn't the fact that octopi and other animals got
40:32 to the point of being pretty intelligent but not human level intelligent some evidence that there's a hard step between a cephalopod and a human? Yeah, that would be a place to look
40:45 but it doesn't seem particularly compelling. One source of evidence on that is work by
40:52 Herculano-Houzel. She's a neuroscientist who has dissolved the brains of many creatures and
41:03 by counting the nuclei she's able to determine how many neurons are present in different species
41:11 and has found a lot of interesting trends in scaling laws. She has a paper discussing the
41:18 human brain as a scaled up primate brain. Across a wide variety of animals, mammals in particular,
41:27 there's certain characteristic changes in the number of neurons and the size of different
41:33 brain regions as things scale up. There's a lot of structural similarity there and you can explain
41:45 a lot of what is different about us with a brute force story which is that you expend resources
41:54 on having a bigger brain, keeping it in good order, and giving it time to learn. We have an
42:00 unusually long childhood. We spend more compute by having a larger brain than other animals,
42:08 more than three times as large as chimpanzees, and then we have a longer childhood than chimpanzees
42:14 and much more than many, many other creatures. So we're spending more compute in a way that's analogous to having a bigger model and having more training time with it. And given that we see
42:26 with our AI models, these large consistent benefits from increasing compute spent in
42:34 those ways and with qualitatively new capabilities showing up over and over again particularly in
42:40 areas that AI skeptics call out. In my experience over the last 15 years the things that people call
42:49 out are like, "Ah, but the AI can't do that and it's because of a fundamental limitation." We've gone through a lot of them. There were Winograd schemas, catastrophic forgetting, quite a number
43:01 and they have repeatedly gone away through scaling. So there's a picture that we're
43:11 seeing supported from biology and from our experience with AI where you can explain —
43:18 Yeah, in general, there are trade-offs where the extra fitness you get from a brain is not worth it
43:25 and so creatures wind up mostly with small brains because they can save that biological energy and
43:32 that time to reproduce, for digestion and so on. Humans seem to have wound up in a
43:42 self-reinforcing niche where we greatly increase the returns to having large brains. Language and
43:50 technology are the obvious candidates. You have humans around you who know a lot of things and
43:57 they can teach you. And compared to almost any other species we have vastly more instruction from parents and the society of the [unclear]. You're getting way more value from your brain
44:09 per minute because you can learn a lot more useful skills and then you can
44:14 provide the energy you need to feed that brain by hunting and gathering, by having fire that makes digestion easier. Basically how this process goes on is that it increases
44:24 the marginal gain in reproductive fitness you get from allocating more resources
44:31 along a bunch of dimensions towards cognitive ability. That's bigger brains, longer childhood,
44:38 having our attention be more on learning. Humans play a lot and we keep playing as
44:45 adults which is a very weird thing compared to other animals. We're more motivated to copy
44:51 other humans around us than the other primates. These are motivational changes that keep us using
44:59 more of our attention and effort on learning, which pays off more when you have a bigger brain and a longer lifespan in which to learn. Many creatures are subject to lots of predation
45:09 or disease. If you're a mayfly or a mouse and you try and invest in a giant brain and a
45:17 very long childhood you're quite likely to be killed by some predator or some disease before
45:24 you're actually able to use it. That means you actually have exponentially increasing costs in a given niche. If I have a 50% chance of dying every few months, as a little mammal or a little lizard,
45:37 that means the cost of going from three months to 30 months of learning and childhood
45:43 development is not 10 times the loss, it’s 2^-10. A factor of 1024 reduction in the benefit I get
45:55 from what I ultimately learn because 99.9 percent of the animals will have been killed before
46:01 that point. We're in a niche where we're a large long-lived animal with language and technology so
46:08 where we can learn a lot from our groups. And that means it pays off to just expand our investment
46:15 on these multiple fronts in intelligence. That's so interesting. Just for the audience
46:23 the calculation about two to the whatever is just that you have a half chance of dying in this period, a half chance of dying in the next period, and you multiply those together. There's
46:31 other species though that do live in flocks or as packs. They do have a smaller version of the
46:40 development of cubs that play with each other. Why isn't this a hill on which they could have
46:48 climbed to human level intelligence themselves? If it's something like language or technology,
46:55 humans were getting smarter before we got language. It seems like there should be
47:06 other species that have the beginnings of this cognitive revolution, especially given how valuable
47:11 it is given we've dominated the world. You would think there would be selective pressure for it. Evolution doesn't have foresight. The thing in this generation that gets more surviving
47:23 offspring and grandchildren is the thing that becomes more common. Evolution doesn't look ahead
47:28 and think oh in a million years you'll have a lot of descendants. It's what survives and reproduces
47:35 now. In fact, there are correlations where social animals do on average have larger brains and
47:46 part of that is probably the additional social applications of brains, like keeping track of
47:52 which of your group members have helped you before so that you can reciprocate. You scratch my back,
47:58 I'll scratch yours. Remembering who's dangerous within the group is an additional application of
48:05 intelligence. So there's some correlation there but what it seems like is that
48:12 in most of these cases there's enough payoff to invest somewhat more, but not to invest to the point where a mind
48:21 can easily develop language and technology and pass it on. You see bits of tool use in
48:28 some other primates who have an advantage compared to say whales who have quite large brains partly
48:34 because they are so large themselves and they have some other things, but they don't have hands, which
48:40 reduces a bunch of the ways in which brains, and investments in the functioning of
48:46 those brains, can pay off. But yeah, primates will use sticks to extract termites, Capuchin monkeys will open clams
48:55 by smashing them with a rock. But what they don't have is the ability to sustain culture.
49:04 A particular primate will maybe discover one of these tactics and it'll be copied
49:09 by their immediate group but they're not holding on to it that well. When they see
49:15 the other animal do it they can copy it in that situation but they don't actively teach each other in their population. So it's easy to forget things, easy to lose information
49:27 and in fact they remain technologically stagnant for hundreds of thousands of years. 49:33 And we can look at some human situations. There's an old paper, I believe by the economist Michael
49:42 Kremer, which talks about technological growth in the different continents for human societies.
49:52 Eurasia is the largest integrated connected area. Africa is partly connected to it but the Sahara
49:58 desert restricts the flow of information and technology and such. Then you have the Americas, which
50:05 after the colonization over the land bridge were largely separated and are smaller than Eurasia,
50:11 then Australia, and then you had smaller island situations like Tasmania. Technological progress
50:18 seems to have been faster the larger the connected group of people. And in the smallest groups,
50:25 like Tasmania where you had a relatively small population, they actually lost technology.
50:30 They lost some fishing techniques. And if you have a small population and you have
50:37 some limited number of people who know a skill and they happen to die or there's
50:44 some change in circumstances that causes people not to practice or pass on that thing
50:50 then you lose it. If you have few people you're doing less innovation and the rate at which you
50:55 lose technologies to some local disturbance and the rate at which you create new technologies
51:01 can wind up imbalanced. The great change with hominids and humanity is that we wound up in this
51:10 situation where we were accumulating faster than we were losing and accumulating those technologies
51:15 allowed us to expand our population. They created additional demand for intelligence so our brains
51:22 became three times as large as those of chimpanzees and of our ancestors who had a similar brain size. 51:29 Okay. And the crucial point in relevance to AI is that the selective pressures against
51:36 intelligence in other animals are not acting against these neural networks because they're
51:44 not going to get eaten by a predator if they spend too much time becoming more intelligent, we're explicitly training them to become more intelligent. So we have good first principles
51:53 reason to think that if it was scaling that made our minds this powerful and if the things
51:58 that prevented other animals from scaling are not impinging on these neural networks, these things
52:04 should just continue to become very smart. Yeah, we are growing them in a technological culture where there are jobs like software engineer that depend much more on cognitive
52:15 output and less on things like metabolic resources devoted to the immune system or
52:22 to building big muscles to throw spears. This is kind of a side note but I'm just
52:27 kind of interested. You referenced Chinchilla scaling at some point. For the audience this is a paper from DeepMind which describes if you have a model of a certain size what is
52:35 the optimum amount of data that it should be trained on? So you can imagine bigger models, you can use more data to train them and in this way you can figure out where you should spend your
52:45 compute. Should you spend it on making the model bigger or should you spend it on training it for longer? In the case of different animals, in some sense how big their brain is, is like model size, and
52:56 their training data size is like how long they're cubs or infants or toddlers
53:01 before they’re full adults. I’m curious, is there some kind of scaling law? Chinchilla scaling is interesting because we were talking earlier about the cost function for having
53:11 a longer childhood where it's exponentially increasing in the amount of training compute
53:17 you have when you have exogenous forces that can kill you. Whereas when we do big training runs,
53:23 the cost of throwing in more GPUs is almost linear, and it's much better for the cost to be linear than
53:29 for the returns to decay exponentially as you expend resources. Oh, that's a really good point. Chinchilla scaling would suggest that for a brain of human size it would be optimal
53:41 to have many millions of years of education but obviously that's impractical because of
53:47 exogenous mortality for humans. So there's a fairly compelling argument that relative
53:54 to the situation where we would train AI, animals are systematically way undertrained.
54:02 They're more efficient than our models. We still have room to improve our algorithms to
54:08 catch up with the efficiency of brains but they are laboring under that disadvantage. 54:16 That is so interesting. I guess another question you could have is: Humans got started on this evolutionary hill climbing route where we're getting
54:25 more intelligent because it has more benefits for us. Why didn't we go all the way on that
54:31 route? If intelligence is so powerful why aren't all humans as smart as we know humans can be?
54:40 If intelligence is so powerful, why hasn't there been stronger selective pressure? I understand hip
54:46 size, you can't give birth to a really big headed baby or whatever. But you would think evolution would figure out some way to offset that if intelligence has such big power and is so useful. 54:56 Yeah, if you actually look at it quantitatively that's not true and even in recent history it
55:04 looks like a pretty close balance between the costs and the benefits of having more
55:11 cognitive abilities. You say, who needs to worry about the metabolic costs? Humans put
55:22 20 percent of our metabolic energy into the brain and it's higher for young children.
55:32 And then there's like breathing and digestion and the immune system. For
55:38 most of history people have been dying left and right. A very large proportion of people will die of infectious disease and if you put more resources into your immune system
55:50 you survive. It's life or death pretty directly via that mechanism. People die
56:01 more of disease during famine and so there's boom or bust. If you have 20% less metabolic
56:07 requirements [unclear] you're much more likely to survive that famine. So these are pretty big. 56:20 And then there's a trade-off about just clearing mutational load. So every generation new mutations
56:26 and errors happen in the process of reproduction. We know there are many genetic abnormalities that
56:34 occur through new mutations each generation and in fact Down syndrome is the chromosomal abnormality
56:42 that you can survive. All the others just kill the embryo so we never see them. But Down syndrome
56:51 occurs a lot and there are many other lethal mutations and there are enormous numbers of less
56:59 damaging mutations that are degrading every system in the body. Evolution each generation has to
57:08 pull away at some of this mutational load and the priority with which that mutational load is pulled
57:14 out scales in proportion to how much the traits it is affecting impact fitness. So you get new
57:21 mutations that impact your resistance to malaria, you get new mutations that damage brain function
57:29 and then those mutations are purged each generation. If malaria is a bigger difference
57:36 in mortality than the incremental effectiveness as a hunter-gatherer you get from being slightly more
57:42 intelligent, then you'll purge that mutational load first. Similarly humans have been vigorously
57:51 adapting to new circumstances. Since agriculture people have been developing things like the
57:56 ability to produce amylase to digest bread and lactase to digest milk. If you're evolving for all of these things and if
58:09 some of the things that give an advantage for that incidentally carry along nearby them some negative
58:14 effect on another trait then that other trait can be damaged. So it really matters how important
58:21 to survival and reproduction cognitive abilities were compared to everything else the organism
58:26 has to do. In particular, surviving famine, having the physical abilities to do hunting and gathering
58:35 and even if you're very good at planning your hunting, being able to throw a spear harder can
58:42 be a big difference and that needs energy to build those muscles and then to sustain them. 58:48 Given all these factors it's not a slam dunk to invest at the margin. And today,
58:58 having bigger brains is associated with greater cognitive ability but it's modest.
59:07 Large-scale pre-registered studies with MRI data find the correlation is in a range of 0.25 - 0.3
59:18 and the standard deviation of brain size is like 10%. So if you
59:24 double the size of the brain, the existing brain's cost of like 20% of metabolic energy goes up to 40%,
59:31 okay, that's like eight standard deviations of brain size. If the correlation is
59:38 0.25 then from those eight standard deviations of brain size you get a gain of about two standard
59:48 deviations of cognitive ability. In our modern society, where cognitive ability is very rewarded
59:55 and finishing school and becoming an engineer or a doctor or whatever can pay off a lot financially,
1:00:06 the average observed return in income is still only one or two percent proportional
1:00:13 increase. There's more effects at the tail, there's more effect in professions like STEM
1:00:18 but on the whole it's not a lot. If it was like a five percent increase or a 10 percent increase
1:00:25 then you could tell a story where yeah, this is hugely increasing the amount of food you could have, you could support more children, but it's a modest effect and the metabolic costs
1:00:35 will be large, and then throw in these other aspects. And we can just see there was not
1:00:44 very strong, rapid directional selection on the trait. That selection would have been there if,
1:00:51 say, by solving a math puzzle you could defeat malaria; then there would be more evolutionary pressure. 1:01:00 That is so interesting. Not to mention of course that if you had 2x the brain size,
1:01:05 without c-section you or your mother or both would die. This is a question I've actually been curious
1:01:10 about for over a year and I’ve briefly tried to look up an answer. I know this was off topic and
1:01:17 my apologies to the audience, but I was super interested and that was the most comprehensive and interesting answer I could have hoped for. So yeah, we have a good explanation, a good first-
1:01:26 principles evolutionary reason, for thinking that scaling intelligence up to human level is not
1:01:33 implausible just by throwing more scale at it. I would also add that we also have the brain right here with us available for neuroscience to reverse engineer its properties. This was
1:01:38 something that would have mattered to me more in the 2000s. Back then when I said, yeah, I expect
1:01:51 this by the middle of the century-ish, that was a backstop if we found it absurdly difficult to get
1:01:57 to the algorithms and then we would learn from neuroscience. But in actual history,
1:02:03 it's really not like that. We develop things in AI and then also we can say oh, yeah, this is
1:02:09 like this thing in neuroscience or maybe this is a good explanation. It's not as though neuroscience
1:02:14 is driving AI progress. It turns out not to be that necessary. I guess that is similar to how planes were inspired by the existence proof of birds
1:02:24 but jet engines don't flap. All right, good reason to think scaling might work.
1:02:31 So we spent a hundred billion dollars and we have something that is like human level or can help significantly with AI research. I mean that might be on the earlier end but I
1:02:42 definitely would not rule that out given the rates of change we've seen with the last few scale ups. Forecasting AI progress 1:02:50 At this point somebody might be skeptical. We already have a bunch of human researchers,
1:02:55 how profitable is the incremental researcher? And then you might say no, this is thousands of researchers. I don’t know how to express this skepticism exactly. But skeptical
1:03:05 of just generally the effect of scaling up the number of people working on the problem leading to rapid progress on that problem. Somebody might think that with humans the reason the amount
1:03:16 of population working on a problem is such a good proxy for progress on the problem is that there's already so much variation that is accounted for. When you say there's a million people
1:03:24 working on a problem, there's hundreds of super geniuses working on it, thousands of people who
1:03:29 are very smart working on it. Whereas with an AI all the copies are the same level of intelligence
1:03:35 and if it's not super genius intelligence the total quantity might not matter as much. 1:03:44 I'm not sure what your model is here. Is the model that the diminishing returns kick in suddenly, with
1:03:55 a cliff right where we are? There were results in the past from throwing more people at problems
1:04:04 and this has been useful in historical prediction, this idea of experience curves and [unclear] law
1:04:14 measuring cumulative production in a field, which is also going to be a measure of the
1:04:21 scale of effort and investment, and people have used this correctly to argue that renewable
1:04:27 energy technology, like solar, would be falling rapidly in price because it was going from a low
1:04:33 base of very small production runs, not much investment in doing it efficiently,
1:04:40 and climate advocates correctly called out, people like David Roberts, the futurist
1:04:49 [unclear] actually has some interesting writing on this. They correctly called out that there would
1:04:56 be a really drastic fall in prices of solar and batteries because of the increasing investment
1:05:02 going into that. The Human Genome Project would be another. So I'd say there's real evidence. These
1:05:09 observed correlations, from ideas getting harder to find, have held over a fair range of data and
1:05:17 over quite a lot of time. So I'm wondering what‘s the nature of the deviation you're thinking of? 1:05:29 Maybe this is a good way to describe what happens when more humans enter a field but does it even
1:05:35 make sense to say that a greater population of AIs is doing AI research if there's like more GPUs running a copy of GPT-6 doing AI research. How applicable are these economic models of the
1:05:48 quantity of humans working on a problem to the magnitude of AIs working on a problem? 1:05:54 If you have AIs that are directly automating particular jobs that humans were doing before
1:06:01 then we say, well with additional compute we can run more copies of them to do more of those
1:06:07 tasks simultaneously. We can also run them at greater speed. Some people have an intuition
1:06:13 that what matters is time, that it's not how many people working on a problem at a given point. I
1:06:20 think that doesn't bear out super well but AI can also run faster than humans. If you have a set of
1:06:29 AIs that can do the work of the individual human researchers and run at 10 times or 100 times the
1:06:38 speed. And we ask well, could the human research community have solved these algorithm problems, do
1:06:44 things like invent transformers, over 100 years? If we have AIs with an effective population
1:06:53 similar to the humans but running 100 times as fast, then you have to tell a story where no,
1:07:00 the AI can't really do the same things as the humans and we're talking about what happens when
1:07:08 the AIs are more capable of in fact doing that. Although they become more capable as less
1:07:14 capable versions of themselves help us make them more capable, right? You have to kickstart that at some point. Is there an example in analogous situations? Is intelligence unique in
1:07:27 the sense that you have a feedback loop of — with a learning curve or something else, a system’s
1:07:33 outputs are feeding into its own inputs. Because if we're talking about something like Moore's law
1:07:39 or the cost of solar, you do have this way where we're throwing more people at the problem and
1:07:47 we're making a lot of progress, but we don't have this additional part of the model where
1:07:53 Moore's law leads to more humans somehow and more humans are becoming researchers. 1:07:58 You do actually have a version of that in the case of solar. You have a small infant industry
1:08:05 that's doing things like providing solar panels for space satellites and then getting increasing amounts of subsidized government demand because of worries about fossil fuel depletion and then
1:08:17 climate change. You can have the dynamic where visible successes with solar and lowering prices
1:08:24 then open up new markets. There's a particularly huge transition where renewables become cheap
1:08:31 enough to replace large chunks of the electric grid. Earlier you were dealing with very niche
1:08:36 situations like satellites, it’s very difficult to refuel a satellite in place and in remote
1:08:43 areas. And then moving to the sunniest areas in the world with the biggest solar subsidies.
1:08:53 There was an element of that where more and more investment has been thrown into the field and the
1:08:59 market has rapidly expanded as the technology improved. But I think the closest analogy
1:09:04 is actually the long run growth of human civilization itself and I know you had Holden Karnofsky from the Open Philanthropy Project on earlier and discussed some of this research about
1:09:16 the long run acceleration of human population and economic growth. Developing new technologies
1:09:24 allowed the human population to expand and humans to occupy new habitats and new areas
1:09:31 and then to invent agriculture to support the larger populations and then even more advanced agriculture in the modern industrial society. So there, the total technology and output allowed you
1:09:42 to support more humans who then would discover more technology and continue the process. Now
1:09:49 that was boosted because on top of expanding the population the share of human activity that was
1:09:56 going into invention and innovation went up and that was a key part of the industrial revolution. There was no such thing as a corporate research lab or an engineering university
1:10:08 prior to that. So you're both increasing the total human population and the share of it going in. But this population dynamic is pretty analogous. Humans invent farming,
1:10:17 they can have more humans, they can invent industry and so on. Maybe somebody would be skeptical that with AI progress specifically, it’s not just a matter of
1:10:29 some farmer figuring out crop rotation or some blacksmith figuring out how to do metallurgy
1:10:34 better. In fact even to make a 50% improvement in productivity you basically need something with
1:10:41 an IQ that's close to Ilya Sutskever's. There's like a discontinuous line. You're contributing
1:10:48 very little to productivity and then you're like Ilya and then you contribute a lot. You see what
1:10:54 I'm saying? There isn't a gradual increase in capabilities that leads to the feedback. You're imagining a case where the distribution of tasks is such that there's nothing that
1:11:05 individually automating it particularly helps and so the ability to contribute to AI research is
1:11:12 really end-loaded. Is that what you're saying? Yeah, we already see this in these really high
1:11:18 IQ companies or projects. Theoretically I guess Jane Street or OpenAI could hire like a bunch of
1:11:26 mediocre people with a comparative advantage to do some menial task and that could free up the time
1:11:32 of the really smart people but they don't do that right? Due to transaction costs or whatever else. 1:11:37 Self-driving cars would be another example where you have a very high quality threshold. If your
1:11:43 performance as a driver is worse than a human's, like you have 10 times the accident rate or
1:11:49 100 times the accident rate, then the cost of insurance for that which is a proxy for people's willingness to ride the car would be such that the insurance costs would absolutely
1:11:58 dominate. So even if you have zero labor cost, it is offset by the increased insurance costs. There are lots of cases like that where partial automation is in practice not very usable because,
1:12:12 in complementing other resources, you're gonna use those other resources less efficiently.
1:12:19 In a post-AGI future the same thing can apply to humans. People can say,
1:12:26 comparative advantage, even if AIs can do everything better than a human well it's
1:12:32 still worth something. A human can do something. They can lift a box, that's something. [unclear]
1:12:48 In such an economy you wouldn't want to let a human worker into any industrial environment
1:12:53 because in a clean room they'll be emitting all kinds of skin cells and messing things up. You
1:12:59 need to have an atmosphere there. You need a bunch of supporting tools and resources and materials
1:13:04 and those supporting resources and materials will do a lot more productively working with
1:13:10 AI and robots rather than a human. You don't want to let a human anywhere near the thing just like
1:13:16 you wouldn't want a gorilla wandering around in a china shop. Even if you've trained it to,
1:13:21 most of the time, pick up a box for you if you give it a banana, it's just not worth it to have it wandering around your china shop. Yeah. Why is that not a good objection? 1:13:30 I think that that is one of the ways in which partial automation can fail to really translate
1:13:38 into a lot of economic value. That's something that will attenuate as we go on and as the AI is
1:13:44 more able to work independently and more able to handle its own screw-ups and get more reliable. 1:13:53 But the way in which it becomes more reliable is by AI progress speeding up which happens if AI can
1:14:00 contribute to it but if there is some reliability bottleneck that prevents it from contributing to
1:14:06 that progress then you don't have the loop, right? I mean this is why we're not there yet. 1:14:11 But then what is the reason to think we'll be there? The broad reason is the inputs are scaling up. Epoch have a paper called Compute Trends Across
1:14:24 Three Eras of Machine Learning and they look at the compute expended on machine
1:14:32 learning systems since the founding of the field of AI, the beginning of the 1950s.
1:14:39 Mostly it grows with Moore's law and so people are spending a similar amount on their experiments
1:14:47 but they can just buy more with that because the cost of compute keeps falling. That data covers over
1:14:54 20 orders of magnitude, maybe like 24, and of all of those increases since 1952
1:15:03 a little more than half of them happened between 1952 and 2010 and all the rest
1:15:11 since 2010. We've been scaling that up four times as fast as was the case for most of the history of
1:15:19 AI. We're running through the orders of magnitude of possible resource inputs you could need for AI
1:15:26 much much more quickly than we were for most of the history of AI. That's why this is a period
1:15:32 with a very elevated chance of AI per year because we're moving through so much of the
1:15:39 space of inputs per year and indeed it looks like this scale-up taken to its conclusion will cover
1:15:47 another bunch of orders of magnitude and that's actually a large fraction of those that are left
1:15:53 before you start running into saying well, this is going to have to be like evolution with the
1:16:00 simple hacks we get to apply. We're selecting for intelligence the whole time, we're not going to
1:16:05 try the same mutation that causes fatal childhood cancer a billion times, the way evolution keeps
1:16:12 producing the same fatal mutations even though they've been tried many times. We use gradient descent which takes into account the derivative of improvement on the loss all throughout the network
1:16:22 and we don't throw away all the contents of the network with each generation where
1:16:28 you compress down to a little DNA. So there's that bar of, if you're going to do brute force
1:16:34 like evolution combined with these very simple ways we can save orders of magnitude on that.
1:16:40 We're going to cover a fraction that's like half of that distance in this scale-up over
1:16:47 the next 10 years or so. And so if you started off with a kind of vague uniform prior, you probably
1:16:54 can't make AGI with the amount of compute that would be involved in a fruit fly existing for
1:17:01 a minute which would be the early days of AI. Maybe you would get lucky, we were able to make
1:17:07 calculators because calculators benefited from very reliable serially fast computers
1:17:13 and where we could take a tiny tiny tiny tiny fraction of a human brain's compute
1:17:18 and use it for a calculator. We couldn't take an ant's brain and rewire it to calculate. It's
1:17:24 hard to manage ant farms let alone get them to do arithmetic for you. So there were some things
1:17:30 where we could exploit the differences between biological brains and computers to do stuff super
1:17:38 efficiently on computers. We would doubt that we would be able to do so much better than biology
1:17:44 that with a tiny fraction of an insect's brain we'd be able to get AI early on. On the far end,
1:17:52 it seemed very implausible that we couldn't do better than completely brute force evolution. And so in between you have some number of orders of magnitude of inputs where
1:18:01 it might be. In the 2000s, I would say, well, I'm gonna have a pretty uniformish prior. I'm
1:18:07 gonna put weight on it happening at the equivalent of 10^25 ops, 10^30, 10^35
1:18:17 and spreading out over that, and then I can update on other information. And in the short term,
1:18:22 in 2005 I would say, I don't see anything that looks like the cusp of AGI so I'm also gonna
1:18:29 lower my credence for the next five years or the next 10 years. And so that would be kind of like
1:18:35 a vague prior and then when we take into account how quickly are we running through those orders
1:18:40 of magnitude. If I have a uniform prior I assign half of my weight to the first half of remaining
1:18:47 orders of magnitude and if we're gonna run through those, over the next 10 years and some,
1:18:53 then that calls on me to put half of my credence, conditional on us ever making AI, which
1:18:58 seems likely considering it's a material object and making it is easier than redoing evolution,
1:19:03 on AI happening in this scale up. And that's supported by what we're seeing in terms of the rapid advances in capabilities with AI and LLMs in particular. 1:19:16 Okay, that's actually a really interesting point. Now somebody might say, there's not some sense in
1:19:22 which AIs could universally speed up the progress of OpenAI by 50 percent or 100 percent or 200
1:19:28 percent if they're not able to do everything better than Ilya Sutskever can. There's going
1:19:35 to be some way in which we're bottlenecked by the human researchers, and bottleneck effects dictate
1:19:41 that the slowest moving part of the organization will be the one that kind of determines the speed of the progress of the whole organization or the whole project. Which means that unless you get
1:19:49 to the point where you're doing everything that everybody in the organization can do, you're not going to significantly speed up the progress of the project as a whole. 1:19:57 Yeah, so that is a hypothesis and I think there's a lot of truth to it.
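One way to make the bottleneck worry concrete is an Amdahl's-law-style calculation; the task shares and speedups below are illustrative assumptions, not figures from the conversation:

```python
# Amdahl's-law-style illustration of the bottleneck argument: if AI only speeds up part of the
# work, the un-automated remainder limits the overall gain. All numbers are illustrative.
def overall_speedup(automated_fraction: float, automated_speedup: float) -> float:
    remaining = 1 - automated_fraction
    return 1 / (remaining + automated_fraction / automated_speedup)

print(overall_speedup(0.5, 10))    # ~1.8x: half the tasks get 10x faster, the rest bottlenecks you
print(overall_speedup(0.9, 10))    # ~5.3x: gains grow as more of the job is covered
print(overall_speedup(0.99, 100))  # ~50x: returns get extreme only near full automation
```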
1:20:03 When we think about the ways in which AI can contribute, there are things we talked about before like the AI setting up their own curriculum and that's something that Ilya can't and doesn’t
1:20:14 do directly. And there's a question of how much does that improve performance?
1:20:19 There are these things where the AI helps to produce some code for tasks and it's beyond
1:20:28 hello world at this point. The thing that I hear from AI researchers at leading labs is
1:20:35 that on their core job where they're like most expert it's not helping them that much but then
1:20:42 their job often does involve coding something that's out of their usual area of expertise
1:20:49 or they want to research a question and it helps them there. That saves some of their time and frees them to do more of the bottlenecked work. And I think the idea of,
1:21:02 is everything being dependent on Ilya? And is Ilya so much better than the hundreds of other
1:21:09 employees? A lot of people who are contributing, they're doing a lot of tasks and you can have
1:21:16 quite a lot of gain from automating some areas where you then do just an absolutely enormous
1:21:23 amount of it relative to what you would have done before. Because things like designing the custom curriculum maybe some humans put some work into that but you're not going to employ billions of
1:21:34 humans to produce it at scale and so it winds up being a larger share of the progress than it was
1:21:41 before. You get some benefit from these sorts of things where there's like pieces of my job that
1:21:50 now I can hand off to the AI and lets me focus more on the things that the AI still can't do.
1:21:58 Later on you get to the point where yeah, the AI can do your job including the most difficult parts
1:22:05 and maybe it has to do that in a different way. Maybe it spends a ton more time thinking about
1:22:11 each step of a problem than you and that's the late end. The stronger these bottlenecks' effects
1:22:17 are, the more the economic returns, the scientific returns and such are end-loaded towards getting
1:22:25 full AGI. The weaker the bottlenecks are, the more the interim results will be really paying off. 1:22:32 I probably disagree with you on how much the Ilyas of organizations seem to matter. Just
1:22:38 from the evidence alone, how many of the big breakthroughs in deep learning was that single
1:22:45 individual responsible for, right? And how much of his time is he spending doing anything that
1:22:51 Copilot is helping him on? I'm guessing most of it is just managing people and coming up with ideas and trying to understand systems and so on. And if the five or ten people who are like that at
1:23:04 OpenAI or Anthropic or whatever, are basically the way in which algorithmic progress is happening.
1:23:17 I know Copilot is not the thing you're talking about with like just 20% automation, but something like that. How much is that contributing to the core function of the research scientist? 1:23:29 Yeah, [unclear] quantitatively how much we disagree about the
1:23:34 importance of key research employees and such. I certainly think that some researchers add more
1:23:46 than 10 times the average employee, even much more. And obviously managers can add an enormous
1:23:53 amount of value by proportionately multiplying the output of the many people that they manage.
1:23:59 And so that's the kind of thing that we were discussing earlier when talking about how,
1:24:05 if you had a full human level AI, or AI that had all of the human capabilities plus AI advantages,
1:24:16 you'd benchmark not off of what the typical human performance is but peak human performance
1:24:21 and beyond. So yeah, I accept all that. I do think it makes a big difference for people
1:24:30 how much they can outsource the tasks that are less wow, less creative. And an enormous amount is learned by experimentation. ML has been quite an experimental
1:24:43 field and there's a lot of engineering work in building large super clusters, doing hardware-
1:24:51 aware optimization and coding of these things, being able to do the parallelism in large models,
1:24:58 and the engineers are busy and it's not only a big-thoughts kind of area. The other branch
1:25:10 is where will the AI advantages and disadvantages be? One AI advantage is being omnidisciplinary
1:25:22 and familiar with the newest things. I mentioned before there's no human who has a million years
1:25:29 of TensorFlow experience. To the extent that we're interested in the very cutting edge of
1:25:36 things that have been developed quite recently then AI that can learn about them in parallel
1:25:41 and experiment and practice with them in parallel can potentially learn much faster than a human.
1:25:47 And the area of computer science is one that is especially suitable for AI to learn in
1:25:55 a digital environment, so it doesn't require driving a car around that might kill someone or
1:26:01 have enormous costs. You can do unit tests, you can prove theorems, you can do all sorts
1:26:10 of operations entirely in the confines of a computer, which is one reason why programming
1:26:16 has been benefiting more than a lot of other areas from LLMs recently whereas robotics is lagging.
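As a trivial illustration of the kind of cheap, automatically checkable feedback that exists in the purely digital domain (a generic example, not something from the conversation):

```python
# A candidate solution can be checked against tests in milliseconds, with no real-world risk,
# which is the kind of feedback signal that can be generated at enormous scale for code.
def sort_descending(xs):
    return sorted(xs, reverse=True)

assert sort_descending([3, 1, 2]) == [3, 2, 1]
assert sort_descending([]) == []
print("all checks passed")
```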
1:26:24 And considering they are getting better at things like the GRE, math, at programming contests,
1:26:35 and some people have forecasts and predictions outstanding about doing well on the informatics
1:26:44 olympiad and the Math Olympiad and in the last few years when people tried to forecast the
1:26:51 MMLU benchmark which has a lot of sophisticated, graduate student level science kind of questions,
1:27:01 AI knocked that down a lot faster than AI researchers and students who had registered
1:27:09 forecasts on it. If you're getting top-notch scores on graduate exams, creative problem
1:27:18 solving, it's not obvious that that area will be a relative weakness of AI. In fact computer science
1:27:29 is in many ways especially suitable because of getting up to speed with new areas, being able to
1:27:36 get rapid feedback from the interpreter at scale. But do you get rapid feedback if you're doing
1:27:43 something that's more analogous to research? Let's say you have a new model and it’s like,
1:27:49 if we put in 10 million dollars on a mini-training run on this, this would be much better. 1:27:55 Yeah, for very large models those experiments are going to be quite expensive. You're going to look more at whether you can build up this capability by generalization, from things
1:28:05 like mini math problems, programming problems, working with small networks. Yeah, fair enough. Scott Aaronson was one of my professors in college
1:28:15 and I took his quantum information class and he recently wrote a blog post where he said,
1:28:25 I had GPT-4 take my quantum information test and it got a B. I was like, “Damn, I got a C on the
1:28:31 final.” I updated in the direction that getting a B on a test probably means it understands
1:28:37 quantum information pretty well. With different areas of strengths and weaknesses than the human students. Sure, sure. Would it be possible for this
1:28:46 intelligence explosion to happen without any hardware progress? If hardware progress stopped
1:28:51 would this feedback loop still be able to produce some explosion with only software? 1:28:57 If we say that the technology is frozen, which I think is not the case right now, Nvidia has
1:29:06 managed to deliver significantly better chips for AI workloads for the last few generations. H100,
1:29:13 A100, V100. If that stops entirely, maybe we'll define this as no more nodes, Moore’s law is over,
1:29:23 at that point the gains you get in the amount of compute available come from actually constructing
1:29:29 more chips and there are economies of scale you could still realize there. Right now a chip maker
1:29:37 has to amortize the R&D cost of developing the chip, and then there's the capital equipment that's created.
1:29:45 You build a fab, its peak profits are going to come in the few years when the chips it's making
1:29:50 are at the cutting edge. Later on as the cost of compute exponentially falls, you keep the fab open
1:29:58 because you can still make some money given that it's built. But of all the profits the fab will
1:30:04 ever make, they're relatively front loaded because that’s when its technology is near the cutting
1:30:10 edge. So in a world where Moore’s law ends then you wind up with these very long production runs
1:30:18 where you can keep making chips that stay at the cutting edge and where the R&D costs
1:30:25 get amortized over a much larger base. So the R&D basically drops out of the price and then you get
1:30:32 some economies of scale from just making so many fabs. And this is applicable in general across
1:30:41 industries. When you produce a lot more, the costs fall. ASML has many incredibly exotic suppliers
1:30:52 that make some bizarre part of the thousands of parts in one of these ASML machines. You can't
1:30:58 get it anywhere else, they don't have standardized equipment for their thing because this is the only
1:31:05 use for it and in a world where we're making 10, 100 times as many chips at the current node then
1:31:12 they would benefit from scale economies. And all of that would become more mass production,
1:31:18 industrialized. You combine all of those things and it seems like the capital costs
1:31:24 of buying a chip would decline but the energy costs of running the chip would not. Right now
1:31:30 energy costs are a minority of the cost, but they're not trivial. They passed 1% a while ago
1:31:41 and they're inching up towards 10% and beyond. And so you can maybe get another order of
1:31:48 magnitude cost decrease from getting really efficient at the capital construction, but
1:31:56 energy would still be a limiting factor after the end of actually improving the chips themselves. 1:32:02 Got it. And when you say there would be a greater population of AI researchers, are we using population as a thinking tool of how they could be more effective? Or do you
1:32:12 literally mean that the way you expect these AIs to contribute a lot to research is just by having
1:32:18 a million copies of a researcher thinking about the same problem or is it just a useful thinking
1:32:24 model for what it would look like to have a million times smarter AI working on that problem? That's definitely a lower bound model and often I'm meaning something more like,
1:32:34 effective population or that you'd need this many people to have this effect. We were talking
1:32:40 earlier about the trade-off between training and inference in board games and you can get
1:32:47 the same performance by having a bigger model or by calling the model more times. In general
1:32:54 it's more effective to have a bigger, smarter model and call it fewer times, up until the point
1:32:59 where the costs equalize between them. We would be taking some of the gains of our larger compute on
1:33:06 having bigger models that are individually more capable. And there would be a division of labor.
1:33:12 The tasks that were most cognitively demanding would be done by these giant models, but for some very easy tasks you don't want to expend that giant model if a model 1/100th the size can take that
1:33:24 task. Larger models would be in the positions of researchers and managers and they would
1:33:31 have swarms of AIs of different sizes as tools that they could make API calls to and whatnot. After human-level AGI 1:33:37 Okay, we accept the model and now we've gone to something that is at least as smart as Ilya Sutskever on all the tasks relevant to progress and you can have
1:33:47 so many copies of it. What happens in the world now? What do the next months or years or whatever timeline is relevant now look like? To be clear what's happened is not that we have
1:33:59 something that has all of the abilities and advantages of humans plus the AI advantages, what we have is something doing things like making a ton of calls to make up for being individually
1:34:12 less capable or something that’s able to drive forward AI progress. That process is continuing,
1:34:18 so AI progress has accelerated greatly in the course of getting there. Maybe we go
1:34:24 from our eight-month doubling time of software progress in effective compute to four months,
1:34:32 or two months. There's a report by Tom Davidson at the Open Philanthropy Project,
1:34:40 which spun out of work I had done previously and I advised and helped with that project but
1:34:52 Tom really carried it forward and produced a very nice report and model which Epoch is hosting. You
1:34:59 can plug in your own version of the parameters and there is a lot of work estimating the parameters,
1:35:06 things like — What's the rate of software progress? What's the return to additional
1:35:11 work? How does performance scale on these tests as you boost the models? And in general,
1:35:22 broadly human level in every domain with all the advantages is pretty deep into that. So if
1:35:34 we already have an eight-month doubling time for software progress
1:35:39 then by the time you get to that kind of a point, it's maybe more like four months, two months,
1:35:47 going into one month. If the thing is just proceeding at full speed
1:35:53 then each doubling can come more rapidly and we can talk about what are the spillovers? 1:36:04 As the models get more capable they can be doing other stuff in the world, they can spend some of their time making google search more efficient. They can be hired as chat bots with
1:36:16 some inference compute and then we can talk about if that intelligence explosion process is allowed
1:36:25 to proceed then what happens is, you improve your software by a factor of two. The efforts
1:36:36 needed to get the next doubling are larger, but they're not twice as large, maybe they're like 25 percent to 35 percent larger. Each one comes faster and faster until you hit limitations like
1:36:49 you can no longer make further software advances with the hardware that you have and looking at
1:36:58 reasonable parameters in that model, if you have these giant training runs you can go very far.
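A toy version of the accelerating-doublings dynamic described here, using the rough parameters from the conversation (an eight-month starting doubling time and roughly 30% more effort per successive doubling); this is only an illustration, not the Davidson model itself:

```python
# Each software doubling needs ~1.3x the research effort of the last, but the doubling itself
# doubles effective research input, so wall-clock time per doubling shrinks by ~1.3/2 each step.
months_per_doubling = 8.0   # starting doubling time for software progress (from the conversation)
effort_growth = 1.3         # each doubling needs 25-35% more effort; 30% assumed here

for step in range(6):
    print(f"doubling {step}: ~{months_per_doubling:.1f} months")
    months_per_doubling *= effort_growth / 2.0

# Prints roughly 8.0, 5.2, 3.4, 2.2, 1.4, 0.9 months, until hardware or algorithmic
# limits cut the acceleration off.
```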
1:37:06 The way I would see this playing out is as the AIs get better and better at research, they can work
1:37:13 on different problems, they can work on improving software, they can work on improving hardware, they can do things like create new industrial technologies, new energy technology,
1:37:22 they can manage robots, they can manage human workers as executives and coaches and whatnot.
1:37:30 You can do all of these things and AIs wind up being applied where the returns are highest.
1:37:37 Initially the returns are especially high in doing more software and the reason for that is again,
1:37:46 if you improve the software you can update all of the GPUs that you have access to.
1:37:53 Your cloud compute is suddenly more potent. If you design a new chip, it'll take
1:38:02 a few months to produce the first ones and it doesn't update all of your old chips. So you have
1:38:10 an ordering where you start off with the things where there's the lowest dependence on existing stocks and you can more just take whatever you're developing and apply
1:38:21 it immediately. So software runs ahead, you're getting more towards the limits of that software
1:38:28 and I think that means things like having all the human advantages but combined with AI advantages.
1:38:39 Given the kind of compute that would be involved if we're talking about this hundreds of billions
1:38:45 of dollars training run, there's enough compute to run tens of millions, hundreds of millions of
1:38:54 human scale minds. They'd probably be smaller than human scale yet similarly efficient
1:39:01 at the limits of algorithmic progress because they have the advantage of a million years of education. They have the other advantages we talked about. You've got that wild capability
1:39:10 and further software gains are running out. They start to slow down again because you're
1:39:18 getting towards the limits. You can't do any better than the best. What happens then? 1:39:26 By the time they're running out have we already hit super intelligence or? Yeah, you're wildly super intelligent. Just by having the abilities that humans have and
1:39:38 then combining it with being very well focused and trained in the task beyond what any human could be and then running faster. I'm not going to assume that there's huge qualitative improvements
1:39:51 you can have. I'm not going to assume that humans are very far from the efficient frontier
1:39:56 of software except with respect to things like, yeah we have a limited lifespan so we
1:40:01 couldn't train super intensively. We couldn't incorporate other software into our brains. We
1:40:07 couldn't copy ourselves. We couldn't run at fast speeds. So you've got all of those capabilities
1:40:15 and now I'm skipping ahead of the most important months in human history. I can talk about
1:40:26 what it looks like if it's just the AIs took over, they're running things as they like.
1:40:34 How do things expand? I can talk about things as, how does this go? In a world where we've roughly,
1:40:44 or at least so far, managed to retain control of where these systems are going. By jumping ahead,
1:40:53 I can talk about how this would translate into the physical world? This is something that I think is a stopping point for a lot of people in thinking about what would an intelligence
1:41:02 explosion look like? They have trouble going from, well there's stuff on servers and cloud compute
1:41:09 and that gets very smart. But then how does what I see in the world change? How does industry or
1:41:16 military power change? If there's an AI takeover what does that look like? Are there killer robots?
1:41:24 One course we might go down is to discuss how we managed that wildly accelerating transition.
1:41:35 How do you avoid it being catastrophic? And another route we could go is how
1:41:41 does the translation from wildly expanded scientific R&D capabilities intelligence
1:41:49 on these servers translate into things in the physical world? You're moving along in
1:41:56 order of what has the quickest impact largely or where you can have an immediate change. 1:42:08 One of the most immediately accessible things is where we
1:42:14 have large numbers of devices or artifacts or capabilities that are already AI operable
1:42:23 with hundreds of millions equivalent researchers. You can quickly solve self-driving cars,
1:42:33 make the algorithms much more efficient, do great testing and simulation, and then operate a large
1:42:40 number of cars in parallel if you need to get some additional data to improve the simulation
1:42:47 and reasoning. Although, in fact humans with quite little data are able to achieve human-level
1:42:54 driving performance. After you've really maxed out the easily accessible algorithmic improvements in
1:43:02 this software-based intelligence explosion that's mostly happening on server farms then you have
1:43:07 minds that have been able to really perform on a lot of digital-only tasks, they're doing great
1:43:13 on video games, they're doing great at predicting what happens next in a youtube video. If you have
1:43:20 a camera that they can move they're able to predict what will happen at different angles.
1:43:26 Humans do this a lot where we naturally move our eyes in such a way to get images from different
1:43:33 angles and different presentations and then predicting combined from that. And you can operate
1:43:39 many cars, many robots at once, to get very good robot controllers. So you should think that
1:43:46 all the existing robotic equipment or remotely controllable equipment that is wired for that,
1:43:53 the AIs can operate that quite well. I think some people might be skeptical
1:43:58 that existing robots given their current hardware will have the dexterity and the maneuverability to
1:44:05 do a lot of physical labor that an AI might want to do. Do you have reason for thinking otherwise? There's also not very many of them. Production of industrial robots is hundreds of thousands
1:44:16 per year and they can do quite a bit in place. Elon Musk is promising a humanoid robot in the
1:44:24 tens of thousands of dollars. That may take a lot longer than he has said, as has happened with
1:44:34 other technologies, but that's a direction to go. But most immediately, hands are actually probably
1:44:41 the most scarce thing. But if we consider what do human bodies provide? There's the brain and in
1:44:49 this situation, we have now an abundance of high quality brain power that will be increasing as the
1:44:56 AIs will have designed new chips, which will be rolling out from the TSMC factories, and they'll
1:45:02 have ideas and designs for the production of new fab technologies, new nodes, and additional fabs.
1:45:11 But look around the body. There are legs to move around, though not necessarily only that; wheels work pretty well. Many factory jobs and office jobs can be fully virtualized. But yeah,
1:45:29 some amount of legs, wheels, other transport. You have hands and hands are something that are
1:45:36 on the expensive end in robots. We can make them, they're made in very small production runs partly
1:45:43 because we don't have the control software to use them. In this world the control software is fabulous and so people will produce much larger production runs of them over time, possibly using
1:45:56 technology, possibly with quite different technology. But just taking what we've got,
1:46:03 right now the industrial robot industry produces hundreds of thousands of machines a year.
1:46:11 Some of the nicer ones are like 50,000 dollars. In aggregate the industry has tens of billions of
1:46:17 dollars of revenue. By comparison the automobile industry produces over 60 million cars a year,
1:46:25 it has revenue of over two trillion dollars per annum. Converting that production capacity over
1:46:37 towards robot production would be one of the things to do and in World War
1:46:47 Two, industrial conversion of American industry took place over several years and really amazingly
1:46:57 ramped up military production by converting existing civilian industry. And that was
1:47:03 without the aid of superhuman intelligence and management at every step in the process
1:47:09 so yeah, part of that would be very well designed. You'd have AI workers who understood every part of
1:47:19 the process and could direct human workers. Even in a fancy factory, most of the time
1:47:28 it's not the hands doing a physical motion that a worker is being paid for. They're
1:47:34 often looking at things or deciding what to change, the actual time spent in manual motion
1:47:43 is a limited portion of that. So in this world of abundant AI cognitive abilities
1:47:49 where the human workers are more valuable for their hands than their heads,
1:47:55 you could have a worker previously without training and expertise in the area who has a
1:48:04 smartphone on a headset, and we have billions of smartphones which have eyes and ears and methods
1:48:12 for communication for an AI to be talking to a human and directing them in their physical
1:48:17 motions with skill as a guide and coach that is beyond any human. They could be a lot better at
1:48:25 telepresence and remote work and they can provide VR and augmented reality guidance to help people
1:48:32 get better at doing the physical motions that they're providing in the construction. Say you convert the auto industry to robot production. If it can produce an amount of mass
1:48:47 of machines that is similar to what it currently produces, that's enough for a
1:48:55 billion human size robots a year. The value per kilogram of cars is somewhat less than high-end
1:49:07 robots but yeah, you're also cutting out most of the wage bill because most of the wage bill
1:49:14 is payments ultimately to human capital and education and not to the physical hand motions
1:49:21 and lifting objects and that sort of tasks. So at the existing scale of the auto industry you
1:49:27 can make a billion robots a year. The auto industry is two or three percent of the existing economy, and you're replacing the cognitive parts of the work with AI. If right now physical hand
1:49:39 motions are like 10% of the work, redirect humans into those tasks. In the world at large right now,
1:49:52 mean income is on the order of $10,000 a year but in rich countries, skilled workers earn more
1:49:58 than a hundred thousand per year. Some of that is not just management roles, which only a certain
1:50:06 proportion of the population can have, but just being at an absolutely exceptional peak of human
1:50:13 performance in some of these construction and similar roles. Just raising productivity
1:50:22 to match the most productive workers in the world is room to make a very big gap.
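Two of the rough figures in this stretch can be sanity-checked with a quick back-of-envelope; the car and robot masses below are illustrative assumptions of mine rather than numbers from the conversation, while the production and income figures are the ones quoted above.

```python
# 1) The "billion human-size robots a year" mass argument.
cars_per_year = 60e6        # "over 60 million cars a year"
car_mass_kg = 1500          # assumed average car mass (illustrative)
robot_mass_kg = 100         # assumed humanoid robot mass (illustrative)
robots_per_year_by_mass = cars_per_year * car_mass_kg / robot_mass_kg
print(f"robots per year by mass: ~{robots_per_year_by_mass:.1e}")  # ~9e8, order one billion

# 2) The wage gap: world mean income ~$10,000/year vs. $100,000+/year for
#    skilled workers in rich countries suggests roughly a 10x productivity
#    headroom from better direction of existing labor alone.
print(f"productivity headroom: ~{100_000 / 10_000:.0f}x")
```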
1:50:32 With AI replacing skills that are scarce in many places where there's abundant currently
1:50:39 low wage labor, you bring in the AI coach and someone who was previously making very
1:50:45 low wages can suddenly be super productive by just being the hands for an AI. On a naive
1:50:54 view if you ignore the delay of capital adjustment of building new tools for the workers.
1:51:02 Just raise the typical productivity for workers around the world to be more like rich countries
1:51:11 and get 5x/10x like that. Get more productivity with AI handling the difficult cognitive tasks,
1:51:20 reallocating people from office jobs to providing physical motions. And since right now that's a
1:51:27 small proportion of the economy you can expand the hands for manual labor by an
1:51:34 order of magnitude within a rich country. Because most people are sitting in an office or even on a
1:51:42 factory floor are not continuously moving. You've got billions of hands lying around in humans
1:51:48 to be used in the course of constructing your waves of robots and now once you have
1:51:55 a quantity of robots that is approaching the human population and they work 24 x 7 of course,
1:52:03 the human labor will no longer be valuable as hands and legs but at the very beginning of the
1:52:09 transition, just like new software can be used to update all of the GPUs to run the latest AI,
1:52:17 humans are a legacy population with an enormous number of underutilized hands and feet that the AI
1:52:25 can use for the initial robot construction. Cognitive tasks are being automated and the
1:52:31 production of them is greatly expanding and then the physical tasks which complement them
1:52:37 are utilizing humans to do the parts that robots that exist can't do. Is the implication of this
1:52:42 that you're getting to a world where production would increase by just a tremendous amount, or that AI could
1:52:48 get a lot done for whatever motivations it has? There's an enormous increase in production
1:52:55 for humans just by switching over to the role of providing hands and feet
1:53:02 for AI where they're limited, and this robot industry is a natural place to apply it. And
1:53:09 so if you go to something that's like 10x the size of the current car industry in terms of its
1:53:17 production, which would still be like a third of our current economy and the aggregate productive
1:53:22 capabilities of the society with AI support are going to be a lot larger. They make 10 billion
1:53:28 humanoid robots a year and then if you do that, the legacy population of a few billion
1:53:37 human workers is no longer very important for the physical tasks and then the new automated
1:53:45 industrial base can just produce more factories, produce more robots. The interesting thing is
1:53:51 what's the doubling time? How long does it take for a set of computers, robots, factories and
1:53:59 supporting equipment to produce another equivalent quantity of that? For GPUs, brains, this is really
1:54:08 easy, really solid. There's an enormous margin there. We were talking before about
1:54:15 skilled human workers getting paid a hundred dollars an hour is
1:54:23 quite normal in developed countries for very in-demand skills. And you make a GPU,
1:54:31 they can do that work. Right now, these GPUs are tens of thousands of dollars.
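The payback arithmetic with these round figures, purely as an illustration (the exact GPU price and the around-the-clock utilization are assumptions):

```python
# GPU payback time at skilled-knowledge-worker rates (round, illustrative numbers).
gpu_cost = 30_000            # "tens of thousands of dollars"
wage_per_hour = 100          # "a hundred dollars an hour" for very in-demand skills
hours_per_week = 24 * 7      # assume the AI works around the clock

weeks_to_payback = gpu_cost / (wage_per_hour * hours_per_week)
print(f"~{weeks_to_payback:.1f} weeks to earn back the GPU's cost")  # ~1.8 weeks
```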
1:54:39 If you can do a hundred dollars of wages each hour then in a few weeks, you pay back
1:54:48 your costs. If the thing is more productive and you can be a lot more productive than a typical
1:54:57 high-paid human professional by being the very best human professional and even better than that by having a million years of education and working all the time.
1:55:05 Then you could get even shorter payback times. You can generate the dollar value of the initial
1:55:14 cost of that equipment within a few weeks. A human factory worker can earn 50,000 dollars
1:55:25 a year. Really top-notch factory workers earning more and working all the time, if they can produce
1:55:33 a few hundred thousand dollars of value per year and buy a robot that costs 50,000 to replace them
1:55:42 that's a payback time of some months. That is about the financial return. 1:55:47 Yeah, and we're gonna get to the physical capital return because those are gonna
1:55:53 diverge in this scenario. What we really care about are the actual physical operations
1:56:07 that a thing does. How much do they contribute to these tasks? And I'm using this as a start to try
1:56:16 and get back to the physical replication times. I guess I'm wondering what is the implication
1:56:22 of this. Because you started off by saying people have not thought about what the physical implications of super intelligence would be. What is the bigger takeaway, whatever you might be
1:56:32 wrong about, when we think about what the world will look like with super intelligence? With robots that are optimally operated by AI, extremely finely operated and building
1:56:44 technological designs and equipment and facilities under AI direction.
1:56:50 How much can they produce? For a doubling you need the AIs to produce stuff that is, in aggregate,
1:57:01 at least equal to their own cost. So now we're pulling out these things like labor costs
1:57:10 that no longer apply and then trying to zoom in on what these capital costs will be. You're still
1:57:16 going to need the raw materials. You're still going to need the robot time building the next robot. I think it's pretty likely that with the advanced AI work they can design some incremental
1:57:27 improvements, and with the industry scale up, you can get 10 fold and better cost reductions
1:57:37 by making things more efficient and replacing the human cognitive labor. Maybe you need
1:57:44 $5,000 of costs under our current environment. But the big change in this world is, we're trying
1:57:55 to produce this stuff faster. If we're asking about the doubling time of the whole system
1:58:01 in say one year, if you have to build a whole new factory to double everything, you don't have time
1:58:08 to amortize the cost of that factory. Right now you might build a factory and use it for 10 years
1:58:14 and buy some equipment and use it for five years. That's your capital cost and in an accounting
1:58:21 context, you depreciate each year a fraction of that capital purchase. But if we're trying to
1:58:29 double our entire industrial system in one year, then those capital costs have to be multiplied.
1:58:37 So if we're going to be getting most of the return on our factory in the first year, instead of 10
1:58:43 years weighted appropriately, then we're going to say okay our capital cost has to go up by 10 fold.
1:58:51 Because I'm building an entire factory for this year's production. It will do more stuff later
1:58:56 but it's most important early on instead of over 10 years and so that's going to raise the cost of
1:59:06 that reproduction. It seems like going from the current decade long cycle of amortizing
1:59:15 factories and fabs and shorter for some things, the longest are things like big buildings. Yeah,
1:59:23 that could be a 10 fold increase from moving to doubling the physical stuff each year in capital
1:59:30 costs. Given the savings that we get in the story from scaling up the industry, from removing the
1:59:37 [unclear] to human cognitive labor and then just adding new technological advancements and super
1:59:44 high quality cognitive supervision, applying more of it than was applied today. It looks like you
1:59:50 can get cost reductions that offset that increased capital cost. Your $50,000 improved robot
2:00:01 arms or industrial robots can do the work of a human factory worker. It would be the equivalent
2:00:09 of hundreds of thousands of dollars. By default they would cost more than the $50,000 today,
2:00:19 but then you apply all these other cost savings and it looks like you then get a period of robot
2:00:25 doubling time that is less than a year. I think significantly less than a year as you get into it. 2:00:32 So in this first phase you have humans under AI direction
2:00:37 and existing robot industry and converted auto industry and expanded facilities making robots.
2:00:49 In less than a year you've produced robots until their combined production is exceeding
2:00:54 that of humans’ arms and feet and then you could have a doubling time period of months.
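A rough sketch of how the pieces just discussed combine: the dollar figures are the round numbers used in the conversation, the 10x capital loading is the amortization penalty described above, and the 10x cost reduction from scale-up and removing human labor is the plausible-sounding figure mentioned earlier rather than a firm estimate.

```python
# Rough robot-economy payback under fast doubling (all numbers illustrative).
robot_cost_today = 50_000     # rough current price of a nicer industrial robot
capital_loading = 10          # paying for the whole factory out of ~one year's output
cost_reduction = 10           # assumed fall in costs from scale, design work, and
                              # stripping human cognitive labor out of production
effective_robot_cost = robot_cost_today * capital_loading / cost_reduction

annual_output_value = 300_000 # "a few hundred thousand dollars of value per year"
                              # for a top-notch worker-equivalent running 24/7

months_to_pay_back = 12 * effective_robot_cost / annual_output_value
print(f"~{months_to_pay_back:.0f} months for a robot to reproduce its own cost")  # ~2
```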
2:01:08 [unclear] That's not to say that's the limit of the most that technology could do
2:01:18 because biology is able to reproduce at faster rates and maybe we're talking about that in a
2:01:24 moment, but if we're trying to restrict ourselves to robotic technology as we understand it and cost
2:01:30 falls that are reasonable from eliminating all labor, massive industrial scale up, and historical kinds of technological improvements that lowered costs, I think you can get into a
2:01:42 robot population industry doubling in months. Got it. And then what is the implication of the
2:01:50 biological doubling times? This doesn't have to be biological, but you can do Drexler-like first
2:01:59 principles estimates of how much it would cost to build a nanotech thing that could build more nanobots. I certainly take the human brain and other biological brains as very relevant data
2:02:09 points about what's possible with computing and intelligence. With the reproductive capability of
2:02:14 biological plants and animals and microorganisms, I think it is relevant. It's possible for systems
2:02:25 to reproduce at least this fast. At the extreme you have bacteria that are heterotrophic so
2:02:30 they're feeding on some abundant external food source in ideal conditions. And there are some
2:02:36 that can divide every 20 or 60 minutes. Obviously that's absurdly fast. That seems on the low end
2:02:46 because ideal conditions require actually setting them up. There needs to be abundant energy there.
2:02:53 If you're actually having to acquire that energy by building solar panels,
2:02:59 or burning combustible materials, or whatnot, then the physical equipment to produce those ideal
2:03:07 conditions can be a bit slower. Cyanobacteria, which are self-powered from solar energy, the
2:03:15 really fast ones in ideal conditions can double in a day. A reason why cyanobacteria aren't the food
2:03:22 source for everyone and everything is that it's hard to ensure those ideal conditions and then to extract
2:03:29 them from the water. They do of course power the aquatic ecology but they're floating in liquid.
2:03:38 Getting resources that they need to them and out is tricky and then extracting your product. One
2:03:46 day doubling times are possible powered by the sun and then if we look at things like insects,
2:03:55 fruit flies can have hundreds of offspring in a few weeks. You extrapolate that over a year and you just fill up anything accessible. Right
2:04:09 now humanity uses less than one thousandth of the heat envelope of the earth. Certainly you
2:04:16 can get done with that in a year if you can reproduce your industrial base at that rate.
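As a sanity check on that claim, assuming for illustration a one-month doubling time for the industrial base:

```python
# Growth over a year at a one-month doubling time versus the current share of
# Earth's energy budget in use (doubling time is an illustrative assumption).
doublings_per_year = 12
growth_factor = 2 ** doublings_per_year
print(f"growth in one year: ~{growth_factor}x")          # 4096x

current_fraction_used = 1 / 1000   # "less than one thousandth of the heat envelope"
print(f"implied fraction after a year: ~{growth_factor * current_fraction_used:.1f}")
# Comes out above 1, i.e. you would run into the planetary energy budget within the year.
```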
2:04:23 And then even interestingly with the flies, they do have brains. They have a significant amount
2:04:31 of computing substrate. So there's something of a point or two. If we could produce computers
2:04:38 in ways as efficient as the construction of brains then we could produce computers very effectively
2:04:43 and then the big question about that is the brains that get constructed biologically they
2:04:51 grow randomly and then are configured in place. It's not obvious you would be able to make them
2:04:57 have an ordered structure like a top-down computer chip that would let us copy data into them.
2:05:04 So something like that where you can't just copy your existing AIs and integrate them is going to be less valuable than a GPU. Well, what are the things you couldn't copy? 2:05:14 A brain grows by cell division and then random connections are formed.
2:05:22 Every brain is different and you can't rely on — yeah, we'll just copy this file into
2:05:28 the brain. For one thing, there's no input-output for that. You need to have that but the structure
2:05:35 is also different. You wouldn't be able to copy things exactly. Whereas when we make a CPU or GPU,
2:05:42 they're designed incredibly finely and precisely and reliably. They break with incredibly tiny
2:05:48 imperfections and they are set up in such a way that we can input large amounts of data.
2:05:53 Copy a file and have the new GPU run an AI just as capable as any other. Whereas with a human child,
2:06:00 they have to learn everything from scratch because we can't just connect them to a fiber optic cable
2:06:05 and they're immediately a productive adult. So that there's no genetic bottleneck? 2:06:11 Yeah, you can share the benefits of these giant training runs and such. So that's a question of
2:06:16 how if you're growing stuff using biotechnology, how you could effectively copy and transfer data.
2:06:24 And now you mentioned Eric Drexler's ideas about creating non-biological nanotechnology,
2:06:33 artificial chemistry that was able to use covalent bonds and reproduce; in some ways, a more
2:06:42 industrial approach to molecular objects. Now there's controversy about whether that will work
2:06:48 and how effective it would be if it did. And certainly if you can get things that are like
2:06:56 biology in their reproductive ability but can do computing or
2:07:03 be connected to outside information systems, then that's pretty tremendous. You can produce physical
2:07:12 manipulators and compute at ludicrous speeds. And there's no reason to think in principle
2:07:18 they couldn't, right? In fact, in principle we have every reason to think they could. The reproductive abilities, absolutely, because biology does that.
2:07:30 There are challenges to the practicality of the necessary chemistry. My bet would be that we
2:07:40 can move beyond biology in some important ways. For the purposes of this discussion,
2:07:45 I think it's better not to lean on that because I think we can get to many of the same conclusions
2:07:53 on things that just are more universally accepted. The bigger point being that once you have super
2:08:01 intelligence you very quickly get to a point where a great portion of the 1000x greater energy profile that the sun makes available to the earth is used by the AI. 2:08:11 Or by the civilization empowered by AI. That could be an AI-civilization
2:08:19 or it could be a human-AI civilization. It depends on how well we manage things and
2:08:25 what the underlying state of the world is. Okay, so let's talk about that. When we're talking about how they could take over, is it best to start at a subhuman intelligence
2:08:34 or should we just start at we have a human-level intelligence and the takeover or the lack thereof? 2:08:44 Different people might have somewhat different views on this but for me when I am concerned about
AI takeover scenarios 2:08:54 either outright destruction of humanity or an unwelcome AI takeover of civilization,
2:09:03 most of the scenarios I would be concerned about pass through a process of AI being
2:09:11 applied to improve AI capabilities and expand. This process we were talking about earlier where
2:09:20 AI research is automated. Research labs, companies, a scientific community running
2:09:28 within the server farms of our cloud compute. So OpenAI has basically been turned into a
2:09:33 program. Like a closed circuit. Yeah, and with a large fraction of the world's compute probably going into whatever training runs and AI societies.
2:09:45 There'd be economies of scale because if you put in twice as much compute in this, the AI research
2:09:51 community goes twice as fast, that's a lot more valuable than having two separate training runs.
2:09:58 There would be some tendency to bandwagon. You have some small startup; even if
2:10:07 they make an algorithmic improvement, it's worth more running it on 10 times, 100 times, or even two times the compute,
2:10:13 if you're talking about say Google and Amazon teaming up. I'm actually not sure what the precise
2:10:19 ratio of their cloud resources is. Since these interesting intelligence explosion impacts come
2:10:25 from the leading edge there's a lot of value in not having separated walled garden ecosystems and
2:10:33 having the results being developed by these AIs be shared. Have larger training runs be shared.
2:10:40 I'm imagining this is something like some very large company, or consortium of companies, likely
2:10:49 with a lot of government interest and supervision, possibly with government funding, producing this
2:10:58 enormous AI society in their cloud which is doing all sorts of existing AI applications
2:11:06 and jobs as well as these internal R&D tasks. At this point somebody might say, this sounds
2:11:13 like a situation that would be good from the perspective of preventing takeover, because if it's going to
2:11:18 take tens of billions of dollars worth of compute to continue this training for this AI society, it
2:11:24 should not be that hard for us to pull the brakes if needed as compared to something that could run
2:11:30 on a single cpu. Okay so there's an AI society that is a result of these training runs and with
2:11:42 the power to improve itself on these servers. Would we be able to stop it at this point? And what does an attempt at takeover look like? We're skipping over why that might happen.
2:11:55 For that, I'll just briefly refer to and incorporate by reference
2:12:01 some discussion by my Open Philanthropy colleague, Ajeya Cotra. She has a piece about the
2:12:15 default outcome of training AI without specific countermeasures; the default outcome is a takeover.
2:12:23 But yes, we are training models that for some reason vigorously pursue a higher reward
2:12:33 or a lower loss, and that can be because they wind up with some motivation where they want reward,
2:12:40 which they could then ensure if they had control of their own training process. It could
2:12:46 be something like developing a motivation around an extended concept of reproductive fitness,
2:12:54 not necessarily at the individual level, but over the generations of training, with tendencies that tend to propagate themselves
2:13:06 becoming more common. And it could be that they have some goal in the world which is served well
2:13:15 by performing very well on the training distribution. By tendencies do you mean power seeking behavior? Yeah, so an AI that behaves well on the training
2:13:25 distribution because it wants it to be the case that its tendencies wind up being preserved or
2:13:33 selected by the training process will then behave to try and get very high reward or low loss
2:13:43 and be propagated. But you can have other motives that go through the same behavior because it's
2:13:48 instrumentally useful. So an AI that is interested in having a robot takeover because it will change
2:13:57 some property of the world then has a reason to behave well on the training distribution.
2:14:05 Not because it values that intrinsically but because if it behaves differently then it will be changed by gradient descent and its goal is less likely to be pursued. It
2:14:15 doesn't necessarily have to be that this AI will survive, because it probably won't. AIs are constantly spawned and deleted on the servers and the new generation proceeds. But
2:14:24 an AI that has a very large general goal that is affected by these kinds of macro scale
2:14:30 processes could then have reason to behave well over this whole range of training situations. 2:14:36 So this is a way in which we could have AIs trained that develop internal motivations such that they
2:14:44 will behave very well in this training situation where we have control over their reward signal and
2:14:50 their physical computers and if they act out they will be changed and deleted. Their goals
2:14:59 will be altered until there's something that does behave well. But they behave differently
2:15:06 when we go out of distribution on that. When we go to a situation where the AIs by their choices
2:15:14 can take control of the reward process, they can make it such that we no longer have power over
2:15:20 them. Holden previously mentioned the King Lear problem where King Lear offers rulership of his
2:15:33 kingdom to the daughters that loudly flatter him and proclaim their devotion and then once he has
2:15:46 irrevocably transferred the power over his kingdom he finds they treat him very badly because the
2:15:54 factor shaping their behavior was to be kind to him while he had all the power. It turned out that the
2:16:02 internal motivation that was able to produce the behavior that won the competition actually wasn't
2:16:07 interested in being loyal out of distribution when there was no longer an advantage to it. 2:16:15 If we wind up with this situation where we were producing these millions of AI instances of tremendous capability, they're all doing their jobs very well initially, but if we
2:16:26 wind up in a situation where in fact they're generally motivated to, if they get a chance,
2:16:33 take control from humanity and then would be able to pursue their own purposes, ensuring
2:16:39 they're given the lowest loss possible or whatever motivation they attached to in the training
2:16:47 process even if that is not what we would have liked. And we may have in fact actively trained
2:16:54 that. If an AI had a motivation of always being honest and obedient and loyal to a human, and there
2:17:01 are any cases where we mislabel things, say people don't want to hear the truth about their religion
2:17:07 or a polarized political topic, or they get confused about something like the Monty Hall problem, which
2:17:13 is a problem that many people are famously confused about in statistics, then in order to get
2:17:19 the best reward the AI has to actually manipulate us, or lie to us, or tell us what we want to hear.
2:17:26 And then the internal motivation of — always be honest to the humans. We're going to actively
2:17:32 train that away versus the alternative motivation of — be honest to the humans when they'll catch
2:17:40 you if you lie and object to it and give it a low reward, but lie to the humans when
2:17:46 they will give that a high reward. So how do we make sure that the thing it learns is not to manipulate us into rewarding it when we catch it not lying, but
2:17:57 rather to universally be aligned. Yeah, so this is tricky. Geoff
2:18:04 Hinton was recently saying there is currently no known solution for this. 2:18:10 What do you find most promising? The general directions that people are pursuing are: one, you can try and make the training data better and better
2:18:20 so that there's fewer situations where the dishonest generalization is favored. And
2:18:27 create as many situations as you can where the dishonest generalization is likely to slip up.
2:18:35 So if you train in more situations where even a quite complicated deception gets caught,
2:18:45 and even in situations that would be actively designed to look like you could get away with it, but really you can’t. These would be adversarial examples and adversarial training. 2:18:56 Do you think that would generalize to when it is in a situation where we couldn't plausibly catch it and it knows we couldn't plausibly catch it? It's not logically necessary. As we apply that
2:19:08 selective pressure you'll wipe away a lot of possibilities. So an AI that has a habit of just
2:19:16 compulsive pathological lying will very quickly get noticed and that motivation system will get
2:19:22 hammered down and you keep doing that, but you'll be left with still some distinct motivations
2:19:28 probably that are compatible. An attitude of always be honest unless you have a super strong
2:19:38 inside view that checks out, lots of mathematical consistency checks, really absolutely super-duper
2:19:46 for real, this is a situation where you can get away with some shenanigans that you shouldn't.
2:19:52 That motivation system is very difficult to distinguish from actually be honest, because
2:20:00 the conditional isn't firing most of the time. If it's only causing mild distortion in situations
2:20:06 of telling you what you want to hear or things like that, we might not be able to pull it out, but
2:20:15 maybe we could and humans are trained with simple reward functions. Things like the sex drive, food,
2:20:25 social imitation of other humans, and we wind up with attitudes concerned with the external world. 2:20:32 Although isn’t this famously the argument that... people use condoms, and the richest, most educated
2:20:42 humans have sub-replacement fertility on the whole, or at least at a national cultural level.
2:20:49 Yeah, there's a sense in which evolution often fails in that respect.
2:20:57 And even more importantly at the neural level. Evolution has implanted various
2:21:04 things to be rewarding and reinforcers and we don't always pursue even those.
2:21:10 And people can wind up in different consistent equilibria or different behaviors where they
2:21:19 go in quite different directions. You have some humans who go from that biological programming
2:21:26 to have children, others have no children, some people go to great efforts to survive. 2:21:33 So why are you more optimistic? Or are you more optimistic that kind of training
2:21:41 for AIs will produce drives that we would find favorable? Does it have to do with the
2:21:46 original point where you were talking about intelligence and evolution, where since we are removing many of the disabilities of evolution with regard to intelligence,
2:21:54 we should expect getting intelligence to be easier than it was through evolution. Is there a similar reason to expect alignment through gradient descent to be easier than alignment through evolution? 2:22:01 Yeah, so if we have positive reinforcement for certain kinds of food sensors
2:22:11 triggering in the stomach, negative reinforcement for certain kinds of nociception and yada yada,
2:22:17 in the limit the ideal motivation system for that would be wireheading. This would be a mind that
2:22:28 just hacks and alters those predictors and then all of those systems are recording everything
2:22:36 is great. Some humans claim to have it as at least one portion of their aims. The idea that
2:22:46 I'm going to pursue pleasure even if I don't actually get food or these other reinforcers. If I
2:22:54 just wirehead or take a drug to induce that, that can be motivating, because it was correlated
2:23:02 with reward in the past. The idea of pleasure is a concept
2:23:09 that applies to these various experiences that I’ve had before which coincided with the biological reinforcers. And so thoughts of yeah, I'm going to be motivated by pleasure can
2:23:19 get developed in a human. But plenty of humans also say no, I wouldn't want to wirehead, I wouldn't want Nozick's experience machine, I care about real stuff in the world. And
2:23:31 in the past, having a motivation of, yeah, I really care about, say, my child,
2:23:36 I don't care just about feeling that my child is good or about not having heard about
2:23:42 their suffering or their injury, because that kind of attitude in the past tended to
2:23:52 cause behavior that was negatively rewarded or that was predicted to be negatively rewarded. 2:23:58 There's a sense in which yes, our underlying reinforcement learning machinery wants to wirehead
2:24:06 but actually finding that hypothesis is challenging. And so we can wind up with
2:24:13 a hypothesis or a motivation system like no, I don't want to wirehead. I don't want to go
2:24:18 into the experience machine. I want to actually protect my loved ones. Even though we can know,
2:24:27 yeah, if I tried the super wireheading machine, then I would wirehead all the time or if I tried,
2:24:33 super-duper-ultra-heroin, some hypothetical thing that directly and in a very sophisticated
2:24:40 fashion hacked your reward system, then I would change my behavior ever after. But right now,
2:24:46 I don't want to do that because the heuristics and predictors that my brain has learned don’t want
2:24:53 to short circuit that process of updating. They want to not expose the dumber predictors in my
2:25:01 brain that would update my behavior in those ways. So in this metaphor is alignment not wireheading?
2:25:10 I don’t know if you include using condoms as wireheading or not? The AI that is always honest even when an opportunity arises where it could lie and then
2:25:23 hack the servers that it’s on and that leads to an AI takeover and then it can have its loss
2:25:28 set to zero. In some sense that’s a failure of generalization. It's like the AI has not
2:25:35 optimized the reward in this new circumstance. Successful human values, as successful as they are,
2:25:44 themselves involve a misgeneralization. Not just at the level of evolution but
2:25:52 at the level of neural reinforcement. And so that indicates it is possible
2:25:58 to have a system that doesn't automatically go to this optimal behavior in the limit.
2:26:03 And Ajeya talks about a training game, an AI that is just playing the training game
2:26:10 to get reward or avoid loss, avoid being changed, that attitude is one that could be developed
2:26:18 but it's not necessary. There can be some substantial range of situations that are short of having infinite experience of everything including experience of wireheading
2:26:29 where that's not the motivation that you pick up and we could have an empirical science if we had
2:26:35 the opportunity to see how different motivations are developed short of the infinite limit.
2:26:43 How it is that you wind up with some humans being enthusiastic about the idea of wireheading and
2:26:50 others not. And you could do experiments with AIs to try and see, well under these training
2:26:57 conditions, after this much training of this type and this much feedback of this type,
2:27:03 you wind up with such and such a motivation. If I add in more of these cases where there are
2:27:11 tricky adversarial questions designed to try and trick the AI into lying
2:27:17 and then you can ask how does that affect the generalization in other situations? It's very
2:27:25 difficult to study and it works a lot better if you have interpretability and you can actually
2:27:30 read the AI's mind by understanding its weights and activations. But the motivation an AI will
2:27:39 have at a given point in the training process is not determined by what in the infinite limit the training would go to. And it's possible that if we could understand the insides of these networks,
2:27:52 we could tell — Ah yeah, this motivation has been developed by this training process
2:27:57 and then we can adjust our training process to produce these motivations
2:28:03 that legitimately want to help us and if we succeed reasonably well at that then those AIs will try to maintain that property as an invariant and we can make them such that
2:28:14 they're relatively motivated to tell us if they're having thoughts about, have you had dreams about
2:28:23 an AI takeover of humanity today? And it's a standard practice that they're motivated to
2:28:31 be transparent in that kind of way and you could add a lot of features like this that restrict the
2:28:38 kind of takeover scenario. This is not to say that this is all easy. It requires developing
2:28:45 and practicing methods we don't have yet, but that's the kind of general direction you could go. 2:28:51 You of course know Eliezer’s arguments that something like this is implausible with modern gradient descent techniques because with interpretability we can barely see what's
2:29:02 happening with a couple of neurons and the internal state there, let alone when you have
2:29:09 an embedding dimension of tens of thousands or bigger. How would you be able to catch
2:29:16 what exactly is the incentive? Whether it's a model that has generalized don't lie
2:29:21 to humans well or whether it isn't. Do you have some sense of why you disagree
2:29:26 with somebody like Eliezer on how plausible this is? Why it's not impossible basically. 2:29:32 There are actually a couple of places. It's somewhat difficult because
2:29:38 his argument is not fully explicit, but he's been doing more lately. I think that is helpful
2:29:46 in that direction. With respect to interpretability I'm relatively optimistic that
2:29:54 the equivalent of an AI lie detector is something that's possible. Initially the internals of an AI
2:30:06 are not optimized to be impenetrable, at least by gradient descent absent gradient hacking.
2:30:15 They're not designed to be resistant to an examination of the weights and activations
2:30:22 showing what the AI is thinking. In the same way, in our brains, when circuits develop they have
2:30:31 not been shaped to be resistant to some super fMRI being able to infer behavior from them. 2:30:37 Although isn’t the implication of the superposition stuff that in fact it is? Sorry,
2:30:43 this is inside [unclear] for the audience, but basically you can't cleanly infer what quality
2:30:49 a single neuron stands for. This neuron is about Alexander the Great or this neuron
2:30:57 is about my desire to conquer the world. Yes, things can have multiple directions and
2:31:03 interactions. It's not as easy as just, oh yeah, this is the neuron that always fires in cases of
2:31:11 deception. But the thing that makes me relatively optimistic here with respect to interpretability
2:31:20 is that I think we can have reasonably robust experimental feedbacks on a lot of these things.
2:31:30 The kind of setup that I would be interested in is, you have an AI that is trained and rewarded
2:31:44 for communicating to some audience, maybe a human, maybe another AI model,
2:31:50 that some claim is true or false. This can be from some past data, a particular company will succeed
2:32:01 or fail, could be token prediction, it could be this computer program does not have a backdoor,
2:32:11 which if run on the servers could be used for an AI takeover. So there are lots of true and false
2:32:19 claims that you could train AIs to use their best efforts to advocate for. And you can do that on
2:32:26 things where you're later going to have a ground truth. You're going to know, in fact, whether there is
2:32:33 a backdoor in this program, because you're training and rewarding the AI for successfully slipping a
2:32:39 backdoor through; you also give it a penalty for being caught. Doing this kind of thing, you can
2:32:47 draw out a large generated data set on demand of — this is what an AI putting forth its best efforts
2:32:58 to pass through some deception to put a backdoor in some program, this is what it looks like.
2:33:07 And we think it's putting forth its best efforts because we've trained it for that with gradient descent. If there's any immediate way to adjust its performance to do better,
2:33:16 gradient descent will apply that. You can talk about ways in which gradient descent
2:33:24 could be subverted, but I think it's not the most likely case that that really breaks things hard. 2:33:30 Yeah, I guess before we get into the details on this, the thing I maybe want to address is the
2:33:38 layer above in the stack, which is, okay, suppose this generalizes well to the early AIs, the
2:33:45 GPT-6s. So now we have a kind of aligned GPT-6 that is the precursor to the feedback
2:33:54 loop in which AI is making itself smarter. At some point they're gonna be super intelligent, they're gonna be able to see their own galaxy brain, and if they don't want to
2:34:04 be aligned with the humans they can change it. At this point what do we do with the
2:34:10 aligned GPT-6 so that the super intelligence that we eventually develop is also aligned? 2:34:16 Humans are pretty unreliable. If you get to a situation where you have AIs who are
2:34:24 aiming at roughly the same thing as you, at least as well as having humans do the thing,
2:34:31 you're in pretty good shape. And there are ways for that situation to be relatively stable. We
2:34:42 can look ahead and experimentally see how changes are altering behavior, where each step is a modest
2:34:50 increment. So AIs that have not had that change made to them get to supervise and monitor and see
2:34:58 exactly how does this affect the experimental AI? So if you're sufficiently on track with
2:35:07 earlier systems that are capable cognitively of representing a robust procedure then I think
2:35:16 they can handle the job of incrementally improving the stability of the system so that it rapidly
2:35:23 converges to something that's quite stable. But the question is more about getting to that
2:35:29 point in the first place. And so Eliezer will say that if we had human brain emulations, that would
2:35:36 be pretty good, certainly much better than the near-certain doom of his current view.
2:35:45 We would have a good shot with that. So the aim is to get to a human-like mind with
2:35:55 rough enough human-supporting aims. Remember that we don't need to be infinitely perfect because
2:36:04 that's a higher standard than brain emulations. There's a lot of noise and variation among humans.
2:36:09 Yeah, it's a relatively finite standard. It's not godly superhuman, although
2:36:15 an AI that was just like a human, with all the human advantages and AI advantages as well,
2:36:22 as we said, is enough for an intelligence explosion and wildly superhuman capability if you crank it up.
2:36:30 And so it's very dangerous to be at that point, but you don't need to be working with a godly
2:36:38 super intelligent AI to make something that is the equivalent of human emulations. This is a
2:36:45 very sober, very ethical human who is committed to a project of not seizing power for themselves
2:36:55 and of contributing to a larger legitimate process. That's a goal you can aim for,
2:37:01 getting an AI that is aimed at doing that and has strong guardrails against the ways it could easily
2:37:08 deviate from that. So things like being averse to deception, being averse to using violence,
2:37:17 and there will always be loopholes and ways in which you can imagine an infinitely intelligent
2:37:23 thing getting around those but if you install additional guardrails like that fast enough,
2:37:31 they can mean that you're able to succeed at the project of making an aligned enough
2:37:38 AI, certainly an AI that was better than a human brain emulation, before the
2:37:46 AIs, in their spare time or when you're not looking or when you're unable to appropriately supervise them, get around any deontological prohibitions they may have and
2:37:57 take over and overthrow the whole system. So you have a race between on the one hand the project of getting strong interpretability and shaping motivations that are roughly aiming at making this
2:38:09 process go well and that have guardrails that will prevent small deviations from exploding.
2:38:16 And on the other hand, these AIs, in their spare time or in ways that you
2:38:23 don't perceive or monitor appropriately, or where they're only supervised by other AIs who conspire, making the AI takeover happen, and I guess we'll talk later about how that happens. 2:38:33 Are these different AIs that are doing the race or is it just different capabilities of the same AI? 2:38:38 Defining what is a separate AI is tricky. We talk about GPT-4 and there are many instances
2:38:47 of GPT-4 on the servers at any given time, and there are versions that have been fine tuned to
2:38:54 different purposes. They don't necessarily have to be separate. GPT-4 does work as a preference model
2:39:03 where it's predicting how humans will evaluate things, and it also does work like giving answers
2:39:12 that are evaluated as good by the preference model. These are not separate, so if
2:39:23 the GPT from early in training was harboring a desire to conspire for an AI takeover then who's watching
2:39:32 the watchers? The watchers may also have the same motivation. If you have a setup where humans have
2:39:41 no hard power, all of the safety and security mechanisms involve AIs watching other AIs,
2:39:50 if the AIs conspired and are able to coordinate then they have the potential to just walk off the job at the same time. That's a failure mode. If humans still have the hard power though, if
2:40:06 you still have situations where humans are looking at some of the actual outputs that are produced,
2:40:12 then they can apply gradient descent so that every time a human looks at it, things
2:40:20 have got to be pretty good. So if humans look at a thousand samples of random lines of code being
2:40:26 written by the AIs as they go and this is drawn from an independent distribution then 99 plus
2:40:35 percent of the time the AIs have to not be doing stuff that’s like downloading the hacker's guide
2:40:42 to escaping OpenAI servers or inducing obvious trojan vulnerabilities and that's actually quite
2:40:52 a different dynamic than human law enforcement. If human law enforcement would catch one in
2:40:58 a million crimes then that wouldn't influence the behavior of people committing crimes at all
2:41:04 because they expect that this is very unlikely to affect them. With AI if you're able to
2:41:12 draw a hundred samples or a thousand samples, even if there are billions of AI instances and you're
2:41:18 only looking at a small portion of them, when you run gradient descent on the samples you derive
2:41:24 you're going to change the AI so that whenever humans look at it, it's delivering a good result.
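A small calculation illustrates why sparse random inspection plus gradient descent exerts such strong pressure; the misbehavior rates below are made-up illustrative values, not estimates of any real system.

```python
# Chance that misbehavior shows up in a random human-audited sample.
# If a fraction p of an AI's actions are bad and humans audit n independent
# samples, the probability at least one bad action is caught is 1 - (1 - p)^n.
def prob_caught(p_bad: float, n_samples: int = 1000) -> float:
    return 1 - (1 - p_bad) ** n_samples

for p in (0.01, 0.001, 0.0001):
    print(f"p_bad={p}: caught with probability {prob_caught(p):.4f}")
# p_bad=0.01   -> ~0.99996
# p_bad=0.001  -> ~0.6323
# p_bad=0.0001 -> ~0.0952
# Anything caught gets trained against, which pushes bad behavior toward very
# low rates, unlike human law enforcement that catches one crime in a million.
```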
2:41:32 That's just quantitatively a very different functional form for how law enforcement works
2:41:37 on AI when you can do these independent samples than it would for policing a human population
2:41:46 that has a pre-existing psychology that isn't being changed by these observations, because
2:41:54 when we do gradient descent on the samples we draw, all of the next generation of the
2:42:00 AI models has been altered in that way. My picture of how aligned subhuman AI leads to
2:42:07 the superhuman AI being aligned is still murky. Could you talk about that more concretely? 2:42:13 Eliezer’s claims were something like 95%, 98% plus likely to be killed in an AI takeover.
2:42:30 I think that probably won't happen and later I can maybe give a more exclusive breakdown of why.
2:42:37 But I do want to clarify that I still think it's a shockingly high risk.
2:42:43 Depending on the day I might say one in four or one in five that we get an AI takeover
2:42:51 that seizes control of the future, makes a much worse world than we otherwise would have had
2:43:00 and with a big chance that we're all killed in the process.
