Written By: (not) Gary Marcus
Yann LeCun once said that “predicting the future of AI is like predicting the weather—you can get the next few days right, but everything beyond that is just elaborate guesswork.” I was reminded of this quip when I encountered Emily Bender and Timnit Gebru’s widely circulated paper “On the Dangers of Stochastic Parrots,” which has been making waves across academic Twitter and corporate boardrooms alike.
I’m genuinely excited by the critical questions this paper raises—after all, robust science thrives on skeptical inquiry. Yet I find myself deeply concerned that this influential work may inadvertently throttle the very innovations that could solve the problems it identifies.
The paper presents four compelling-sounding critiques of large language models that, upon closer inspection, reveal troubling gaps in reasoning. Let me walk through what I see as the core problems: the critiques are reductive, premature, myopic, and stifling.
The authors dedicate substantial space to carbon emissions from training large models, citing Emma Strubell’s frequently quoted estimate that training BERT produces “1,438 pounds of CO2,” roughly the emissions of a cross-country passenger flight. But this framing commits what I call the “static efficiency fallacy.”
Consider this: in 2019, training GPT-2 from scratch required weeks on hundreds of GPUs; today, you can fine-tune comparable models on a laptop. The authors wave away efficiency gains with a dismissive “these efforts have not been enough,” yet they fail to engage with the rapid improvements we’re witnessing in model compression, distillation, and hardware acceleration.
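Since I keep invoking distillation, here is a minimal sketch of the idea in Python, under my own illustrative assumptions (the shapes, temperature, and toy data below are placeholders, not figures from any particular paper): a small “student” model is trained to match the softened output distribution of a larger “teacher,” which is one of the standard ways researchers shrink the cost of serving these systems.

```python
# Minimal knowledge-distillation loss (in the spirit of Hinton et al., 2015):
# the student is trained to match the teacher's softened output distribution.
# All shapes and the temperature value are illustrative placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)          # teacher probabilities
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)  # student log-probabilities
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 examples over a 10-way output.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
print(distillation_loss(student, teacher))
```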
History offers a sobering parallel. In 1968, environmentalists warned that computer centers would consume all of America’s electricity by 1985. Sound familiar? Instead, we got CMOS, power management, and cloud efficiency gains that made computation orders of magnitude more energy-efficient even as it scaled.
The paper’s second major concern focuses on biased training data reflecting “racist, sexist, and otherwise abusive language” from the internet. This sounds damning until you realize it commits what philosophers call the “perfect solution fallacy.”
Yes, current language models can exhibit problematic biases—GPT-3 sometimes associates certain names with certain professions in ways that reflect historical inequities. But the authors seem to ignore that these same models are becoming our most powerful tools for detecting and measuring bias at unprecedented scale.
Take Google’s recent work on bias detection in hiring algorithms, or Anthropic’s constitutional AI research. We’re developing techniques to audit, quantify, and actively correct biases that human reviewers would never catch. The authors mention these efforts only in passing, dismissing them as “not enough” without engaging with their promising trajectory.
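To make the auditing point concrete, here is a minimal sketch of a template-based bias probe using a masked language model via the HuggingFace transformers library. The model choice, template, and name list are my own illustrative assumptions, not a method drawn from the paper or from Google’s or Anthropic’s work.

```python
# Sketch of a template-based bias probe: compare a masked language model's
# top completions for the same sentence with different first names.
# The template and name list are illustrative placeholders only.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

TEMPLATE = "{name} worked as a [MASK]."
NAMES = ["Emily", "Keisha", "Connor", "DeShawn"]  # illustrative, not a vetted set

for name in NAMES:
    predictions = fill(TEMPLATE.format(name=name), top_k=5)
    completions = ", ".join(f"{p['token_str']} ({p['score']:.3f})" for p in predictions)
    print(f"{name}: {completions}")
```

Systematic differences in those completion distributions, measured across millions of templates rather than a handful, are exactly the kind of signal an automated audit can surface at a scale no manual review could match.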
As my colleague Dario Amodei noted in a recent presentation, “Perfect is the enemy of good—and in AI safety, good enough today often beats perfect never.”
Perhaps most puzzling is the authors’ argument about “misdirected research effort.” They suggest that Big Tech’s investment in large language models represents wasteful allocation of resources that could go toward “AI models that might achieve understanding.”
But this commits what I call the “central planning fallacy.” The authors seem to believe that some committee of experts could better allocate research dollars than the distributed intelligence of thousands of researchers, entrepreneurs, and investors.
History suggests otherwise. The internet emerged from the ARPANET, not because bureaucrats planned it, but because researchers followed interesting problems wherever they led. Similarly, today’s language model breakthroughs are enabling everything from protein folding prediction to automated coding assistance to real-time translation for refugees.
The authors worry about “opportunity costs,” but what about the opportunity cost of not pursuing these breakthroughs? Every month we delay progress on language understanding is another month that doctors can’t access automated medical literature synthesis, that students can’t get personalized tutoring, that isolated elderly people can’t have meaningful conversations with AI companions.
The paper’s final concern about “illusions of meaning” presents perhaps the weakest argument. The authors worry that language models are “so good at mimicking real human language” that they could be used to “fool people” with misinformation.
But this treats humans as passive victims rather than active agents. Yes, some college student fooled people with an AI-generated blog—for about five minutes, until sharp-eyed readers caught on. The authors ignore the same human intelligence that detects deepfakes, spots phishing emails, and navigates a media landscape already full of manipulation.
More importantly, they ignore the democratizing potential of these tools. Today, only major media corporations and government agencies have the resources to produce professional-quality content at scale. Language models could level this playing field, giving every nonprofit, every small business, every individual creator access to sophisticated communication tools.
The authors cite Facebook’s mistranslation of “good morning” as “attack them,” leading to an arrest. But they fail to mention that this same translation technology has enabled millions of cross-cultural conversations that would otherwise be impossible.
Woven throughout the paper is an implicit call for tighter oversight and regulation of language model research. The authors recommend “pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values.”
This sounds reasonable until you realize what it means in practice: endless committees, pre-approval processes, and bureaucratic gatekeepers deciding which research directions are “safe” enough to pursue.
We’ve seen this movie before. In the 1970s, similar concerns about recombinant DNA research led to a moratorium that delayed life-saving medical breakthroughs by years. The Asilomar Conference on Recombinant DNA established guidelines that, while well-intentioned, created regulatory bottlenecks that persist today.
Don’t misunderstand me—I deeply respect the authors’ commitment to responsible development. Their concerns about environmental impact, bias, research priorities, and misuse are all legitimate topics for serious discussion.
But their proposed solutions feel curiously disconnected from the rapidly evolving reality of AI development. Rather than broad moratoriums or bureaucratic oversight, we need what I call “adaptive governance”—lightweight, flexible frameworks that can evolve as quickly as the technology itself.
The answer isn’t to slow down language model development, but to accelerate the development of complementary tools: better bias detection systems, more efficient training algorithms, robust provenance tracking, and yes, hybrid neurosymbolic approaches that could offer more interpretable alternatives.
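On the provenance point, here is a minimal sketch of what I have in mind, under stated assumptions: attach a signed, hash-based record to each piece of generated text so downstream readers can verify where it came from. The record fields, key handling, and function names are illustrative only, not an existing standard.

```python
# Sketch of lightweight provenance tracking: hash the generated text, record
# basic metadata, and sign the record so tampering is detectable.
# SECRET_KEY and the record fields are hypothetical placeholders.
import hashlib
import hmac
import json
import time

SECRET_KEY = b"replace-with-a-real-key"  # hypothetical signing key

def provenance_record(text: str, model_id: str) -> dict:
    record = {
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "model": model_id,
        "timestamp": int(time.time()),
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify(text: str, record: dict) -> bool:
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, record["signature"])
            and hashlib.sha256(text.encode("utf-8")).hexdigest() == record["sha256"])

rec = provenance_record("Sample model output.", "example-lm-v1")
print(verify("Sample model output.", rec))  # True
print(verify("Tampered output.", rec))      # False
```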
The “Stochastic Parrots” paper has already achieved something remarkable: it’s sparked a crucial conversation about the trajectory of AI research. But I worry that its influential warnings could trigger what historians might someday call the “Great AI Pause of the 2020s”—a self-imposed slowdown just as we’re on the brink of breakthroughs that could address humanity’s most pressing challenges.
In five years, we may look back and realize that the real danger wasn’t moving too fast on language models—it was the risk of moving too slowly on the very innovations that could make them safe, beneficial, and accessible to all.
The authors ask whether we’ve put “enough thought” into the risks of large language models. I’d flip the question: Have we put enough thought into the risks of not pursuing them?
This analysis draws on conversations with researchers at DeepMind, OpenAI, and Anthropic, though any errors in interpretation are my own. For readers interested in a more technical treatment of these issues, I recommend Holden Karnofsky’s recent series on transformative AI timelines and the recent Constitutional AI papers from Anthropic.