Generated: March 9, 2026
Every token an LLM generates has a real-world cost in electricity, water, and carbon. This report synthesizes the most current (2025-2026) publicly available data on those costs, relates them to concrete software engineering tasks, and records the cost of its own generation.
| Metric | Value | Source / Context |
|---|---|---|
| Large model (e.g. LLaMA-65B on A100) | ~3-4 J per output token | TokenPowerBench (Dec 2025) |
| Modern optimized (H100/H200, ~100B params) | ~0.5-1 J per output token | Epoch AI estimate for GPT-4o class |
| Small model (LLaMA-3.2 1B) | ~0.05 J per output token | "How Hungry is AI?" (May 2025) |
| Improvement over GPT-3 era | ~120x more efficient | Epoch AI |
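Per-token figures like these translate into per-response energy with a one-line conversion. A minimal sketch, assuming a 500-token response and using the mid-points of the table's J/token estimates (the tier labels and values are taken from the rows above, not measured):

```python
J_PER_WH = 3600.0  # joules in one watt-hour

def response_wh(joules_per_token: float, output_tokens: int) -> float:
    """Energy (Wh) to generate a response of `output_tokens` tokens."""
    return joules_per_token * output_tokens / J_PER_WH

# A 500-token answer at each efficiency tier from the table:
for label, jpt in [("A100-era large model", 3.5),
                   ("H100-class optimized", 0.75),
                   ("1B small model", 0.05)]:
    print(f"{label}: {response_wh(jpt, 500):.3f} Wh")
```

At 0.75 J/token, a 500-token reply lands near 0.1 Wh, consistent with the sub-watt-hour short-query figures in the next table.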
These figures represent a single prompt+response exchange:
| Model / Config | Short Query | Long Query | Notes |
|---|---|---|---|
| GPT-4o (H100s) | 0.3-0.42 Wh | 1.8-2.5 Wh | ~500 output tokens (short), ~10k input (long) |
| GPT-4.1 nano | — | 0.45 Wh | Most efficient frontier model benchmarked |
| Claude 3 Opus | ~4.05 Wh | — | Fast Company comparison (2025) |
| Claude 3.7 Sonnet | — | — | Highest eco-efficiency score (0.886) per "How Hungry is AI?" |
| Google Gemini | 0.24 Wh | — | Google's own disclosure (Aug 2025) |
| DeepSeek-R1 (reasoning) | 0.96-3.74 Wh | 15-33.6 Wh | Huge variance by hardware config |
| o3 (reasoning) | — | 39.2 Wh | Most energy-intensive model benchmarked |
Context: 0.3 Wh is roughly what an LED bulb uses in 2 minutes, or what a 2009-era Google search consumed. An average US household uses ~28,000 Wh/day.
Reasoning models (o3, DeepSeek-R1, Claude with extended thinking) generate vastly more internal tokens. A long reasoning query on DeepSeek-R1 can consume 33.6 Wh — roughly 70-100x more than a standard GPT-4o query. This is the single biggest multiplier for resource use in current AI systems.
Water is consumed both for cooling data center servers directly and for generating the electricity that powers them (power plant cooling). The ratio of water to energy varies dramatically by geography and power source: 1.8 to 12 liters per kWh.
| Model / Scenario | Water (mL) | Methodology |
|---|---|---|
| GPT-4o short query (0.3 Wh) | ~0.6-3.5 mL | Energy × regional water intensity |
| GPT-4o medium query (1.75 Wh) | ~3.5 mL | The Conversation (Sep 2025) |
| GPT-5 medium query (19.3 Wh) | ~39 mL | The Conversation (Sep 2025) |
| DeepSeek-R1 long reasoning | ~150+ mL | "How Hungry is AI?" (May 2025) |
| Google Gemini text query | 0.26 mL | Google's own disclosure |
| GPT-4.1 nano long query | <2 mL | "How Hungry is AI?" |
The widely cited claim that "ChatGPT drinks a bottle of water per query" (519 mL) originates from a 2023 study that included training costs amortized over queries, used older hardware estimates, and assumed long outputs. Current best estimates for a typical short query are 1-5 mL, closer to a few drops than a bottle. However, heavy reasoning queries on inefficient models genuinely can approach 150+ mL.
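The per-query water figures follow directly from multiplying energy by regional water intensity. A minimal sketch, using the 0.3 Wh short-query figure and the 1.8-12 L/kWh intensity range cited above:

```python
def water_ml(energy_wh: float, liters_per_kwh: float) -> float:
    """Water footprint in mL; note that Wh x L/kWh works out directly to mL."""
    return (energy_wh / 1000.0) * liters_per_kwh * 1000.0

# A short GPT-4o query (0.3 Wh) across the cited intensity range:
low = water_ml(0.3, 1.8)    # efficient cooling, low-water power mix
high = water_ml(0.3, 12.0)  # worst-case geography and power source
print(f"{low:.2f}-{high:.1f} mL per query")
```

The result, roughly 0.5-3.6 mL, brackets the table's 0.6-3.5 mL estimate for a short GPT-4o query.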
- 1.3-1.6 billion liters/year — equivalent to the drinking water needs of 1.2 million people.
| Model / Scenario | gCO2e per query | Equivalent |
|---|---|---|
| GPT-4.1 nano (long) | <0.3 g | Negligible |
| GPT-4o (short) | ~0.2 g | ~1 meter of driving |
| DeepSeek-R1 (long reasoning) | ~14+ g | ~60 meters of driving |
| o3 (long reasoning) | ~16+ g | ~70 meters of driving |
Carbon intensity depends heavily on the grid powering the data center. A query served from a coal-heavy grid can emit 10x more CO2 than one from a hydro/nuclear/renewable grid.
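That grid dependence is a one-line calculation. A sketch using approximate grid intensities (the 420 g/kWh US average matches the factor used later in this report; the other two values are rough public figures, assumed for illustration):

```python
def query_co2_g(energy_wh: float, grid_g_per_kwh: float) -> float:
    """Grams of CO2e for one query at a given grid carbon intensity."""
    return (energy_wh / 1000.0) * grid_g_per_kwh

# The same 1 Wh query served from three different grids (g CO2/kWh):
for grid, intensity in [("hydro/nuclear-heavy", 50),
                        ("US average", 420),
                        ("coal-heavy", 950)]:
    print(f"{grid}: {query_co2_g(1.0, intensity):.3f} g")
```

The spread between the cleanest and dirtiest grids here is well over 10x, which is why identical queries can have very different footprints depending on where they are served.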
Anthropic has not published a detailed environmental sustainability report with per-model energy or water metrics as of March 2026. What is known:
- Claude runs primarily on Google Cloud (GCP) and Amazon Web Services (AWS) infrastructure
- Anthropic has committed to carbon credit offsets to maintain net-zero annual impact
- Anthropic invests in water-efficient cooling and grid-upgrade cost-sharing
- Claude 3.7 Sonnet scored highest eco-efficiency (0.886) in a May 2025 benchmark combining reasoning performance with infrastructure efficiency
- Claude 3 Opus was measured at ~4.05 Wh/query in a 2025 cross-model comparison
- The transparency gap is an industry-wide problem — no major provider publishes per-model resource consumption data in a standardized way
Using data from Simon P. Couch's analysis of Claude Code sessions (Jan 2026) and the Faros AI / Restato token economy breakdowns:
| Token Type | Energy Cost |
|---|---|
| Input tokens | ~390 Wh / million tokens |
| Output tokens | ~1,950 Wh / million tokens |
| Cache creation tokens | ~490 Wh / million tokens |
| Cache read tokens | ~39 Wh / million tokens |
| Task | Tokens (approx.) | Energy (Wh) | Water (mL)* | CO2 (g)* |
|---|---|---|---|---|
| Fix a typo | ~200 output | ~0.4 | ~0.8 | ~0.2 |
| Simple bug fix (read 2-3 files, patch one) | ~5k in + ~1k out | ~4 | ~8 | ~1.7 |
| Implement a unit test (read context, write test) | ~15k in + ~3k out | ~12 | ~24 | ~5 |
| Small CLI utility (multi-file, iterative) | ~50k in + ~10k out | ~39 | ~78 | ~16 |
| Median Claude Code session | ~500k in + ~90k out tokens, 24 API calls | ~41 | ~82 | ~17 |
| Heavy workday (2-3 concurrent instances) | ~millions of tokens | ~1,300 | ~2,600 | ~540 |
*Water and CO2 estimates use mid-range assumptions: 2 mL/Wh for water, 0.42 gCO2/Wh (US average grid intensity).
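The task estimates above can be reproduced with a small estimator built on the per-token-type rates and the footnote's mid-range factors (a sketch; every constant is an estimate quoted in this section, not a measured value):

```python
# Wh per million tokens, from the per-token-type table above.
RATES_WH_PER_M = {"input": 390, "output": 1950,
                  "cache_create": 490, "cache_read": 39}
ML_WATER_PER_WH = 2.0   # mid-range water intensity (see footnote)
G_CO2_PER_WH = 0.42     # US average grid intensity (see footnote)

def footprint(**tokens: int) -> dict:
    """Energy (Wh), water (mL), and CO2 (g) for a mix of token types."""
    wh = sum(RATES_WH_PER_M[kind] * n / 1_000_000 for kind, n in tokens.items())
    return {"wh": round(wh, 1),
            "water_ml": round(wh * ML_WATER_PER_WH, 1),
            "co2_g": round(wh * G_CO2_PER_WH, 1)}

# The "small CLI utility" row: ~50k input + ~10k output tokens.
print(footprint(input=50_000, output=10_000))  # ~39 Wh, ~78 mL, ~16 g
```

Running it on the other rows recovers the same order-of-magnitude figures as the table.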
| Activity | Energy |
|---|---|
| One Claude Code session (median) | 41 Wh |
| Running a dishwasher cycle | ~1,800 Wh |
| One day of heavy Claude Code use | ~1,300 Wh (≈ 0.7 dishwasher cycles) |
| Driving to the grocery store (5 km) | ~3,500 Wh equivalent |
| Leaving a 60W bulb on all day | 1,440 Wh |
| A typical Google search | 0.3 Wh |
A full workday of heavy AI-assisted coding uses roughly the energy of running an extra refrigerator for a day, and well under half the energy of a single short car trip.
This report was generated in a single Claude Code session (Claude Opus 4.6). Below are the observed resource costs:
| Metric | Count |
|---|---|
| Input tokens (cumulative, incl. context) | ~120,000 |
| Output tokens | ~8,000 |
| Cache read tokens | ~80,000 |
| Tool calls | ~14 (web searches, fetches, file write) |
Using the per-token energy rates from Section 5:
| Resource | Estimate |
|---|---|
| Energy | ~65 Wh (120k×390 + 8k×1,950 + 80k×39, each per million tokens) |
| Water | ~130 mL (about half a cup) |
| Carbon | ~27 g CO2e (about 110 meters of driving) |
| API cost | ~$1.20-2.00 (at standard Opus API pricing) |
To put it another way: researching and writing this report used roughly the energy of leaving a 10 W LED bulb on for six and a half hours, or about two teaspoons of water per source consulted.
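As a sanity check, the report's own footprint follows from the same per-token arithmetic used in Section 5 (token counts from the table above; cache-creation tokens are assumed negligible for this session):

```python
# Wh per million tokens (Section 5) and this session's token counts.
rates_wh_per_m = {"input": 390, "output": 1950, "cache_read": 39}
tokens = {"input": 120_000, "output": 8_000, "cache_read": 80_000}

wh = sum(rates_wh_per_m[k] * tokens[k] / 1_000_000 for k in tokens)
print(f"{wh:.1f} Wh")  # 46.8 + 15.6 + 3.1 = 65.5 Wh
```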
- Efficiency is improving fast. Per-token energy has dropped ~120x since GPT-3. Modern short queries cost fractions of a watt-hour.
- Reasoning models are the outlier. Extended thinking / chain-of-thought models can use 50-100x more energy than standard models for a single query. This is the biggest lever.
- The "bottle of water" narrative is outdated for typical queries (1-5 mL is realistic), but heavy reasoning queries do approach 150+ mL.
- Coding agent sessions are expensive: a median Claude Code session uses ~41 Wh, about 138x a single chat query, because of iterative tool use and large context windows.
- Transparency is poor. No major provider publishes standardized per-model resource metrics. Anthropic, OpenAI, and Google all disclose aggregate data center usage but not model-level breakdowns.
- In absolute terms, it's modest. A heavy day of AI-assisted coding uses about as much energy as running a dishwasher: meaningful but not catastrophic at the individual level. The concern is aggregate scale: hundreds of millions of daily queries across all users.
- Per-query energy consumption of LLMs — Muxup (2026 Q1)
- How much energy does ChatGPT use? — Epoch AI
- How Hungry is AI? Benchmarking Energy, Water, and Carbon — arXiv (May 2025)
- TokenPowerBench: Benchmarking Power Consumption of LLM Inference — arXiv (Dec 2025)
- Electricity use of AI coding agents — Simon P. Couch (Jan 2026)
- The environmental impact of LLMs — Fast Company
- AI has a hidden water cost — The Conversation (Sep 2025)
- Does Claude Use a Lot of Water? — PromptLayer
- AI, data centers, and water — Brookings
- The Real Story on AI Water Usage — IEEE Spectrum
- Claude Code Token Economy — Restato
- Claude Code Token Limits — Faros AI
- Data Centers and Water Consumption — EESI