Key takeaways
- Training large AI models consumes substantial energy — frontier training runs use megawatts of power for weeks or months.
- IEA projections estimate that global data-center electricity consumption could roughly double from 2022 to 2026, driven in part by AI workloads.
- Inference, the aggregate cost of serving queries to billions of users, now exceeds the one-time training energy cost for widely deployed production models.
- Water use for data-center cooling has become a local environmental concern in several regions.
- Mitigations include efficient architectures, renewable energy siting, workload scheduling, and model compression.
Why AI uses so much energy
Modern AI models are computationally heavy in two phases. Training repeats large forward-backward passes many times on GPU clusters; a frontier training run can involve thousands of high-end GPUs operating for weeks. Inference serves the trained model to end users; each query consumes a small amount of energy, but at internet scale, billions of queries add up.

A widely cited 2019 paper by Strubell et al. estimated that a single BERT-base training run emitted roughly the carbon equivalent of a trans-American flight; its specific numbers were contested, but it was influential in surfacing the environmental question. Since 2019, models have grown by orders of magnitude: training a frontier model today consumes energy roughly equivalent to the annual electricity use of thousands of households. For the underlying training mechanics, see our model training guide.
Putting numbers in context
Energy-use figures for AI are often quoted imprecisely. A realistic framing requires distinguishing several categories.
Training a frontier model
Estimates for GPT-4 and comparable frontier models run from roughly a thousand megawatt-hours upward: GPT-3's training was estimated at about 1,300 MWh, and GPT-4 estimates are one to two orders of magnitude higher. The Stanford AI Index 2025 estimates training compute for frontier models has grown roughly 5x annually since 2010, far outpacing Moore's Law. Each generation costs more to train, and efficiency gains per watt have trailed the absolute growth in compute.
Training a smaller model
Fine-tuning a 7B-parameter model with parameter-efficient methods such as LoRA consumes on the order of tens of kilowatt-hours, comparable to a typical household's electricity use over a day or two. Most production model work happens in this regime, not in frontier pre-training.
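A back-of-envelope check makes the "tens of kilowatt-hours" figure concrete. The GPU count, power draw, run length, and PUE below are illustrative assumptions, not measurements from any specific fine-tuning job.

```python
# Back-of-envelope energy estimate for a parameter-efficient fine-tune.
# All figures are illustrative assumptions, not measured values.

def training_energy_kwh(num_gpus, gpu_power_kw, hours, pue=1.2):
    """IT energy (GPUs x power x time) scaled by data-center overhead (PUE)."""
    return num_gpus * gpu_power_kw * hours * pue

# Assumption: 4 GPUs drawing ~0.7 kW each for an 8-hour LoRA run, PUE 1.2.
energy = training_energy_kwh(num_gpus=4, gpu_power_kw=0.7, hours=8)
print(f"{energy:.1f} kWh")  # prints 26.9 kWh, i.e. "tens of kilowatt-hours"
```

Swapping in your own cluster size and wall-clock time gives a first-order estimate before any proper metering.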
Inference cost per query
A single GPT-4-class inference query is estimated at roughly 0.3-3 watt-hours, depending on the model and query complexity. That is orders of magnitude less than a training run, but deployed across billions of daily queries it adds up to substantial aggregate consumption.
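The scale-up from per-query to aggregate consumption is simple arithmetic. The query volume below is an illustrative assumption, and the per-query figure is the mid-range of the estimates above.

```python
# Aggregate inference energy from per-query estimates (illustrative inputs).
wh_per_query = 1.0      # mid-range of the 0.3-3 Wh per-query estimates
queries_per_day = 1e9   # assumed daily query volume, for illustration

daily_mwh = wh_per_query * queries_per_day / 1e6  # Wh -> MWh
print(daily_mwh)  # prints 1000.0 (MWh per day)
```

At these assumed rates, inference consumes on the order of a gigawatt-hour per day, which is why aggregate serving energy overtakes a one-time training cost within months.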
Aggregate inference vs. training
For a widely deployed model like ChatGPT, the aggregate inference energy over its lifetime significantly exceeds the one-time training cost. This is a meaningful shift: early AI-environmental discussion focused heavily on training, but inference is now the larger long-term footprint for popular models. For the broader industry context, see our AI industry coverage.
Data center energy and water
AI runs in data centers. Global data-center electricity consumption was estimated at 460 TWh in 2022 (roughly 2% of global electricity) and is projected by the IEA to reach 700-1,000 TWh by 2026. AI workloads are a substantial driver of this growth, alongside cloud services, streaming, and cryptocurrency.
Many data centers use water for evaporative cooling, which is more energy-efficient than air cooling alone but consumes water on site. Several reports have highlighted locally significant water use in drought-prone areas. Microsoft, Google, and Meta all publish sustainability reports documenting data-center energy and water efficiency (PUE and WUE metrics), though directly attributing consumption to specific AI workloads is difficult.
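The two efficiency metrics named above are simple ratios: PUE divides total facility energy by IT equipment energy (1.0 is the theoretical ideal), and WUE divides site water use by IT energy. The facility figures below are illustrative, chosen near commonly reported industry values.

```python
# PUE and WUE as defined by The Green Grid (illustrative facility figures).

def pue(total_facility_kwh, it_kwh):
    """Power Usage Effectiveness: total facility energy / IT energy."""
    return total_facility_kwh / it_kwh

def wue(water_liters, it_kwh):
    """Water Usage Effectiveness: site water (L) / IT energy (kWh)."""
    return water_liters / it_kwh

# Assumed annual figures: 1.2 GWh facility draw, 1.0 GWh IT, 1.8M liters water.
print(pue(1.2e6, 1.0e6))  # prints 1.2 (hyperscaler-class efficiency)
print(wue(1.8e6, 1.0e6))  # prints 1.8 (liters per kWh of IT energy)
```

A PUE near 1.1-1.2 means cooling and overhead add only 10-20% on top of compute; older facilities commonly run closer to 1.5 or above.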
Where the energy actually comes from
Power mix matters as much as power quantity. A data center on a grid with high renewable penetration emits far less CO2 per kWh than one on a coal-heavy grid. Hyperscalers have committed to ambitious targets: Google has claimed since 2017 to match its annual electricity use with renewable purchases; Microsoft has a 2030 carbon-negative goal; Amazon is investing heavily in renewable procurement. Real-time matching (24/7 carbon-free energy) is harder than annual matching and far less widely achieved.
Regional siting matters. Training runs scheduled on grids with high renewable penetration at off-peak times (Iceland geothermal, Pacific Northwest hydro, Finnish wind) reduce carbon intensity. Some AI labs deliberately schedule long training runs around renewable availability.
Mitigations that are working
Architecture efficiency
Mixture-of-experts models activate only a subset of parameters per token, dramatically reducing inference compute versus equivalent-quality dense models. Smaller specialized models distilled from larger ones reduce inference energy per query. FlashAttention, continuous batching, and speculative decoding have cut serving costs by 2-10x for LLMs deployed in 2024-2025.
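The core of the mixture-of-experts saving is top-k gating: a router scores all experts but only the k best actually run for each token. This is a minimal sketch of the selection step, not any production MoE implementation; the expert count and logits are made up.

```python
import math

def top_k_gate(logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

# 8 experts defined, but only 2 execute per token: roughly 4x less
# expert compute than a dense model of the same total parameter count.
selected = top_k_gate([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
print(selected)  # experts 1 and 4 win; weights sum to 1
```

The energy win comes from the fact that the unselected experts' parameters contribute capacity at training time but cost nothing at inference for this token.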
Hardware efficiency
Newer GPUs and AI accelerators (NVIDIA Hopper, Blackwell; Google TPU v5/v6; custom silicon at Apple, Meta, Amazon, Microsoft) deliver more computation per watt than their predecessors. Energy per token for inference has dropped measurably across model generations: a given capability now costs a fraction of the energy it did two years ago.
Carbon-aware scheduling
Large cloud providers increasingly offer tools and APIs for scheduling compute jobs when and where grid carbon intensity is lowest. Google Cloud’s carbon-intelligent compute, Microsoft’s emission-aware Azure features, and carbon-aware CI/CD tools (WattTime, ElectricityMap integration) are maturing.
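The scheduling idea reduces to picking the start time that minimizes average grid carbon intensity over the job's duration. The 24-hour forecast below is fabricated (a midday solar dip); a real deployment would pull forecasts from a provider such as WattTime or Electricity Maps rather than hard-coding them.

```python
# Carbon-aware scheduling sketch: choose the cleanest window for a batch job.
# The forecast values are made up for illustration.

def best_start_hour(forecast_gco2_per_kwh, run_hours):
    """Return (start_hour, avg_intensity) minimizing carbon over the run."""
    windows = [
        (start, sum(forecast_gco2_per_kwh[start:start + run_hours]) / run_hours)
        for start in range(len(forecast_gco2_per_kwh) - run_hours + 1)
    ]
    return min(windows, key=lambda w: w[1])

# Assumed 24-hour forecast in gCO2/kWh, with a solar dip around midday.
forecast = [400] * 8 + [250, 180, 120, 100, 110, 170, 260] + [380] * 9
print(best_start_hour(forecast, run_hours=4))  # prints (10, 125.0)
```

Deferring a 4-hour job from midnight (400 gCO2/kWh) to the 10:00 window (125 gCO2/kWh) cuts its emissions by roughly two thirds with no code changes to the job itself.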
Smaller open models
High-quality open-weights models (Llama, Mistral, Qwen, Phi) let teams serve inference locally on smaller hardware rather than routing every query to frontier API endpoints. For many tasks, a small on-device model handles the workload at a fraction of the energy. See our large language models coverage for the model ecosystem.
What doesn’t help
Several claims about AI sustainability are weaker than they sound. Offsets are controversial — many voluntary carbon offsets have been found ineffective or fraudulent, and claiming “net zero” via offsets does not reduce actual emissions. “AI will solve climate” arguments are aspirational and do not offset the immediate footprint. Moving to “the cloud” may or may not reduce emissions; it depends on the cloud provider’s energy sourcing.
Ongoing debates
How to measure AI’s environmental impact honestly remains contested. Absolute emissions from training are small compared to global energy use; relative growth rate is concerning. Measuring per-query emissions depends on how much of the data-center overhead is attributed to the query. Honest reporting — following frameworks like GHG Protocol Scope 1/2/3 — helps but is uneven across vendors. Several research groups and NGOs (Carbon Trust, Green Software Foundation, ML CO2 Impact Calculator) work on standardizing AI emissions accounting.
What developers can do
- Measure — use ML CO2 calculators, cloud provider carbon dashboards, or the CodeCarbon library to track per-run emissions.
- Use smaller models when accuracy is sufficient. Do not default to frontier LLMs for tasks a distilled 7B model handles well.
- Cache aggressively. Repeated computation is wasted energy.
- Schedule non-urgent work in regions and times with cleaner grids.
- Track model lifetime emissions, not just training — inference costs compound.
- Prefer efficient architectures (MoE, quantization) when they meet quality requirements.
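Tying the measurement and scheduling bullets together: once a run's energy is known, its emissions scale linearly with grid carbon intensity. The regional intensity figures below are coarse illustrative values, not authoritative data.

```python
# Emissions = energy x grid carbon intensity (illustrative intensities).
GRID_GCO2_PER_KWH = {
    "coal_heavy": 800,    # assumed coal-dominated grid
    "eu_average": 250,    # assumed mixed European grid
    "hydro_nordic": 30,   # assumed hydro/wind-heavy Nordic grid
}

def run_emissions_kg(energy_kwh, region):
    """Convert a measured run's energy into kg CO2eq for an assumed grid."""
    return energy_kwh * GRID_GCO2_PER_KWH[region] / 1000

for region in GRID_GCO2_PER_KWH:
    print(region, run_emissions_kg(100, region), "kg CO2eq")
```

The same 100 kWh run spans 80 kg down to 3 kg CO2eq across these assumed grids, which is why siting and scheduling often matter more than micro-optimizing the code.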
Frequently asked questions
Is asking ChatGPT a question really bad for the environment?
A single query has a small footprint — on the order of a watt-hour or less, roughly comparable to running an LED bulb for a few minutes. The environmental concern is aggregate — billions of queries daily, multiplied across multiple AI services, add up to meaningful consumption. Individual usage is not the primary lever; data-center energy sourcing and model efficiency are.
Are smaller models more sustainable than big ones?
Generally yes, per query. A well-tuned small model serving 90% of queries at 10% of the energy is an obvious win. The practical question is matching model size to task difficulty. Frontier models are justified for genuinely hard tasks; using them for tasks a smaller model handles well is wasteful. Routing frameworks that send easy queries to small models and hard queries to frontier models are an active engineering pattern.
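The routing pattern described above can be sketched in a few lines. The difficulty heuristic and per-query energy figures below are hypothetical; production routers typically use a learned classifier or a cheap draft-model pass rather than keyword rules.

```python
# Hypothetical model-routing sketch: easy queries go to a small model,
# hard ones to a frontier model. Heuristic and energy figures are made up.

SMALL_WH, FRONTIER_WH = 0.3, 3.0  # assumed energy per query, in watt-hours

def route(query):
    """Crude difficulty heuristic; real routers use learned classifiers."""
    hard = len(query.split()) > 30 or "prove" in query.lower()
    return ("frontier", FRONTIER_WH) if hard else ("small", SMALL_WH)

for q in ["What is the capital of France?",
          "Prove that the sum of two even integers is even."]:
    print(route(q))
```

If 90% of traffic routes to the small model at these assumed figures, average energy per query drops from 3.0 Wh to about 0.57 Wh, roughly a 5x reduction.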
Will AI data centers cause grid problems?
Possibly in some regions. Multiple US states and European countries have reported grid-capacity concerns driven by AI-related data-center growth. Utility long-term planning is adjusting. Renewable procurement, battery storage, and demand-response programs are being scaled up. Whether grid capacity will keep pace with AI growth is a live question. Watch utility and regulatory announcements in the regions where AI compute is concentrating.