Large AI models’ emissions come from two phases: training (a one-time, high-cost event) and inference (ongoing, and now often dominant). Data-centre electricity for AI is growing fast; the IEA estimates AI-related demand could drive a large rise in data-centre consumption through 2030, but the exact carbon cost per query varies widely by model, hardware, location, and grid mix. Practical steps (measure, optimize code and models, choose efficient hardware, shift to low-carbon electricity, and report transparently) can cut emissions substantially.
AI’s Hidden Emissions: How Much Carbon Do Large Models Use?
1. Why this matters now
AI model development and deployment have exploded. Training today’s largest models requires enormous GPU farms and huge electricity inputs; meanwhile, inference (billions of queries to deployed models) can accumulate more energy use over time than the original training run. The result: organizations that build or host AI services are creating a material new source of electricity demand and associated CO₂ emissions, and those emissions concentrate where data centres are energy-intensive.
2. Two phases: training vs inference (and which matters more)
- Training a large model (the research & development phase) is energy-intensive and often publicly highlighted because it’s a single measurable event. Some headline examples from earlier model generations ran to many tens of MWh for large experiments. But training is a one-off cost per model version.
- Inference (serving the model to users) is ongoing. As AI usage scales, inference electricity, accumulated across millions or billions of queries, frequently becomes the larger lifetime energy cost. Recent assessments and simulators show inference can account for a majority of lifecycle energy in many deployment scenarios. That means everyday use patterns, model efficiency at serving time, and architecture choices are crucial.
(Load-bearing claim #1: inference can account for the majority of a model’s lifecycle energy.)
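To see why, here is a back-of-envelope sketch in Python; every number in it is an illustrative assumption, not a measurement from any particular model:
```python
# Illustrative comparison of one-time training vs cumulative inference energy.
# Every number below is an assumption for demonstration, not a measurement.
training_mwh = 50.0          # assumed one-time training cost (MWh)
wh_per_query = 0.3           # assumed energy per query (Wh)
daily_queries = 50_000_000   # assumed service volume

inference_mwh_per_year = wh_per_query * daily_queries * 365 / 1e6
print(f"Training (once): {training_mwh:.0f} MWh")
print(f"Inference (per year): {inference_mwh_per_year:,.0f} MWh")
# ~5,475 MWh/year: at these assumptions, serving overtakes the
# one-time training total within the first week of operation.
```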
3. How much energy are we talking about globally?
Data centres today use a nontrivial but still modest share of global electricity — estimates cluster in the low single digits of percent. The IEA’s recent Energy & AI analysis finds data-centre electricity was about 1.5% of global electricity in 2024 (~415 TWh) and projects AI-driven growth could raise data-centre demand substantially by 2030. Projections vary by scenario, but AI is widely expected to be a major driver of near-term growth in data-centre electricity.
(Load-bearing claim #2: data centres ~1.5% of global electricity use in 2024; AI is projected to significantly increase demand.)
4. Why numbers per query or per model are so noisy
Simple headline numbers (e.g., “X watt-hours per query” or “training cost = Y tonnes CO₂”) are tempting but often misleading because the result depends on many variables:
- model size and architecture (parameters, attention patterns),
- hardware efficiency (GPU generation, utilization, data-centre PUE),
- batch size and software stack (how well inference is batched and optimized),
- geographic location & grid carbon intensity, and
- usage volume (1,000 queries vs 1 billion queries).
Recent re-estimates have revised earlier per-query figures downward as models and runtimes become more efficient, but total electricity demand can still grow rapidly because user bases and deployed services keep scaling up.
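A toy calculation shows how sensitive the per-query figure is to these variables; the server power, PUE, and throughput below are assumptions, not benchmarks:
```python
# Per-query energy falls directly out of server power draw and throughput.
# All inputs are illustrative assumptions, not benchmarks.
server_power_kw = 6.0    # assumed draw of a loaded multi-GPU inference server
pue = 1.2                # assumed data-centre overhead (PUE)
throughput_qps = 10.0    # assumed queries/second (long LLM generations)

wh_per_query = server_power_kw * pue * 1000 / (throughput_qps * 3600)
print(f"{wh_per_query:.2f} Wh per query")  # 0.20 Wh with these inputs
# Double the throughput via better batching and this halves; run the same
# load on older hardware or at low utilization and it grows several-fold.
```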
5. Measurement tools you can use today (practical first step)
If you run models or cloud workloads, measure first. Several production-ready tools let engineers estimate and track emissions:
- ML CO₂ / Machine Learning CO2 Impact Calculator — a simple estimator for training & evaluation runs.
- CodeCarbon — a Python package that estimates the energy use and emissions of code while it runs, suitable for instrumenting both training and inference jobs.
- Microsoft Sustainability Calculator (and other cloud-provider tools) — useful for cloud-hosted workloads to estimate emissions tied to provider regions and instance types.
(Load-bearing claim #3: practical tracking tools like ML CO2 and CodeCarbon exist and are widely used.)
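For Python workloads, CodeCarbon can be wrapped around a training or serving job in a few lines. A minimal sketch, assuming the package is installed; the project name and the train() body are placeholders:
```python
# pip install codecarbon
from codecarbon import EmissionsTracker

def train():
    ...  # placeholder: your actual training or evaluation loop

tracker = EmissionsTracker(project_name="example-run")  # hypothetical name
tracker.start()
try:
    train()
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for the tracked span
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2-eq")
```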
6. Where most emissions come from in practice
Recent critical reviews and technical papers show:
- Spatial concentration: a disproportionate share of AI compute is hosted in a few regions and big hyperscale data centres. Grid carbon intensity in those regions determines the immediate emissions impact.
- Lifecycle effects: beyond electricity, building new data centres and cooling infrastructure has embodied carbon and water impacts; lifecycle assessments help reveal these tradeoffs.
7. Reasonable rules of thumb (with care)
Because of the variation, avoid absolute claims. Instead use ranges and scenario thinking:
- Training a large model can require tens to hundreds of MWh (or more for the largest research runs), depending on project scope and repeated experiments.
- Inference per query is often measured in fractions of a watt-hour, but multiplied at scale these fractions add up quickly.
- Recent measured estimates for typical modern LLM queries have been revised down to the 0.x watt-hour range in many cases, yet if a service handles billions of queries, that multiplies into large totals.
- Always include the assumptions (model, hardware, region).
(Load-bearing claim #4: per-query energy estimates for modern LLMs may be on order 0.x Wh, but scale matters.)
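To make the scale effect concrete, a quick sketch; the per-query figure, query volume, and grid intensity below are all assumptions:
```python
# Converting an assumed per-query figure into daily emissions at scale.
wh_per_query = 0.3       # assumed, within the 0.x Wh range discussed above
queries_per_day = 1e9    # assumed volume for a very large service
grid_g_per_kwh = 400.0   # assumed grid carbon intensity (g CO2-eq/kWh)

kwh_per_day = wh_per_query * queries_per_day / 1000
tonnes_per_day = kwh_per_day * grid_g_per_kwh / 1e6
print(f"{kwh_per_day:,.0f} kWh/day -> {tonnes_per_day:,.0f} t CO2-eq/day")
# 300,000 kWh/day -> 120 t/day here; a 100 g/kWh grid cuts that to 30 t.
```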
8. Best practices to reduce model emissions (practical checklist)
- Measure and report — instrument experiments and production with CodeCarbon/ML CO2 and publish methodology. Transparent measurement is the foundation of reduction.
- Optimize model & code — smaller, distilled models, quantization, pruning, and efficient architectures can reduce inference cost dramatically. Use batching and faster kernels.
- Choose efficient hardware & utilization — newer GPU generations (and purpose-built accelerators) are far more energy-efficient per FLOP. Maximize utilization to avoid idle power waste.
- Prefer low-carbon grids or contracted renewables — colocate heavy workloads in regions with clean grids or use supplier green power contracts; but beware of double counting (claims must be verifiable).
- Lifecycle thinking — evaluate embodied carbon for new data centres and cooling choices; prefer reuse and refurbishment where possible.
- Carbon-aware scheduling — defer non-urgent workloads to times of higher renewable availability when possible; a minimal sketch of the idea follows below.
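The scheduling idea in miniature, assuming you have some way to query current grid carbon intensity; the get_grid_intensity hook below is hypothetical and returns a fixed value:
```python
import time

CARBON_THRESHOLD = 200.0      # g CO2-eq/kWh; illustrative cutoff
MAX_WAIT_SECONDS = 6 * 3600   # don't defer forever; the work still has to run

def get_grid_intensity() -> float:
    # Hypothetical hook: wire this to a real grid-intensity feed
    # (utility data, a commercial API, or provider telemetry).
    return 180.0  # illustrative static value

def run_when_clean(job, poll_seconds=900):
    """Defer `job` until the grid is below the threshold or a deadline passes."""
    deadline = time.time() + MAX_WAIT_SECONDS
    while get_grid_intensity() > CARBON_THRESHOLD and time.time() < deadline:
        time.sleep(poll_seconds)
    return job()

run_when_clean(lambda: print("deferred batch job started"))
```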
9. What organisations should publish (transparency & credibility)
Publish documented measurement methods (which tool, which region grid factors, assumptions), publish per-run or per-model energy data where possible, and describe mitigation steps. This helps head off greenwashing critiques and allows reproducibility. The IEA and independent reviews urge improved transparency across the industry.
10. Policy and system-level context
System-level planning matters: the IEA models multiple scenarios where AI increases electricity demand, but the net climate impact depends on the speed of grid decarbonization, where data-centre expansion occurs, and whether AI adoption enables emissions reductions in other sectors (e.g., efficiency gains through optimized logistics). Policymakers should integrate data-centre planning with grid expansion and regional energy strategy.
(Load-bearing claim #5: the system-level climate impact of AI depends on grid decarbonization and whether AI yields emission reductions elsewhere.)
11. A practical worked example (how to report a model)
- Record training run GPU hours, GPU model, PUE, and data-centre region.
- Use CodeCarbon or ML CO₂ to estimate kWh and convert to CO₂ using hourly grid intensity or provider estimates.
- For inference, measure average kWh per query (instrumented), multiply by expected total queries in a forecast period, and report both training and projected inference emissions.
- Publish the calculator inputs and a sensitivity table (e.g., +/- grid intensity, utilization). This transparency makes the number actionable and comparable.
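The same steps as a short script, including the sensitivity table over grid intensity; every input is an assumed example, not data from a real run:
```python
# Worked training-run estimate plus a sensitivity table over grid intensity.
# All inputs are illustrative assumptions.
gpu_hours = 5_000          # e.g., 64 GPUs for roughly 78 hours
gpu_avg_power_kw = 0.4     # assumed average draw per GPU (400 W)
pue = 1.2                  # assumed data-centre PUE
region_g_per_kwh = 350.0   # assumed grid intensity of the hosting region

energy_kwh = gpu_hours * gpu_avg_power_kw * pue
print(f"Energy: {energy_kwh:,.0f} kWh")
print(f"Emissions: {energy_kwh * region_g_per_kwh / 1e6:.2f} t CO2-eq")

# Sensitivity: the same run on cleaner or dirtier grids.
for g in (50, 350, 700):   # g CO2-eq/kWh
    print(f"  at {g} g/kWh: {energy_kwh * g / 1e6:.2f} t CO2-eq")
```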
12. Takeaway — what readers should do tomorrow
- If you’re an engineer or researcher: add CodeCarbon/ML CO₂ to your experiments this week, measure both training and inference, and produce a short public methodology note.
- If you’re a manager: require an emissions estimate for new models before approval, prefer efficient hardware, and include carbon reduction targets in procurement.
- If you’re a policymaker or city planner: integrate data-centre siting with clean-energy deployment, and require transparency on power sourcing and carbon accounting.
FAQ
Q1: Which is worse — training or inference?
A: It depends. Training is a high one-time energy cost; inference is continuous and can dominate lifetime emissions as usage scales. Measure both.
Q2: Can companies make AI zero-carbon?
A: Companies can reduce AI emissions dramatically through efficiency and procuring clean electricity, but “zero” requires credible renewable contracts and sometimes carbon removal for residuals. Transparent reporting is essential.
Q3: What tool should I use to measure emissions?
A: CodeCarbon and ML CO₂ are good starting points for research; cloud providers (Microsoft, AWS, Google) offer additional estimators for cloud workloads — use them and publish your assumptions.
Sources:
- International Energy Agency, Energy and AI (2025).
- Nature, “How much energy will AI really consume?” (March 2025).
- ML CO₂ Impact Calculator, mlco2.github.io.
