LLM API (Beta)

Estimate CO₂e emissions from large language model inference based on token usage, model architecture, and regional grid intensity.

<div class="admonition admonition-warning"><span class="admonition-icon">⚠️</span><div class="admonition-content"><p>Beta Endpoint: This endpoint is in beta. Energy consumption estimates are based on published research and may not reflect exact provider infrastructure. Response format and model coverage may change. We welcome feedback at <a href="mailto:[email protected]">[email protected]</a>.</p> </div></div>

Overview

The LLM API estimates the carbon footprint of AI inference by modelling energy consumption from token processing and mapping it to regional grid carbon intensity. It covers 80+ models from 12 providers, with automatic reasoning model detection and prompt caching support.

Key Features

80+ models classified across 4 energy tiers (frontier, large, medium, small)
12 providers including OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, xAI
Reasoning model detection with automatic thinking token overhead (o3, DeepSeek R1, Gemini Deep Think)
Prompt caching support with an approximate 90% energy reduction on cached input tokens
Cloud provider regions — pass aws_bedrock, gcp_vertex, or azure_openai for provider-specific grid intensity
Embodied carbon — includes amortized hardware manufacturing emissions
Unknown model estimation — unrecognised models are classified by name pattern matching

Endpoints

Endpoint	Description
Calculate Emissions	Estimate CO₂e for LLM inference
Supported Models	List all known models with tiers and providers

Quick Example

curl "https://api.emissions.dev/v1/digital/llm/emissions?\
provider=openai&\
model=gpt-5.2&\
tokens_input=5000&\
tokens_output=2000" \
  -H "Authorization: Bearer em_live_xxxx"

{
  "data": {
    "type": "llm_emission",
    "attributes": {
      "emissions": {
        "co2e": 1.1961,
        "co2e_unit": "g",
        "breakdown": {
          "operational_co2e": 0.8961,
          "embodied_co2e": 0.3,
          "unit": "g"
        }
      },
      "inference": {
        "provider": "openai",
        "model": "gpt-5.2",
        "tier": "frontier",
        "is_known_model": true
      }
    }
  }
}
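
The same request can be assembled from Python. The sketch below only builds the URL and header and checks a response body shaped like the one above; it sends nothing over the network. The helper names `build_request` and `check_breakdown` are illustrative and not part of any official client, and the API key is the placeholder from the curl example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL copied from the curl example above.
BASE_URL = "https://api.emissions.dev/v1/digital/llm/emissions"


def build_request(api_key, provider, model, tokens_input, tokens_output):
    """Build (but do not send) a GET request mirroring the curl example."""
    query = urlencode({
        "provider": provider,
        "model": model,
        "tokens_input": tokens_input,
        "tokens_output": tokens_output,
    })
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def check_breakdown(body, tolerance=1e-6):
    """Return True if operational + embodied CO2e matches the reported total."""
    emissions = json.loads(body)["data"]["attributes"]["emissions"]
    parts = emissions["breakdown"]
    total = parts["operational_co2e"] + parts["embodied_co2e"]
    return abs(total - emissions["co2e"]) <= tolerance


request = build_request("em_live_xxxx", "openai", "gpt-5.2", 5000, 2000)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. `check_breakdown` is a cheap sanity check that the two breakdown fields add up to the reported total, as they do in the sample response (0.8961 + 0.3 = 1.1961).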

How It Works

The calculation has three components:

Component	Description	GHG Protocol Scope
Token energy	Energy consumed processing input and output tokens	Scope 2
Base energy	Fixed overhead per request (model loading, routing)	Scope 2
Embodied carbon	Amortized GPU hardware manufacturing	Scope 3 Category 1

Formula:

Total CO₂e = (Base Energy + Token Energy) × PUE × Grid Intensity + Embodied Carbon
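
As a rough illustration, the formula can be written as a small Python function. The units (kWh for energy, gCO₂e/kWh for grid intensity, grams for the result) and every input value in the example are assumptions chosen for readability; they are not the factors the API actually uses.

```python
def total_co2e_g(base_energy_kwh, token_energy_kwh, pue,
                 grid_intensity_g_per_kwh, embodied_co2e_g):
    """Total CO2e = (base + token energy) x PUE x grid intensity + embodied carbon."""
    operational_g = (base_energy_kwh + token_energy_kwh) * pue * grid_intensity_g_per_kwh
    return operational_g + embodied_co2e_g


# Illustrative inputs only (assumed, not published API factors).
example = total_co2e_g(
    base_energy_kwh=0.0002,
    token_energy_kwh=0.0018,
    pue=1.2,
    grid_intensity_g_per_kwh=400,
    embodied_co2e_g=0.3,
)
print(round(example, 2))  # 1.26
```

Here the operational part is 0.002 kWh × 1.2 × 400 gCO₂e/kWh = 0.96 g, and the embodied part adds 0.3 g on top. Only the operational term scales with PUE and grid intensity.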

Model Tiers

Models are classified into energy tiers based on architecture size and computational requirements:

Tier	Energy Profile	Example Models
Frontier	Highest — massive parameter count, multi-GPU	GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, o3, Llama 4 Behemoth
Large	High — standard large models	GPT-4.1, Claude Sonnet 4, Gemini 2.5 Flash, Llama 3.3 70B
Medium	Moderate — optimised for efficiency	GPT-4.1-mini, Claude Haiku 4.5, Gemini 2.0 Flash, Llama 4 Scout
Small	Lowest — lightweight models	GPT-5-nano, Claude 3 Haiku, Gemini Nano, Llama 3.1 8B

Unknown models are automatically classified by pattern matching on the model name (e.g. names containing "opus" or "405b" → frontier, "mini" or "flash" → medium).
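
A minimal sketch of this kind of name matching, using only the four patterns mentioned above. The real rule set is presumably larger and is not documented here; the function name `guess_tier` and the `None` fallback for unmatched names are assumptions.

```python
# (substring, tier) rules taken from the examples in the text above.
NAME_PATTERNS = [
    ("opus", "frontier"),
    ("405b", "frontier"),
    ("mini", "medium"),
    ("flash", "medium"),
]


def guess_tier(model_name, default=None):
    """Return the tier of the first pattern contained in the model name."""
    name = model_name.lower()
    for pattern, tier in NAME_PATTERNS:
        if pattern in name:
            return tier
    # No pattern matched; the API's actual fallback tier is not documented.
    return default


print(guess_tier("acme-opus-ft"))     # frontier
print(guess_tier("Llama-405B-ft"))    # frontier
print(guess_tier("acme-chat-mini"))   # medium
```

Plain substring matching has a pitfall worth noting: any name containing "gemini" also contains "mini", so a production matcher would likely need to match on whole name segments rather than raw substrings.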

Reasoning Models

Reasoning models generate internal "thinking" tokens that consume energy but aren't visible in the output. The API automatically detects these models and applies a thinking ratio multiplier:

Model	Default Thinking Ratio	Effect
o3	6×	1,000 output tokens → 6,000 effective tokens
o3-mini	4×	Lower overhead than full o3
o1-pro	8×	Highest overhead
DeepSeek R1	6×	Similar to o3
Gemini Deep Think	6×	Google's reasoning variant

Override the default ratio with the reasoning_effort parameter: none (1×), low (2×), medium (4×), high (6×), xhigh (10×).
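
The multiplier logic can be sketched in Python as follows. The ratios come from the table and the reasoning_effort list above; the dictionary keys `deepseek-r1` and `gemini-deep-think` are hypothetical identifiers, and treating non-reasoning models as 1× is an assumption.

```python
# Default thinking ratios from the table above (last two keys are hypothetical IDs).
DEFAULT_THINKING_RATIO = {
    "o3": 6,
    "o3-mini": 4,
    "o1-pro": 8,
    "deepseek-r1": 6,
    "gemini-deep-think": 6,
}

# Multipliers for the reasoning_effort parameter.
EFFORT_RATIO = {"none": 1, "low": 2, "medium": 4, "high": 6, "xhigh": 10}


def effective_output_tokens(model, tokens_output, reasoning_effort=None):
    """Scale visible output tokens by the thinking ratio."""
    if reasoning_effort is not None:
        ratio = EFFORT_RATIO[reasoning_effort]  # explicit override wins
    else:
        ratio = DEFAULT_THINKING_RATIO.get(model, 1)  # assumed 1x if not a reasoning model
    return tokens_output * ratio


print(effective_output_tokens("o3", 1000))           # 6000
print(effective_output_tokens("o3", 1000, "xhigh"))  # 10000
```

The first call reproduces the o3 row of the table (1,000 output tokens become 6,000 effective tokens); the second shows an explicit reasoning_effort replacing the model default.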

Grid Intensity Resolution

Priority	Condition	Source	Example
1	`region` parameter provided	Ember / EPA eGRID	`region=FR` → 42 gCO₂e/kWh
2	Cloud provider (`aws_bedrock`, `gcp_vertex`, `azure_openai`)	Provider region average	Average across all regions
3	Provider default	Country of provider HQ	Mistral → France (42 gCO₂e/kWh), DeepSeek → China (544 gCO₂e/kWh)
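
The priority order in the table can be sketched as a fall-through lookup. The intensity values are the ones quoted in the table; the function name, the returned source labels, and the `cloud_average` argument are hypothetical. The sketch also assumes the cloud provider name is passed as the provider, and because the table gives no cloud averages the caller must supply them.

```python
# gCO2e/kWh values quoted in the table above.
REGION_INTENSITY = {"FR": 42}
PROVIDER_DEFAULT = {"mistral": 42, "deepseek": 544}
CLOUD_PROVIDERS = {"aws_bedrock", "gcp_vertex", "azure_openai"}


def resolve_grid_intensity(provider, region=None, cloud_average=None):
    """Return (intensity, source) following the priority order in the table."""
    # Priority 1: an explicit region always wins.
    if region is not None:
        return REGION_INTENSITY[region], "region"
    # Priority 2: cloud providers use an average across their regions.
    if provider in CLOUD_PROVIDERS:
        return cloud_average[provider], "cloud_provider_average"
    # Priority 3: fall back to the provider's headquarters country.
    return PROVIDER_DEFAULT[provider], "provider_default"


print(resolve_grid_intensity("deepseek"))               # (544, 'provider_default')
print(resolve_grid_intensity("deepseek", region="FR"))  # (42, 'region')
```

The second call shows why the order matters: with `region=FR` supplied, a DeepSeek request is costed at the French grid value rather than the provider default.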

Data Sources

Source	Coverage	Used For
Epoch AI — How Much Energy Does ChatGPT Use (2025)	Energy per token estimates	Token energy model
Google Gemini Environmental Report (2025)	Google model efficiency	Provider efficiency factors
SemiAnalysis — AI Datacenter Energy (2024)	GPU power consumption	Base energy and tier factors
Cloud Carbon Footprint	Hardware lifecycle	Embodied carbon per request
Ember / EPA eGRID / Electricity Maps	Grid carbon intensity	CO₂e conversion

Limitations

This endpoint provides estimates, not measured values. Key limitations:

Energy-per-token factors are derived from published research, not direct measurement from provider infrastructure
Provider efficiency multipliers are approximations — actual PUE and hardware vary by data centre
Embodied carbon is amortized across estimated total requests per GPU lifecycle
Reasoning token estimates use a fixed multiplier rather than actual thinking token counts
Cached token energy reduction (90%) is an approximation of KV-cache efficiency
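
To make the cached-token approximation in the last item concrete, the sketch below converts a request's input tokens into an energy-equivalent count. It assumes cached tokens are a subset of the input tokens and that the 90% reduction applies uniformly; the function and the parameter name `tokens_cached` are hypothetical and may not match the API's request parameters.

```python
CACHED_ENERGY_REDUCTION = 0.90  # approximate figure quoted in this document


def energy_equivalent_input_tokens(tokens_input, tokens_cached=0):
    """Count cached input tokens at 10% of the energy of uncached ones."""
    if not 0 <= tokens_cached <= tokens_input:
        raise ValueError("tokens_cached must be between 0 and tokens_input")
    uncached = tokens_input - tokens_cached
    return uncached + tokens_cached * (1 - CACHED_ENERGY_REDUCTION)


# 5,000 input tokens, 4,000 of them served from cache.
print(round(energy_equivalent_input_tokens(5000, 4000)))  # 1400
```

In this example 1,000 uncached tokens count in full and 4,000 cached tokens count as 400, so the input side is costed as roughly 1,400 tokens instead of 5,000.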