LLM API (Beta)

Estimate CO₂e emissions from large language model inference based on token usage, model architecture, and regional grid intensity.

<div class="admonition admonition-warning"><span class="admonition-icon">⚠️</span><div class="admonition-content"><p>Beta Endpoint: This endpoint is in beta. Energy consumption estimates are based on published research and may not reflect exact provider infrastructure. Response format and model coverage may change. We welcome feedback at <a href="mailto:[email protected]">[email protected]</a>.</p> </div></div>

Overview

The LLM API estimates the carbon footprint of AI inference by modelling energy consumption from token processing and mapping it to regional grid carbon intensity. It covers 80+ models from 12 providers, with automatic reasoning model detection and prompt caching support.

Key Features

  • 80+ models classified across 4 energy tiers (frontier, large, medium, small)
  • 12 providers including OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, xAI
  • Reasoning model detection with automatic thinking token overhead (o3, DeepSeek R1, Gemini Deep Think)
  • Prompt caching support — 90% energy reduction on cached input tokens
  • Cloud provider regions — pass aws_bedrock, gcp_vertex, or azure_openai for provider-specific grid intensity
  • Embodied carbon — includes amortized hardware manufacturing emissions
  • Unknown model estimation — unrecognised models are classified by name pattern matching
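
As a rough illustration of the prompt caching feature listed above, the sketch below discounts cached input tokens by 90% before they enter the energy calculation. The function name, the `tokens_cached` argument, and the assumption that cached tokens are a subset of the input tokens are hypothetical; the source only states the 90% reduction.

```python
# 90% energy reduction on cached input tokens (figure from the feature list).
CACHED_INPUT_ENERGY_FACTOR = 0.10


def input_token_equivalents(tokens_input: int, tokens_cached: int = 0) -> float:
    """Return input tokens weighted by energy cost.

    Assumption: tokens_cached counts tokens that are already included in
    tokens_input. Uncached tokens count fully, cached tokens at 10%.
    """
    if not 0 <= tokens_cached <= tokens_input:
        raise ValueError("tokens_cached must be between 0 and tokens_input")
    uncached = tokens_input - tokens_cached
    return uncached + tokens_cached * CACHED_INPUT_ENERGY_FACTOR


# 5,000 input tokens, of which 4,000 were served from the prompt cache.
weighted = input_token_equivalents(5000, 4000)
```

Under these assumptions a heavily cached prompt costs far less input energy than its raw token count suggests, while output tokens are unaffected.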

Endpoints

| Endpoint | Description |
| --- | --- |
| Calculate Emissions | Estimate CO₂e for LLM inference |
| Supported Models | List all known models with tiers and providers |

Quick Example

curl "https://api.emissions.dev/v1/digital/llm/emissions?\
provider=openai&\
model=gpt-5.2&\
tokens_input=5000&\
tokens_output=2000" \
  -H "Authorization: Bearer em_live_xxxx"

Response:

{
  "data": {
    "type": "llm_emission",
    "attributes": {
      "emissions": {
        "co2e": 1.1961,
        "co2e_unit": "g",
        "breakdown": {
          "operational_co2e": 0.8961,
          "embodied_co2e": 0.3,
          "unit": "g"
        }
      },
      "inference": {
        "provider": "openai",
        "model": "gpt-5.2",
        "tier": "frontier",
        "is_known_model": true
      }
    }
  }
}
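
The same request can be sketched in Python using only the standard library. The URL, query parameters, header, and response fields come from the example above; the helper names `build_request` and `read_emissions` are hypothetical, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL copied from the curl example.
BASE_URL = "https://api.emissions.dev/v1/digital/llm/emissions"


def build_request(api_key: str, provider: str, model: str,
                  tokens_input: int, tokens_output: int) -> Request:
    """Build (without sending) the GET request shown in the curl example."""
    query = urlencode({
        "provider": provider,
        "model": model,
        "tokens_input": tokens_input,
        "tokens_output": tokens_output,
    })
    return Request(f"{BASE_URL}?{query}",
                   headers={"Authorization": f"Bearer {api_key}"})


def read_emissions(body: str) -> dict:
    """Extract the total and its breakdown from a response shaped like the sample."""
    emissions = json.loads(body)["data"]["attributes"]["emissions"]
    breakdown = emissions["breakdown"]
    return {
        "total_g": emissions["co2e"],
        "operational_g": breakdown["operational_co2e"],
        "embodied_g": breakdown["embodied_co2e"],
    }


request = build_request("em_live_xxxx", "openai", "gpt-5.2", 5000, 2000)
# To call the API for real: urllib.request.urlopen(request).read().decode()

# Trimmed copy of the sample response, used here instead of a live call.
sample_body = """{"data": {"attributes": {"emissions": {
    "co2e": 1.1961, "co2e_unit": "g",
    "breakdown": {"operational_co2e": 0.8961, "embodied_co2e": 0.3, "unit": "g"}}}}}"""
result = read_emissions(sample_body)
```

In the sample response the operational and embodied values sum to the reported total, which is a cheap sanity check to run on live responses as well.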

How It Works

The calculation has three components:

| Component | Description | GHG Protocol Scope |
| --- | --- | --- |
| Token energy | Energy consumed processing input and output tokens | Scope 2 |
| Base energy | Fixed overhead per request (model loading, routing) | Scope 2 |
| Embodied carbon | Amortized GPU hardware manufacturing | Scope 3 Category 1 |

Formula:

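Total CO₂e = (Base Energy + Token Energy) × PUE × Grid Intensity + Embodied Carbon

Here PUE is power usage effectiveness, the ratio of total data-centre energy to the energy drawn by the IT equipment itself, and Grid Intensity is expressed in gCO₂e/kWh.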
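
A minimal sketch of the formula as a function, assuming energy in kWh, grid intensity in gCO₂e/kWh, and embodied carbon in grams. All input values below are hypothetical placeholders chosen for readable arithmetic; they are not the API's internal factors.

```python
def total_co2e_g(base_energy_kwh: float, token_energy_kwh: float, pue: float,
                 grid_intensity_g_per_kwh: float, embodied_co2e_g: float) -> dict:
    """Apply: (Base + Token energy) x PUE x Grid intensity + Embodied carbon."""
    operational = (base_energy_kwh + token_energy_kwh) * pue * grid_intensity_g_per_kwh
    return {
        "operational_co2e": operational,
        "embodied_co2e": embodied_co2e_g,
        "co2e": operational + embodied_co2e_g,
    }


# Hypothetical inputs, for illustration only.
estimate = total_co2e_g(
    base_energy_kwh=0.0002,        # fixed per-request overhead
    token_energy_kwh=0.0018,       # energy attributed to token processing
    pue=1.2,                       # data-centre overhead factor
    grid_intensity_g_per_kwh=400,  # regional grid carbon intensity
    embodied_co2e_g=0.3,           # amortized hardware share
)
```

Note that PUE and grid intensity scale only the operational term; embodied carbon is added afterwards and does not depend on where the request runs.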

Model Tiers

Models are classified into energy tiers based on architecture size and computational requirements:

| Tier | Energy Profile | Example Models |
| --- | --- | --- |
| Frontier | Highest: massive parameter count, multi-GPU | GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, o3, Llama 4 Behemoth |
| Large | High: standard large models | GPT-4.1, Claude Sonnet 4, Gemini 2.5 Flash, Llama 3.3 70B |
| Medium | Moderate: optimised for efficiency | GPT-4.1-mini, Claude Haiku 4.5, Gemini 2.0 Flash, Llama 4 Scout |
| Small | Lowest: lightweight models | GPT-5-nano, Claude 3 Haiku, Gemini Nano, Llama 3.1 8B |

Unknown models are automatically classified by pattern matching on the model name (e.g. names containing "opus" or "405b" → frontier, "mini" or "flash" → medium).
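
The pattern matching described above might look roughly like the sketch below. Only the four patterns named in the text are included; the rule order, the token-based matching, and the `None` fallback are assumptions, since the API's actual rules are not documented here.

```python
import re
from typing import Optional

# Patterns taken from the text; frontier rules are checked first (assumed order).
TIER_PATTERNS = [
    ({"opus", "405b"}, "frontier"),
    ({"mini", "flash"}, "medium"),
]


def guess_tier(model_name: str) -> Optional[str]:
    """Guess an energy tier for an unrecognised model name.

    The name is split on non-alphanumeric characters so that "mini" matches
    "gpt-x-mini" but not the "mini" hidden inside "gemini".
    """
    tokens = set(re.split(r"[^a-z0-9]+", model_name.lower()))
    for patterns, tier in TIER_PATTERNS:
        if tokens & patterns:
            return tier
    return None  # no documented pattern applies


guesses = {name: guess_tier(name) for name in
           ["acme-opus-2", "llama-9-405b", "acme-mini", "acme-flash", "gemini-ultra"]}
```

Matching on whole tokens rather than raw substrings is a design choice of this sketch: a plain substring test would classify every name containing "gemini" as medium.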

Reasoning Models

Reasoning models generate internal "thinking" tokens that consume energy but aren't visible in the output. The API automatically detects these models and applies a thinking ratio multiplier:

| Model | Default Thinking Ratio | Effect |
| --- | --- | --- |
| o3 | | 1,000 output tokens → 6,000 effective tokens |
| o3-mini | | Lower overhead than full o3 |
| o1-pro | | Highest overhead |
| DeepSeek R1 | | Similar to o3 |
| Gemini Deep Think | | Google's reasoning variant |

Override with the reasoning_effort parameter: none (1×), low (2×), medium (4×), high (6×), xhigh (10×).
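
The override can be read as a simple multiplier on output tokens, as sketched below. The multiplier values are the ones listed above; the function name and the assumption that the multiplier applies to output tokens only (as in the o3 row) are illustrative.

```python
# Multipliers for the reasoning_effort parameter, as listed in the text.
REASONING_EFFORT_MULTIPLIER = {
    "none": 1,
    "low": 2,
    "medium": 4,
    "high": 6,
    "xhigh": 10,
}


def effective_output_tokens(tokens_output: int, reasoning_effort: str) -> int:
    """Scale visible output tokens to account for hidden thinking tokens."""
    try:
        multiplier = REASONING_EFFORT_MULTIPLIER[reasoning_effort]
    except KeyError:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}") from None
    return tokens_output * multiplier


# 1,000 visible output tokens at "high" effort.
effective = effective_output_tokens(1000, "high")
```

At `high` this reproduces the o3 row of the table, where 1,000 output tokens become 6,000 effective tokens.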

Grid Intensity Resolution

| Priority | Condition | Source | Example |
| --- | --- | --- | --- |
| 1 | region parameter provided | Ember / EPA eGRID | region=FR → 42 gCO₂e/kWh |
| 2 | Cloud provider (aws_bedrock, gcp_vertex, azure_openai) | Provider region average | Average across all regions |
| 3 | Provider default | Country of provider HQ | Mistral → France (42 gCO₂e/kWh), DeepSeek → China (544 gCO₂e/kWh) |
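
The priority order above can be sketched as a small resolver. The three intensity values are the ones quoted in the table; the lowercase provider keys `mistral` and `deepseek`, the function name, and the `cloud_averages` argument are hypothetical, and no cloud-provider averages are given in the source, so the caller must supply them.

```python
from typing import Mapping, Optional, Tuple

# gCO2e/kWh values quoted in the table; real lookups cover many more entries.
REGION_INTENSITY = {"FR": 42.0}
PROVIDER_HQ_INTENSITY = {"mistral": 42.0, "deepseek": 544.0}
CLOUD_PROVIDERS = {"aws_bedrock", "gcp_vertex", "azure_openai"}


def resolve_grid_intensity(
    provider: str,
    region: Optional[str] = None,
    cloud_averages: Optional[Mapping[str, float]] = None,
) -> Tuple[float, str]:
    """Return (intensity in gCO2e/kWh, rule used), following the priority table."""
    # Priority 1: an explicit region always wins.
    if region is not None:
        return REGION_INTENSITY[region], "region"
    # Priority 2: cloud providers use an average across their regions.
    if provider in CLOUD_PROVIDERS:
        if cloud_averages is None or provider not in cloud_averages:
            raise ValueError(f"no regional average supplied for {provider}")
        return cloud_averages[provider], "cloud_average"
    # Priority 3: fall back to the grid of the provider's HQ country.
    return PROVIDER_HQ_INTENSITY[provider], "provider_default"


explicit = resolve_grid_intensity("deepseek", region="FR")
fallback = resolve_grid_intensity("deepseek")
```

With these inputs, passing `region=FR` lowers the DeepSeek intensity from the China default to the French grid value, which shows how much the region parameter can move the operational estimate.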

Data Sources

| Source | Coverage | Used For |
| --- | --- | --- |
| Epoch AI, How Much Energy Does ChatGPT Use (2025) | Energy per token estimates | Token energy model |
| Google Gemini Environmental Report (2025) | Google model efficiency | Provider efficiency factors |
| SemiAnalysis, AI Datacenter Energy (2024) | GPU power consumption | Base energy and tier factors |
| Cloud Carbon Footprint | Hardware lifecycle | Embodied carbon per request |
| Ember / EPA eGRID / Electricity Maps | Grid carbon intensity | CO₂e conversion |

Limitations

This endpoint provides estimates, not measured values. Key limitations:

  • Energy-per-token factors are derived from published research, not direct measurement from provider infrastructure
  • Provider efficiency multipliers are approximations — actual PUE and hardware vary by data centre
  • Embodied carbon is amortized across estimated total requests per GPU lifecycle
  • Reasoning token estimates use a fixed multiplier rather than actual thinking token counts
  • Cached token energy reduction (90%) is an approximation of KV-cache efficiency