LLM API (Beta)
Estimate CO₂e emissions from large language model inference based on token usage, model architecture, and regional grid intensity.
<div class="admonition admonition-warning"><span class="admonition-icon">⚠️</span><div class="admonition-content"><p>Beta Endpoint: This endpoint is in beta. Energy consumption estimates are based on published research and may not reflect exact provider infrastructure. Response format and model coverage may change. We welcome feedback at <a href="mailto:[email protected]">[email protected]</a>.</p> </div></div>
Overview
The LLM API estimates the carbon footprint of AI inference by modelling energy consumption from token processing and mapping it to regional grid carbon intensity. It covers 80+ models from 12 providers, with automatic reasoning model detection and prompt caching support.
Key Features
- 80+ models classified across 4 energy tiers (frontier, large, medium, small)
- 12 providers including OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, xAI
- Reasoning model detection with automatic thinking token overhead (o3, DeepSeek R1, Gemini Deep Think)
- Prompt caching support — 90% energy reduction on cached input tokens
- Cloud provider regions — pass aws_bedrock, gcp_vertex, or azure_openai for provider-specific grid intensity
- Embodied carbon — includes amortized hardware manufacturing emissions
- Unknown model estimation — unrecognised models are classified by name pattern matching
Endpoints
| Endpoint | Description |
|---|---|
| Calculate Emissions | Estimate CO₂e for LLM inference |
| Supported Models | List all known models with tiers and providers |
Quick Example
curl "https://api.emissions.dev/v1/digital/llm/emissions?\
provider=openai&\
model=gpt-5.2&\
tokens_input=5000&\
tokens_output=2000" \
-H "Authorization: Bearer em_live_xxxx"
{
  "data": {
    "type": "llm_emission",
    "attributes": {
      "emissions": {
        "co2e": 1.1961,
        "co2e_unit": "g",
        "breakdown": {
          "operational_co2e": 0.8961,
          "embodied_co2e": 0.3,
          "unit": "g"
        }
      },
      "inference": {
        "provider": "openai",
        "model": "gpt-5.2",
        "tier": "frontier",
        "is_known_model": true
      }
    }
  }
}
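The same request can be built from Python. This is a minimal sketch using only the standard library and mirroring the curl call above. The helper names `build_request` and `fetch_emissions` are illustrative and not part of any official client, and the key is the placeholder from the example.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.emissions.dev/v1/digital/llm/emissions"


def build_request(api_key: str, **params) -> urllib.request.Request:
    """Build a GET request with query parameters and a bearer token."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_emissions(api_key: str, **params) -> dict:
    """Send the request and return the `emissions` object from the response."""
    with urllib.request.urlopen(build_request(api_key, **params)) as resp:
        body = json.load(resp)
    # Path taken from the sample response shown above.
    return body["data"]["attributes"]["emissions"]
```

Calling `fetch_emissions("em_live_xxxx", provider="openai", model="gpt-5.2", tokens_input=5000, tokens_output=2000)` with a valid key should return an object shaped like the `emissions` block in the sample response.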
How It Works
The calculation has three components:
| Component | Description | GHG Protocol Scope |
|---|---|---|
| Token energy | Energy consumed processing input and output tokens | Scope 2 |
| Base energy | Fixed overhead per request (model loading, routing) | Scope 2 |
| Embodied carbon | Amortized GPU hardware manufacturing | Scope 3 Category 1 |
Formula:
Total CO₂e = (Base Energy + Token Energy) × PUE × Grid Intensity + Embodied Carbon
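To make the formula concrete, here is a small sketch that applies it term by term. The units (kWh for energy, gCO₂e/kWh for grid intensity, grams for the result) are an assumption chosen to match the response format, and the input values are invented for illustration; they are not the API's actual factors.

```python
def estimate_co2e(base_energy_kwh, token_energy_kwh, pue,
                  grid_intensity_g_per_kwh, embodied_g):
    """Apply the formula above and return grams of CO2e with a breakdown."""
    operational = (base_energy_kwh + token_energy_kwh) * pue * grid_intensity_g_per_kwh
    return {
        "co2e": operational + embodied_g,
        "operational_co2e": operational,
        "embodied_co2e": embodied_g,
    }


# Illustrative inputs only; these are NOT the service's real factors.
result = estimate_co2e(
    base_energy_kwh=0.0002,
    token_energy_kwh=0.0018,
    pue=1.2,
    grid_intensity_g_per_kwh=400.0,
    embodied_g=0.3,
)
print(f"{result['co2e']:.2f} g CO2e")  # 1.26 g CO2e
```

Note that PUE and grid intensity scale only the operational term; embodied carbon is added afterwards and does not depend on the region.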
Model Tiers
Models are classified into energy tiers based on architecture size and computational requirements:
| Tier | Energy Profile | Example Models |
|---|---|---|
| Frontier | Highest — massive parameter count, multi-GPU | GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, o3, Llama 4 Behemoth |
| Large | High — standard large models | GPT-4.1, Claude Sonnet 4, Gemini 2.5 Flash, Llama 3.3 70B |
| Medium | Moderate — optimised for efficiency | GPT-4.1-mini, Claude Haiku 4.5, Gemini 2.0 Flash, Llama 4 Scout |
| Small | Lowest — lightweight models | GPT-5-nano, Claude 3 Haiku, Gemini Nano, Llama 3.1 8B |
Unknown models are automatically classified by pattern matching on the model name (e.g. names containing "opus" or "405b" → frontier, "mini" or "flash" → medium).
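The fallback classification could look something like the sketch below. It includes only the four patterns named above; the service's full rule set and its behaviour when nothing matches are not documented here, so the function returns `None` in that case. Matching on whole name segments rather than raw substrings is an assumption, made so that "mini" does not match inside "gemini".

```python
import re
from typing import Optional

# Only the patterns quoted in the text; the real rule set is likely larger.
NAME_PATTERNS = {
    "opus": "frontier",
    "405b": "frontier",
    "mini": "medium",
    "flash": "medium",
}


def guess_tier(model_name: str) -> Optional[str]:
    """Guess an energy tier for an unrecognised model from its name."""
    # Assumption: case-insensitive, split on anything that is not a letter or digit.
    segments = re.split(r"[^a-z0-9]+", model_name.lower())
    for segment in segments:
        if segment in NAME_PATTERNS:
            return NAME_PATTERNS[segment]
    return None  # no fallback tier is documented


print(guess_tier("Llama-3.1-405B-Instruct"))  # frontier
print(guess_tier("acme-mini-v2"))             # medium
```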
Reasoning Models
Reasoning models generate internal "thinking" tokens that consume energy but aren't visible in the output. The API automatically detects these models and applies a thinking ratio multiplier:
| Model | Default Thinking Ratio | Effect |
|---|---|---|
| o3 | 6× | 1,000 output tokens → 6,000 effective tokens |
| o3-mini | 4× | Lower overhead than full o3 |
| o1-pro | 8× | Highest overhead |
| DeepSeek R1 | 6× | Similar to o3 |
| Gemini Deep Think | 6× | Google's reasoning variant |
Override with the reasoning_effort parameter: none (1×), low (2×), medium (4×), high (6×), xhigh (10×).
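As a rough sketch of how the ratios combine, the function below multiplies visible output tokens by either the model's default ratio or the `reasoning_effort` override. The dictionary keys are hypothetical identifiers (the exact model strings the API expects may differ), and treating non-reasoning models as 1× is an assumption.

```python
# Default thinking ratios from the table above; keys are hypothetical identifiers.
DEFAULT_THINKING_RATIO = {
    "o3": 6,
    "o3-mini": 4,
    "o1-pro": 8,
    "deepseek-r1": 6,
    "gemini-deep-think": 6,
}

# Multipliers for the reasoning_effort parameter, as listed above.
EFFORT_RATIO = {"none": 1, "low": 2, "medium": 4, "high": 6, "xhigh": 10}


def effective_output_tokens(model, tokens_output, reasoning_effort=None):
    """Return output tokens scaled by the thinking ratio."""
    if reasoning_effort is not None:
        ratio = EFFORT_RATIO[reasoning_effort]  # explicit override wins
    else:
        # Assumption: models without a listed ratio are not reasoning models (1x).
        ratio = DEFAULT_THINKING_RATIO.get(model, 1)
    return tokens_output * ratio


print(effective_output_tokens("o3", 1000))         # 6000
print(effective_output_tokens("o3", 1000, "low"))  # 2000
```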
Grid Intensity Resolution
| Priority | Condition | Source | Example |
|---|---|---|---|
| 1 | region parameter provided | Ember / EPA eGRID | region=FR → 42 gCO₂e/kWh |
| 2 | Cloud provider (aws_bedrock, gcp_vertex, azure_openai) | Provider region average | Average across all regions |
| 3 | Provider default | Country of provider HQ | Mistral → France (42), DeepSeek → China (544) |
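The priority order in the table can be sketched as a simple fall-through lookup. The intensity values are the ones quoted in the table; the lowercase provider keys are hypothetical, and the cloud-provider averages are not published here, so the caller supplies them.

```python
# Values quoted in the table above, in gCO2e/kWh.
REGION_INTENSITY = {"FR": 42.0}
PROVIDER_DEFAULT = {"mistral": 42.0, "deepseek": 544.0}  # hypothetical keys
CLOUD_PROVIDERS = {"aws_bedrock", "gcp_vertex", "azure_openai"}


def resolve_grid_intensity(provider, region=None, cloud_averages=None):
    """Return (intensity, source) following the priority order in the table.

    Raises KeyError when the relevant lookup has no entry.
    """
    # Priority 1: an explicit region always wins.
    if region is not None:
        return REGION_INTENSITY[region], "region"
    # Priority 2: cloud providers use an average across their regions.
    if provider in CLOUD_PROVIDERS:
        # Averages are not given in the text, so they are passed in by the caller.
        return (cloud_averages or {})[provider], "cloud provider average"
    # Priority 3: fall back to the provider's headquarters country.
    return PROVIDER_DEFAULT[provider], "provider default"


print(resolve_grid_intensity("deepseek"))               # (544.0, 'provider default')
print(resolve_grid_intensity("deepseek", region="FR"))  # (42.0, 'region')
```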
Data Sources
| Source | Coverage | Used For |
|---|---|---|
| Epoch AI — How Much Energy Does ChatGPT Use (2025) | Energy per token estimates | Token energy model |
| Google Gemini Environmental Report (2025) | Google model efficiency | Provider efficiency factors |
| SemiAnalysis — AI Datacenter Energy (2024) | GPU power consumption | Base energy and tier factors |
| Cloud Carbon Footprint | Hardware lifecycle | Embodied carbon per request |
| Ember / EPA eGRID / Electricity Maps | Grid carbon intensity | CO₂e conversion |
Limitations
This endpoint provides estimates, not measured values. Key limitations:
- Energy-per-token factors are derived from published research, not direct measurement from provider infrastructure
- Provider efficiency multipliers are approximations — actual PUE and hardware vary by data centre
- Embodied carbon is amortized across estimated total requests per GPU lifecycle
- Reasoning token estimates use a fixed multiplier rather than actual thinking token counts
- Cached token energy reduction (90%) is an approximation of KV-cache efficiency
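One way to read the 90% cached-token reduction is as a weight of 0.1 applied to cached input tokens before the token energy is computed. The sketch below assumes cached tokens are a subset of the input tokens; the parameter name `tokens_cached` is hypothetical and may not match the API's actual parameter.

```python
CACHED_TOKEN_WEIGHT = 0.1  # 90% energy reduction on cached input tokens


def weighted_input_tokens(tokens_input, tokens_cached=0):
    """Input tokens weighted by energy cost; cached tokens count at 10%."""
    # Assumption: tokens_cached is a subset of tokens_input.
    if not 0 <= tokens_cached <= tokens_input:
        raise ValueError("tokens_cached must be between 0 and tokens_input")
    uncached = tokens_input - tokens_cached
    return uncached + tokens_cached * CACHED_TOKEN_WEIGHT
```

For example, 5,000 input tokens of which 2,000 are cached would count as roughly 3,200 energy-weighted tokens (3,000 uncached plus 2,000 × 0.1).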