Stress-test of the four load-bearing pricing inputs in bom-token-model-2026-05-18.xlsx
The BOM token model rolls up to a 2030 token-revenue forecast of $5,675B. That figure is the product of two things: an effective-token volume model (knobs for agentic intensity, agentic share, adoption depth, induced demand) and a pricing model. The pricing side has four load-bearing inputs:
Inputs 1 and 2 had thin source attribution in Z_Sources. Inputs 3 and 4 had no source attribution at all. Before the model can be defended in front of an LP or used as an anchor for downstream deliverables, each assumption needs an empirical check. This brief reports the findings of seven research threads run 2026-05-18 and proposes a 3-tier restructure that the empirical evidence supports.
| Assumption | Cell | Current value | Cited source in Z_Sources | What the cite actually supports |
|---|---|---|---|---|
| Frontier $/MTok blended | Inputs!C9 | $9.00 | S001 (Anthropic pricing) | Opus 4.7 sticker today; says nothing about 2030 |
| Frontier deflation | Inputs!C12 | 0%/yr | S001 ("Anthropic Opus invariant") | 14 months of flat Opus pricing 2024–2025; not a 5-year forecast |
| Commodity $/MTok blended | Inputs!C10 | $0.40 | S003 (Gemini Flash) | Anchored to Gemini Flash floor; defensible |
| Commodity iso-capability deflation | Inputs!C11 | 80%/yr | S007, S008 (DeepLearning.AI, Demirer) | Two-year-old blog; Demirer et al. paper supports ~85% compounded, not specifically 80% |
| Frontier share 2026/28/30 | Inputs!C25–C27 | 30 / 40 / 50% | None | Uncited assumption |
The structural finding from this audit: two of the four assumptions had no source attribution, and a third (frontier deflation 0%/yr) extrapolates from a 14-month observation to a 5-year forecast. These were the threads to pull.
The "frontier price stays flat" claim has two parts that need separate testing:
(a) Top-SKU launch price is sticky generation-to-generation. PARTIALLY TRUE. Anthropic held Claude Opus at $15 input / $75 output from Claude 3 Opus (March 2024) through Claude Opus 4 (May 2025) — 14 months at identical pricing (Anthropic news, March 2024; the Opus 4.5 launch page cited in §8 corroborates the prior pricing via the explicit switch from $15/$75 to $5/$25). a16z's Guido Appenzeller observed in November 2024: "OpenAI's leading model today, o1, has the same cost per output token as GPT-3 had at launch ($60 per million)" (a16z, "Welcome to LLMflation").
(b) Within a model's own life, price stays flat. FALSE. Documented intra-life cuts include GPT-4o $5/$15 → $2.50/$10 in ~3 months (the-decoder, Aug 7 2024) and Gemini 1.5 Pro's 64% input / 52% output cut in October 2024: "64% price reduction on input tokens, a 52% price reduction on output tokens... for our strongest 1.5 series model" (Google Developers Blog). Most importantly, Anthropic broke its own Opus pattern in November 2025: Opus 4.5 launched at $5/$25, two-thirds cheaper than Opus 4, with the stated aim of "making Opus-level capabilities accessible to even more users, teams, and enterprises" (Anthropic news, Nov 24 2025).
Critically, independent measurement of frontier-equivalent capability shows price decline, not flatness. Epoch AI's frontier-tier measurement (Epoch AI, March 2025) found: "the price to achieve GPT-4's performance on a set of PhD-level science questions fell by 40x per year" — that is ~97.5%/yr at the frontier-equivalent capability bar. MIT FutureTech / Gundlach et al. (March 2026) measured the broader Pareto frontier of price-vs-capability: "the price for a given level of benchmark performance has decreased remarkably fast, around 5× to 10× per year" (arxiv 2511.23455, "The Price of Progress").
Empirical range across credible sources, converted to comparable %/yr units:
| Source | Methodology | %/yr | Window |
|---|---|---|---|
| Stanford AI Index 2025, Ch.1 | $/MTok at MMLU 64.8% | ~98% | Nov 2022 – Oct 2024 (18mo) |
| Epoch AI median | $/MTok at fixed benchmarks | ~98% (50× median) | 2022–2025 |
| a16z LLMflation | $/MTok at MMLU 42 & 83 | 90% (10×/yr) | 2021–2024 |
| Altman, "Three Observations" | Public claim — "cost to use a given level of AI" | 90% (10×/yr) | n/a (claim) |
| Demirer et al., NBER w34608 | "2023 SOTA models" price decline | ~85% (compounded) | 2023 – late 2025 |
| MIT FutureTech / Gundlach et al. | Pareto-frontier benchmark-anchored | 80–90% (5–10×/yr) | Multi-year |
| SemiAnalysis ("DeepSeek Debates") | Algorithmic progress (compute-per-capability) — 4×/yr per SemiAnalysis | ~75% (cmpd, 1−1/4) | Jan 2025 estimate |
Five of seven credible sources cluster at 85% or higher. Only SemiAnalysis's algorithmic-progress estimate (~75%, from "4× less compute per year for the same capability") lands below the model's 80%. The Demirer et al. paper's anchor quote: "Models that were state-of-the-art in 2023 have experienced a price decline of approximately 1000 times, with similarly pronounced deflationary trends at other intelligence levels" (NBER w34608, p.1).
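The sources above quote deflation either as a fold rate (k× per year) or as a percentage decline per year. The conversion used throughout this section is 1 − 1/k. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not part of the workbook.

```python
def fold_to_annual_decline(fold_per_year: float) -> float:
    """Fraction of the price removed in one year when prices fall k-fold per year."""
    if fold_per_year < 1:
        raise ValueError("fold_per_year must be >= 1 for a price decline")
    return 1.0 - 1.0 / fold_per_year


def annual_decline_to_fold(decline: float) -> float:
    """Inverse mapping: an annual decline d corresponds to a 1/(1-d)-fold drop."""
    if not 0 <= decline < 1:
        raise ValueError("decline must be in [0, 1)")
    return 1.0 / (1.0 - decline)


# Fold rates quoted in the text and table above.
quoted_folds = {
    "Epoch AI, GPT-4-level science questions": 40,
    "a16z LLMflation / Altman": 10,
    "MIT FutureTech, low end of range": 5,
    "SemiAnalysis, algorithmic progress": 4,
}

for source, fold in quoted_folds.items():
    print(f"{source}: {fold}x/yr -> {fold_to_annual_decline(fold):.1%}/yr")

# The model's current 80%/yr commodity input, expressed as a fold rate.
print(f"80%/yr -> {annual_decline_to_fold(0.80):.1f}x/yr")
```

Read this way, the model's 80%/yr input corresponds to 5×/yr, the bottom of the MIT FutureTech range, while SemiAnalysis's 4×/yr maps to 75%/yr.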
Epoch AI explicitly flags that "the fastest trends (e.g. 900× per year) start after January 2024" — i.e., 2024–2025 was a deflation peak driven by competitive pressure (DeepSeek, Gemini Flash). Both MIT and a16z warn the rate may slow. But §4.7 below is the more substantive critique: the headline rates may be overstated even within their measurement window.
The OpenRouter "State of AI 2025" study, a joint a16z–OpenRouter project covering 100 trillion tokens of usage Nov 2024–Nov 2025 (OpenRouter / a16z, State of AI 2025, 100T Token Study (local PDF, 104 pp); live landing: openrouter.ai/state-of-ai), is the largest cross-provider usage dataset in existence. Headline findings:
April 2026 OpenRouter rankings confirm the trajectory (DigitalApplied, April 2026):
The counter-evidence: Anthropic's own Economic Index, March 2026, reports "51% of overall usage is Opus" on paid Claude.ai accounts (Anthropic Economic Index). But this measures Anthropic's product surface only — paid users who self-selected into the frontier vendor's premium product. It is not generalizable to the broader market.
Empirical token-share by basket definition (Q1–Q2 2026 OpenRouter rankings; see DigitalApplied April 2026 summary and OpenRouter State of AI 2025 PDF):
The model's Token_Pricing_Matrix tab classifies Sonnet as "Mid" tier ($5.40/MTok blended), but the frontier_share_* inputs may have been intended to capture the entire premium-priced segment. The ambiguity is real and warrants a structural fix.
Volume-weighted average across the three mid-tier flagships, using April 2026 OpenRouter weekly volumes as weights:
| Model | Sticker (input / output) | Blended 80/20 | Weekly tokens (T) |
|---|---|---|---|
| Claude Sonnet 4.6 | $3 / $15 | $5.40 | 2.18 |
| OpenAI GPT-5.4 | $2.50 / $15 | $5.00 | 0.98 |
| Google Gemini 3.1 Pro | $2 / $12 | $4.00 | 0.87 |
| Weighted average | — | $5.00 | 4.03 |
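The table's arithmetic can be checked with a short sketch. It assumes that "Blended 80/20" means 80% input tokens and 20% output tokens by weight, a reading that reproduces each row's blended figure. The variable and function names are illustrative only.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 0.80) -> float:
    """Blend input and output sticker prices ($/MTok) by token-mix weight.

    The 80/20 input/output split is an assumption about the table's
    'Blended 80/20' column, not a documented workbook formula.
    """
    return input_weight * input_price + (1.0 - input_weight) * output_price


# (input $/MTok, output $/MTok, weekly tokens in trillions), from the table above.
mid_tier = {
    "Claude Sonnet 4.6": (3.00, 15.00, 2.18),
    "OpenAI GPT-5.4": (2.50, 15.00, 0.98),
    "Google Gemini 3.1 Pro": (2.00, 12.00, 0.87),
}

total_volume = sum(volume for _, _, volume in mid_tier.values())
weighted_avg = sum(
    blended_price(inp, out) * volume for inp, out, volume in mid_tier.values()
) / total_volume

for name, (inp, out, volume) in mid_tier.items():
    print(f"{name}: blended ${blended_price(inp, out):.2f}/MTok, {volume}T tokens/week")
print(f"Volume-weighted average: ${weighted_avg:.2f}/MTok over {total_volume:.2f}T tokens")
```

Because Sonnet carries more than half of the weekly volume, the weighted average sits closer to its $5.40 than a simple mean of the three blended prices would.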
Mid-tier behavior the data revealed (OpenRouter / a16z State of AI 2025 (local PDF); DigitalApplied April 2026):
The hypothesis was: even if frontier token share falls, the price premium means frontier revenue share rises with agentic adoption. Two pieces of evidence supporting the static snapshot:
The diffusion forces:
Cross-tier unit economics on agentic tasks (saved locally, multiple sources):
| Workflow | Tier | $/agentic task | Source |
|---|---|---|---|
| Devin (autonomous coding agent) | Frontier-heavy | $9.80 raw / $47.60 all-in | AgentMarketCap, Apr 2026 |
| Cline (open coding agent) | Frontier-heavy | $34.20 / bug fix | [VERIFY — source search pending] |
| Cursor (mid via Sonnet) | Mid | $0.10–0.15 | iamraghuveer.com, Apr 2026 |
| CodeRouter (intelligent routing) | Mixed | ~$2.30 blended (vs $33 Opus-only) | coderouter.io, Apr 2026 |
| DeepSeek-R1 self-hosted | Commodity reasoning | $7/MTok (9× cheaper than o1) | Together AI, Feb 2025 |
The frontier-revenue premium survives where (a) frontier is strictly necessary AND (b) buyers don't route. Both conditions are eroding as routing matures.
Four substantive critiques of the iso-capability methodology that survive scrutiny:
research/2026-05-18-token-pricing-validation/02-commodity-deflation-rate/2026-05-18-semianalysis-deepseek-debates.html). On this view, the published deflation curve is running against deteriorating unit economics, consistent with strategic loss-leader pricing that has a structural floor.

research/2026-05-18-token-pricing-validation/07-methodology-pressure-test/2026-05-01-semianalysis-ai-value-capture-shift-to-model-labs.html) argues the opposite: that Anthropic's inference-infrastructure gross margin moved from 38% to over 70% in roughly twelve months as agentic workloads (300:1 input:output, 90%+ cache hit rates) blended Opus 4.7's realized rate down to $0.99/MTok against $5/$25 sticker, and Anthropic introduced premium SKUs (Opus Fast at 6×, Mythos at 5×) absorbing willing demand. SemiAnalysis's thesis: "The age of low gross margins for frontier model providers is over. Real agentic AI has permanently increased the market-clearing price per token, and there's no going back." Full engagement and reconciliation in the counter-thesis writeup.

The structural-efficiency-only rate. MIT and Epoch both agree the algorithmic-only component of deflation runs at 3×/yr ≈ 67%/yr, separating it from strategic pricing and reasoning-token effects. This is the number defensible in front of an LP without flinching, and it remains defensible even under the SemiAnalysis margin-inflection view, because algorithmic efficiency and realized-margin trajectory are independent properties.
| Tier | Members | Blended $/MTok today |
|---|---|---|
| Frontier | Claude Opus 4.x, OpenAI GPT-5.5 flagship + reasoning Pro, Google Gemini 3.x Pro >200K context | $9.00 |
| Mid | Claude Sonnet 4.x, OpenAI GPT-5.4, Google Gemini 3.1 Pro | $5.00 |
| Commodity | Claude Haiku, Gemini Flash, GPT-Nano, Meta Llama 3/4, DeepSeek, Qwen, Kimi, MiniMax | $0.40 |
| Tier | Token share (mid) | Range | Anchor |
|---|---|---|---|
| Frontier | 7.5% | 5–10% | OpenRouter Apr 2026, pure-flagship subset |
| Mid | 25% | 22–28% | OpenRouter, weighted Sonnet+GPT-5.4+Gemini Pro |
| Commodity | 67.5% | 62–75% | OpenRouter OSS + cheap-proprietary residual |
This is the model's token-share input. The model multiplies these against per-tier $/MTok prices to compute revenue. Revenue share is a derived output — not an input — and is reported below for cross-check against empirical revenue data.
| Year | Frontier $/MTok | Mid $/MTok | Commodity $/MTok | F share | M share | C share | Blended |
|---|---|---|---|---|---|---|---|
| 2026 | $9.00 | $5.00 | $0.400 | 7.5% | 25% | 67.5% | $2.20 |
| 2028 | $7.29 | $5.00 | $0.044 | 6% | 25% | 69% | $1.72 |
| 2030 | $5.90 | $5.00 | $0.0047 | 5% | 25% | 70% | $1.55 |
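The rows above are consistent with compounding each tier's 2026 price at a constant annual decline and then weighting by token share. The sketch below reproduces them under inferred rates: 10%/yr for Frontier (backed out of $9.00 → $7.29 → $5.90), 0%/yr for Mid, and 67%/yr for Commodity (the structural-efficiency-only rate). These rates are assumptions read off the table, not values copied from workbook cells.

```python
BASE_YEAR = 2026

# 2026 blended prices in $/MTok, from the table above.
base_price = {"frontier": 9.00, "mid": 5.00, "commodity": 0.40}

# Annual price declines. These are inferred from the table rows and are
# assumptions of this sketch, not confirmed workbook inputs.
annual_decline = {"frontier": 0.10, "mid": 0.00, "commodity": 0.67}

# Token shares by year, from the table above.
token_share = {
    2026: {"frontier": 0.075, "mid": 0.25, "commodity": 0.675},
    2028: {"frontier": 0.06, "mid": 0.25, "commodity": 0.69},
    2030: {"frontier": 0.05, "mid": 0.25, "commodity": 0.70},
}


def tier_price(tier: str, year: int) -> float:
    """Compound the tier's 2026 price forward at its constant annual decline."""
    years_elapsed = year - BASE_YEAR
    return base_price[tier] * (1.0 - annual_decline[tier]) ** years_elapsed


def blended_price(year: int) -> float:
    """Token-share-weighted average price across the three tiers."""
    return sum(token_share[year][tier] * tier_price(tier, year) for tier in base_price)


for year in sorted(token_share):
    per_tier = ", ".join(f"{tier} ${tier_price(tier, year):.4f}" for tier in base_price)
    print(f"{year}: {per_tier} -> blended ${blended_price(year):.3f}/MTok")
```

Under these assumptions the blended figures come out within rounding of the table's Blended column, and the Mid tier contributes $1.25 of the blended price in every year, so by 2030 it accounts for most of the total.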
Reasoning per tier:
Bull case — frontier capability gap persists, agentic adoption outruns deflation (Huang's "million-x" world):
Bear case — routing matures fast, commodity reasoning models (DeepSeek-R1, Llama 4 Reasoning) absorb agentic work:
The effective-token volume model (chat + agentic with intensity and induced-demand multipliers) is unchanged at 1,260,972T effective tokens in 2030. Only the pricing model changes.
| Scenario | 2030 blended $/MTok | 2030 token revenue | vs current ($5,675B) |
|---|---|---|---|
| Current model | $4.50 | $5,675B | baseline |
| Restructured base | $1.49 | $1,874B | −67% (3.0× lower) |
| Restructured bull | $2.50–3.00 | $3,150–3,780B | −33 to −45% |
| Restructured bear | $0.80–1.00 | $1,000–1,260B | −78 to −82% |
The single largest delta vs the current model is the removal of the 30 → 40 → 50% Frontier-share rise. That assumption was doing most of the compounding work in the original $5,675B number. Removing it (or flipping it to flat/declining, as the empirical evidence supports) compresses 2030 revenue by 2–3× before changing any other input.
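The scenario table follows from one multiplication: effective tokens times blended price. A minimal sketch of the unit conversion is below, using the volume figure and the blended prices stated above. The function name is illustrative, and because the table's blended prices are rounded to the cent, the results should be expected to match only to within that rounding.

```python
EFFECTIVE_TOKENS_2030_T = 1_260_972  # trillions of effective tokens, from the text above
BASELINE_PRICE = 4.50                # current model, 2030 blended $/MTok


def token_revenue_billions(tokens_trillions: float, price_per_mtok: float) -> float:
    """Revenue in $B from a volume in trillions of tokens and a price in $/MTok.

    1T tokens = 1e6 MTok, so dollars = tokens_T * 1e6 * price; divide by 1e9 for $B.
    """
    return tokens_trillions * 1e6 * price_per_mtok / 1e9


# 2030 blended $/MTok per scenario, as listed in the scenario table.
scenario_prices = {
    "Current model": [4.50],
    "Restructured base": [1.49],
    "Restructured bull": [2.50, 3.00],
    "Restructured bear": [0.80, 1.00],
}

baseline = token_revenue_billions(EFFECTIVE_TOKENS_2030_T, BASELINE_PRICE)
for name, prices in scenario_prices.items():
    for price in prices:
        revenue = token_revenue_billions(EFFECTIVE_TOKENS_2030_T, price)
        change = revenue / baseline - 1.0
        print(f"{name} @ ${price:.2f}/MTok: ${revenue:,.0f}B ({change:+.0%} vs current)")
```

Since volume is held fixed, each scenario's percentage change against the current model is simply the ratio of blended prices minus one.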
Derived 2026 revenue-share split (cross-check against §4.6 empirical):
| Tier | Token share | $/MTok | Revenue share (derived) | Empirical (§4.6) |
|---|---|---|---|---|
| Frontier | 7.5% | $9.00 | 30.7% | ~30% |
| Mid | 25% | $5.00 | 56.9% | ~40% |
| Commodity | 67.5% | $0.40 | 12.3% | ~30% |
The model-derived 2026 revenue split lands close to the empirical for Frontier (30.7% vs ~30%), high for Mid (56.9% vs ~40%), and low for Commodity (12.3% vs ~30%). Two interpretations: either (a) the mid-tier blended price is overstated (Sonnet over-weights the weighted average), or (b) the commodity blended price is understated (Bedrock-served Llama, DeepSeek, and Together-served Llama are charged above the $0.40 Gemini-Flash floor). The reconciliation between derived and empirical revenue share is worth a follow-up tab in the rebuild.
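The derived revenue shares in the table come from weighting each tier's token share by its price and normalizing. A short sketch of that derivation, using the 2026 inputs above (names are illustrative, not workbook identifiers):

```python
# (token share, blended $/MTok) for 2026, from the table above.
tiers_2026 = {
    "Frontier": (0.075, 9.00),
    "Mid": (0.25, 5.00),
    "Commodity": (0.675, 0.40),
}


def revenue_shares(tiers: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Each tier's share of revenue: token share times price, normalized to sum to 1."""
    weighted = {name: share * price for name, (share, price) in tiers.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


shares = revenue_shares(tiers_2026)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of revenue")
```

The same function can be rerun with an alternative mid-tier or commodity price to test interpretations (a) and (b) against the empirical split.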
New Z_Sources entries:
| New ID | Type | Publisher | Title | Date |
|---|---|---|---|---|
| S039 | Pricing | Anthropic | Claude Opus 4.5 launch ($5/$25) | 2025-11-24 |
| S040 | Analyst | a16z | Welcome to LLMflation | 2024-11-12 |
| S041 | Research | Epoch AI | LLM inference price trends | 2025-03-12 |
| S042 | Research | Stanford HAI | AI Index 2025 Ch.1 | 2025-04 |
| S043 | Academic | NBER (Demirer et al.) | Emerging Market for Intelligence (w34608) | 2025-12 |
| S044 | Academic | MIT FutureTech (Gundlach et al.) | Price of Progress (arxiv 2511.23455) | 2026-03 |
| S045 | Industry Data | OpenRouter / a16z | State of AI 2025 — 100T-token study | 2025-12-04 |
| S046 | Industry Data | DigitalApplied | OpenRouter Rankings April 2026 | 2026-04 |
| S047 | Industry | OpenAI / Altman | Three Observations | 2025-02 |
| S048 | Industry Data | Anthropic | Economic Index March 2026 | 2026-03-24 |
| S049 | Pricing | OpenAI | GPT-5.4 pricing [VERIFY — live URL returns 403 to curl; dated snapshot pending via Playwright/manual] | 2026-Q1 |
| S050 | Pricing | Google | Gemini 3.1 Pro pricing [VERIFY — live URL infinite-redirects on curl; dated snapshot pending via Playwright/manual] | 2026-Q1 |
| S051 | Margin Analysis | SemiAnalysis | OpenAI/Anthropic GM analysis (Substack, paywalled) | 2026 |
| S052 | Pricing Page | the-decoder | GPT-4o price cut history | 2024-08-07 |
| S053 | Pricing Page | Google Dev Blog | Gemini 1.5 Pro 64%/52% cut | 2024-09-24 |
research/2026-05-18-token-pricing-validation/00-synthesis.md