BOM Token Model — Pricing Assumptions Validation

Stress-test of the four load-bearing pricing inputs in bom-token-model-2026-05-18.xlsx

Date: 2026-05-18  ·  Status: Validation findings; pre–model-rebuild

Source synthesis: research/2026-05-18-token-pricing-validation/00-synthesis.md (full per-thread findings, citations, and 40+ saved PDFs/HTMLs)

Style: Every technical term defined the first time it appears.

Contents
  1. The question, and why it matters
  2. Terms (define-first-use)
  3. Audit of current model assumptions
  4. Empirical findings — seven research threads
  5. Recommended 3-tier restructure
  6. Impact on the 2030 forecast
  7. Sources to add to Z_Sources
  8. Sources cited

1. The question, and why it matters

The BOM token model rolls up to a 2030 token-revenue forecast of $5,675B. That figure is the product of two things: an effective-token volume model (knobs for agentic intensity, agentic share, adoption depth, induced demand) and a pricing model. The pricing side has four load-bearing inputs:

  1. Frontier-tier price — $9.00/MTok blended, held flat through 2030 (0%/yr deflation)
  2. Commodity-tier price — $0.40/MTok blended, deflating at 80%/yr at iso-capability
  3. Frontier share of tokens — rising 30% → 40% → 50% across 2026/28/30
  4. Current frontier share baseline — 30% in 2026

Inputs 1 and 2 had thin source attribution in Z_Sources. Inputs 3 and 4 had no source attribution at all. Before the model can be defended in front of an LP or used as an anchor for downstream deliverables, each assumption needs an empirical check. This brief reports the findings of seven research threads run 2026-05-18 and proposes a 3-tier restructure that the empirical evidence supports.

2. Terms (define-first-use)

3. Audit of current model assumptions

Assumption | Cell | Current value | Cited source in Z_Sources | What the cite actually supports
Frontier $/MTok blended | Inputs!C9 | $9.00 | S001 (Anthropic pricing) | Opus 4.7 sticker today; says nothing about 2030
Frontier deflation | Inputs!C12 | 0%/yr | S001 ("Anthropic Opus invariant") | 14 months of flat Opus pricing 2024–2025; not a 5-year forecast
Commodity $/MTok blended | Inputs!C10 | $0.40 | S003 (Gemini Flash) | Anchored to Gemini Flash floor; defensible
Commodity iso-capability deflation | Inputs!C11 | 80%/yr | S007, S008 (DeepLearning.AI, Demirer) | Two-year-old blog; Demirer et al. paper supports ~85% compounded, not specifically 80%
Frontier share 2026/28/30 | Inputs!C25–C27 | 30 / 40 / 50% | None | Uncited assumption

The structural finding from this audit: two of the four assumptions had no source attribution, and a third (frontier deflation 0%/yr) extrapolates from a 14-month observation to a 5-year forecast. These were the threads to pull.

4. Empirical findings — seven research threads

4.1 — Frontier price flatness ($9 flat through 2030)

Verdict: SOFT. The $9 blended price is correct today: the Claude Opus 4.5 sticker of $5 input / $25 output blends to exactly $9 at the 80/20 input/output weighting used in §4.5. But 0%/yr deflation through 2030 is a workload-mix bet, not a pricing bet. Honest read: 10–20%/yr blended frontier deflation from workload re-tiering as buyers swap to cheaper SKUs.

The "frontier price stays flat" claim has two parts that need separate testing:

(a) Top-SKU launch price is sticky generation-to-generation. PARTIALLY TRUE. Anthropic held Claude Opus at $15 input / $75 output from Claude 3 Opus (March 2024) through Claude Opus 4 (May 2025) — 14 months at identical pricing (Anthropic news, March 2024; the Opus 4.5 launch page cited in §8 corroborates the prior pricing via the explicit switch from $15/$75 to $5/$25). a16z's Guido Appenzeller observed in November 2024: "OpenAI's leading model today, o1, has the same cost per output token as GPT-3 had at launch ($60 per million)" (a16z, "Welcome to LLMflation").

(b) Within a model's own life, price stays flat. FALSE. Documented intra-life cuts include GPT-4o $5/$15 → $2.50/$10 in ~3 months (the-decoder, Aug 7 2024) and Gemini 1.5 Pro, cut 64% on input and 52% on output in October 2024 (announced September 24, 2024): "64% price reduction on input tokens, a 52% price reduction on output tokens... for our strongest 1.5 series model" (Google Developers Blog). Most importantly, Anthropic broke its own Opus pattern in November 2025: Opus 4.5 launched at $5/$25, two-thirds cheaper than Opus 4, with the stated aim of "making Opus-level capabilities accessible to even more users, teams, and enterprises" (Anthropic news, Nov 24 2025).

Critically, independent measurement of frontier-equivalent capability shows price decline, not flatness. Epoch AI's frontier-tier measurement (Epoch AI, March 2025) found: "the price to achieve GPT-4's performance on a set of PhD-level science questions fell by 40x per year" — that is ~97.5%/yr at the frontier-equivalent capability bar. MIT FutureTech / Gundlach et al. (March 2026) measured the broader Pareto frontier of price-vs-capability: "the price for a given level of benchmark performance has decreased remarkably fast, around 5× to 10× per year" (arxiv 2511.23455, "The Price of Progress").

4.2 — Commodity deflation 80%/yr

Verdict: CONSERVATIVE TAIL OF THE RANGE, but the methodology pressure-test (§4.7) lowers the defensible rate further. The 80% figure sits near the bottom of the published empirical range (74–99%/yr). After accounting for strategic subsidies and the exclusion of reasoning models from the measurements, the structurally defensible rate drops to 60–67%/yr.

Empirical range across credible sources, converted to comparable %/yr units:

Source | Methodology | %/yr | Window
Stanford AI Index 2025, Ch.1 | $/MTok at MMLU 64.8% | ~98% | Nov 2022 – Oct 2024 (18mo)
Epoch AI median | $/MTok at fixed benchmarks | ~98% (50× median) | 2022–2025
a16z LLMflation | $/MTok at MMLU 42 & 83 | 90% (10×/yr) | 2021–2024
Altman, "Three Observations" | Public claim: "cost to use a given level of AI" | 90% (10×/yr) | n/a (claim)
Demirer et al., NBER w34608 | "2023 SOTA models" price decline | ~85% (compounded) | 2023 – late 2025
MIT FutureTech / Gundlach et al. | Pareto-frontier benchmark-anchored | 80–90% (5–10×/yr) | Multi-year
SemiAnalysis ("DeepSeek Debates") | Algorithmic progress (compute-per-capability); 4×/yr per SemiAnalysis | ~75% (cmpd, 1−1/4) | Jan 2025 estimate

Five of seven credible sources cluster at 85% or higher. Only SemiAnalysis's algorithmic-progress estimate (~75%, from "4× less compute per year for the same capability") lands below the model's 80%. The Demirer et al. paper's anchor quote: "Models that were state-of-the-art in 2023 have experienced a price decline of approximately 1000 times, with similarly pronounced deflationary trends at other intelligence levels" (NBER w34608, p.1).
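The unit conversion behind the %/yr column can be sketched as follows. This is a minimal illustration, not the synthesis notebook: the function names are hypothetical, and the 280× input is the headline figure quoted in §4.7, which this sketch assumes is the Stanford AI Index measurement over its stated 18-month window.

```python
def fold_to_pct_per_year(fold_per_year: float) -> float:
    """Convert an N-fold annual price drop into a %/yr decline: (1 - 1/N) * 100."""
    return (1.0 - 1.0 / fold_per_year) * 100.0


def annualize_fold(total_fold: float, months: float) -> float:
    """Convert a total fold-change over a window of `months` into a per-year fold-change."""
    return total_fold ** (12.0 / months)


# Per-year multiples quoted in the table above.
for label, fold in [("SemiAnalysis 4x/yr", 4), ("a16z 10x/yr", 10), ("Epoch median 50x/yr", 50)]:
    print(f"{label}: {fold_to_pct_per_year(fold):.1f}%/yr")

# Assumption: 280x total drop over an 18-month window, annualized before converting.
stanford_fold_per_year = annualize_fold(280, 18)
print(f"280x over 18 months: {fold_to_pct_per_year(stanford_fold_per_year):.1f}%/yr")
```

The same conversion applies to the 3×/yr structural rate discussed in §4.7: one minus one third is roughly 67%/yr. Note that the annualized figure is sensitive to the window length, so a 280× drop spread over a longer window would convert to a lower %/yr.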

Epoch AI explicitly flags that "the fastest trends (e.g. 900× per year) start after January 2024" — i.e., 2024–2025 was a deflation peak driven by competitive pressure (DeepSeek, Gemini Flash). Both MIT and a16z warn the rate may slow. But §4.7 below is the more substantive critique: the headline rates may be overstated even within their measurement window.

4.3 — Frontier token-share trajectory (30% → 40% → 50%)

Verdict: DIRECTIONALLY WRONG on a token-volume basis. Frontier token share is falling toward commodity, not rising. The 30/40/50% glide path was uncited and not defensible. The basket-definition issue (§4.4) explains how the model can preserve the spirit of the assumption with a 3-tier restructure.

The OpenRouter "State of AI 2025" study, a joint a16z–OpenRouter project covering 100 trillion tokens of usage Nov 2024–Nov 2025 (OpenRouter / a16z, State of AI 2025, 100T Token Study (local PDF, 104 pp); live landing: openrouter.ai/state-of-ai), is the largest cross-provider usage dataset in existence. Headline findings:

April 2026 OpenRouter rankings confirm the trajectory (DigitalApplied, April 2026):

The counter-evidence: Anthropic's own Economic Index, March 2026, reports "51% of overall usage is Opus" on paid Claude.ai accounts (Anthropic Economic Index). But this measures Anthropic's product surface only — paid users who self-selected into the frontier vendor's premium product. It is not generalizable to the broader market.

4.4 — Current frontier share baseline (30%)

Verdict: BASKET-DEFINITION DEPENDENT. 30% is defensible if Sonnet sits inside the frontier basket; falls to 10–15% if "frontier" means Opus-only. The basket ambiguity is the real diagnosis — fixed by the 3-tier restructure below.

Empirical token-share by basket definition (Q1–Q2 2026 OpenRouter rankings; see DigitalApplied April 2026 summary and OpenRouter State of AI 2025 PDF):

The model's Token_Pricing_Matrix tab classifies Sonnet as "Mid" tier ($5.40/MTok blended), but the frontier_share_* inputs may have been intended to capture the entire premium-priced segment. The ambiguity is real and warrants a structural fix.

4.5 — Mid-tier pricing and share (the new tier)

Verdict: MID IS A STRUCTURALLY DISTINCT STICKY LAYER. Not a moving boundary between Frontier and Commodity. ~$5.00/MTok blended, 22–28% token share, low price elasticity, gap to commodity widening. Justifies a 3-tier model.

Volume-weighted average across the three mid-tier flagships, using April 2026 OpenRouter weekly volumes as weights:

Model | Sticker, $/MTok (input / output) | Blended 80/20, $/MTok | Weekly tokens (T)
Claude Sonnet 4.6 | $3 / $15 | $5.40 | 2.18
OpenAI GPT-5.4 | $2.50 / $15 | $5.00 | 0.98
Google Gemini 3.1 Pro | $2 / $12 | $4.00 | 0.87
Weighted average | | $5.00 | 4.03
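The arithmetic behind the table can be reproduced with a short sketch. It assumes "Blended 80/20" means 80% input tokens and 20% output tokens (the reading that matches the table's blended column); the function and variable names are illustrative, not taken from the workbook.

```python
def blended_price(input_price: float, output_price: float, input_share: float = 0.8) -> float:
    """Blend input and output $/MTok. The 80/20 input/output split is an assumption
    inferred from the table's blended column."""
    return input_share * input_price + (1.0 - input_share) * output_price


# (input $/MTok, output $/MTok, weekly tokens in trillions), copied from the table above.
MID_TIER = {
    "Claude Sonnet 4.6": (3.00, 15.00, 2.18),
    "OpenAI GPT-5.4": (2.50, 15.00, 0.98),
    "Google Gemini 3.1 Pro": (2.00, 12.00, 0.87),
}


def volume_weighted_price(models: dict) -> float:
    """Average the blended prices, weighting each model by its weekly token volume."""
    total_volume = sum(volume for _, _, volume in models.values())
    weighted_sum = sum(blended_price(inp, out) * volume for inp, out, volume in models.values())
    return weighted_sum / total_volume


mid_price = volume_weighted_price(MID_TIER)
print(f"Mid-tier volume-weighted blended price: ${mid_price:.2f}/MTok")
```

The same `blended_price` helper applied to the Opus 4.5 sticker of $5 / $25 gives the $9.00 frontier figure used in §4.1. Because Sonnet carries more than half the weekly volume, the weighted average is pulled toward its $5.40 blend.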

Mid-tier behavior the data revealed (OpenRouter / a16z State of AI 2025 (local PDF); DigitalApplied April 2026):

4.6 — Agentic revenue thesis

Verdict: PARTIALLY SUPPORTED, STRUCTURALLY FRAGILE. Today's revenue mix is Frontier+Mid heavy (~70%). But the trajectory through 2030 is diffusion, not concentration. Load-bearing question: how fast does routing infrastructure mature.

The hypothesis was: even if frontier token share falls, the price premium means frontier revenue share rises with agentic adoption. Two pieces of evidence supporting the static snapshot:

The diffusion forces:

Cross-tier unit economics on agentic tasks (saved locally, multiple sources):

Workflow | Tier | $/agentic task | Source
Devin (autonomous coding agent) | Frontier-heavy | $9.80 raw / $47.60 all-in | AgentMarketCap, Apr 2026
Cline (open coding agent) | Frontier-heavy | $34.20 / bug fix | [VERIFY: source search pending]
Cursor (mid via Sonnet) | Mid | $0.10–0.15 | iamraghuveer.com, Apr 2026
CodeRouter (intelligent routing) | Mixed | ~$2.30 blended (vs $33 Opus-only) | coderouter.io, Apr 2026
DeepSeek-R1 self-hosted | Commodity reasoning | $7/MTok (9× cheaper than o1) | Together AI, Feb 2025

The frontier-revenue premium survives where (a) frontier is strictly necessary AND (b) buyers don't route. Both conditions are eroding as routing matures.

4.7 — Methodology pressure-test (the most consequential thread)

Verdict: THE 80–99%/YR DEFLATION HEADLINE IS OVERSTATED, BUT THE INFERENCE-MARGIN PICTURE IS CONTESTED. Structural algorithmic-efficiency rate is 3×/yr ≈ 67%/yr — the one number MIT and Epoch both agree on; drop the model's commodity deflation from 80%/yr to 60–67%/yr base case. Two credible sources disagree on the inference-margin trajectory: The Information's mid-2025 reporting shows margins under pressure (33% / 40% actuals vs 46% / 50% targets); SemiAnalysis's May 2026 post shows Anthropic inference-infra margin moving from 38% to 70%+ in twelve months. The BOM model treats the deflation rate at the lower end of the spread; if SemiAnalysis is right, that rate compresses further.

Four substantive critiques of the iso-capability methodology that survive scrutiny:

  1. Token-level pricing systematically overstates iso-capability deflation when reasoning models exist. Epoch's own notebook excludes reasoning models, admitting "the price per token is not a good proxy for the cost to achieve a benchmark score" (epoch-research/llm-benchmark-efficiency, llm_price_trends.ipynb). The headline 280× deflation is real for $/MTok at fixed scores, but irrelevant if achieving the score now takes 100× more tokens per query.
  2. Frontier $/answer is actually RISING 3–18×/yr at fixed real-world capability. MIT's Figure 9 (arxiv 2511.23455) measures $/benchmark-run, not $/MTok: GPQA-Diamond frontier cost +17.9×/yr, SWE-V +7.7×/yr, AIME +3.0×/yr. Reasoning models burn 20–150× more tokens per answer than non-reasoning at similar final score; the "deflation" disappears when measured per useful output.
  3. Two credible sources disagree on whether published prices are running below cost.
  4. Benchmark contamination inflates the iso-capability bar itself. MMLU-CF (contamination-free reformulation) shows GPT-4o is 14.6 points lower on the cleaned benchmark (MMLU-CF paper, Dec 2024). SWE-bench Verified has a documented 10.6% leak rate; OpenAI stopped reporting on it in late 2025 (tianpan.co, Apr 2026). If the bar is contaminated, "iso-capability" measurements drift over time.
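Critiques 1 and 2 rest on the difference between price per token and cost per answer. The sketch below illustrates that relationship with hypothetical inputs: the $10 and $1 per-MTok prices and the 1,000 and 100,000 token counts are made-up round numbers chosen to mirror the "100× more tokens per query" case, not measurements from any cited source.

```python
def cost_per_answer(price_per_mtok: float, tokens_per_answer: float) -> float:
    """Dollar cost of one answer: $/MTok times tokens used, divided by one million."""
    return price_per_mtok * tokens_per_answer / 1_000_000


def net_cost_fold(price_drop_fold: float, token_growth_fold: float) -> float:
    """Net change in cost per answer. Values above 1 mean the answer got more
    expensive even though the per-token price fell."""
    return token_growth_fold / price_drop_fold


# Hypothetical: per-token price falls 10x while tokens per answer grow 100x.
before = cost_per_answer(price_per_mtok=10.0, tokens_per_answer=1_000)
after = cost_per_answer(price_per_mtok=1.0, tokens_per_answer=100_000)

print(f"Cost per answer before: ${before:.4f}")
print(f"Cost per answer after:  ${after:.4f}")
print(f"Net fold-change in cost per answer: {net_cost_fold(10, 100):.1f}x")
```

Under these illustrative inputs a 10× fall in $/MTok coincides with a 10× rise in $/answer, which is the mechanism by which a per-token deflation headline and a rising cost per benchmark run can both be true.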

The structural-efficiency-only rate. MIT and Epoch both agree the algorithmic-only component of deflation runs at 3×/yr ≈ 67%/yr, separating it from strategic pricing and reasoning-token effects. This is the number defensible in front of an LP without flinching — and remains defensible even under the SemiAnalysis margin-inflection view, because algorithmic efficiency and realized-margin trajectory are independent properties.

5. Recommended 3-tier restructure

5.1 Tier definitions (confirmed)

Tier | Members | Blended $/MTok today
Frontier | Claude Opus 4.x, OpenAI GPT-5.5 flagship + reasoning Pro, Google Gemini 3.x Pro >200K context | $9.00
Mid | Claude Sonnet 4.x, OpenAI GPT-5.4, Google Gemini 3.1 Pro | $5.00
Commodity | Claude Haiku, Gemini Flash, GPT-Nano, Meta Llama 3/4, DeepSeek, Qwen, Kimi, MiniMax | $0.40

5.2 Empirical token-share baseline (2026)

Tier | Token share (midpoint) | Range | Anchor
Frontier | 7.5% | 5–10% | OpenRouter Apr 2026, pure-flagship subset
Mid | 25% | 22–28% | OpenRouter, weighted Sonnet+GPT-5.4+Gemini Pro
Commodity | 67.5% | 62–75% | OpenRouter OSS + cheap-proprietary residual

5.3 Recommended trajectory through 2030 (base case)

This is the model's token-share input. The model multiplies these against per-tier $/MTok prices to compute revenue. Revenue share is a derived output — not an input — and is reported below for cross-check against empirical revenue data.

Year | Frontier $/MTok | Mid $/MTok | Commodity $/MTok | F share | M share | C share | Blended
2026 | $9.00 | $5.00 | $0.400 | 7.5% | 25% | 67.5% | $2.20
2028 | $7.29 | $5.00 | $0.044 | 6% | 25% | 69% | $1.72
2030 | $5.90 | $5.00 | $0.0047 | 5% | 25% | 70% | $1.55
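The blended column is the share-weighted sum of the three tier prices, and the per-tier prices appear to follow constant annual deflation. The sketch below reproduces the table under two inferred assumptions that are not stated in this section: 10%/yr frontier deflation and 67%/yr commodity deflation, with Mid held flat. These rates were chosen because they match the table's prices; the names are illustrative.

```python
def deflate(price: float, annual_rate: float, years: int) -> float:
    """Compound a constant annual price decline over a number of years."""
    return price * (1.0 - annual_rate) ** years


def blended(tiers: dict) -> float:
    """Share-weighted $/MTok across tiers; shares are expected to sum to 1."""
    total_share = sum(share for _, share in tiers.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"token shares sum to {total_share}, expected 1.0")
    return sum(price * share for price, share in tiers.values())


# (price in $/MTok, token share) per tier, copied from the table above.
TRAJECTORY = {
    2026: {"frontier": (9.00, 0.075), "mid": (5.00, 0.25), "commodity": (0.400, 0.675)},
    2028: {"frontier": (7.29, 0.06), "mid": (5.00, 0.25), "commodity": (0.044, 0.69)},
    2030: {"frontier": (5.90, 0.05), "mid": (5.00, 0.25), "commodity": (0.0047, 0.70)},
}

for year, tiers in TRAJECTORY.items():
    print(f"{year}: blended ${blended(tiers):.2f}/MTok")

# Assumed rates (inferred, not quoted): 10%/yr frontier, 67%/yr commodity.
for years_out in (2, 4):
    frontier = deflate(9.00, 0.10, years_out)
    commodity = deflate(0.40, 0.67, years_out)
    print(f"{2026 + years_out}: frontier ${frontier:.2f}, commodity ${commodity:.4f}")
```

Because commodity prices fall toward zero, the blended figure by 2030 is dominated by the Mid term (25% of tokens at $5.00 contributes $1.25), so the Mid price and share assumptions matter more to the forecast than the commodity deflation rate.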

Reasoning per tier:

5.4 Bull and bear cases

Bull case — frontier capability gap persists, agentic adoption outruns deflation (Huang's "million-x" world):

Bear case — routing matures fast, commodity reasoning models (DeepSeek-R1, Llama 4 Reasoning) absorb agentic work:

6. Impact on the 2030 forecast

The effective-token volume model (chat + agentic with intensity and induced-demand multipliers) is unchanged at 1,260,972T effective tokens in 2030. Only the pricing model changes.

Scenario | 2030 blended $/MTok | 2030 token revenue | vs current ($5,675B)
Current model | $4.50 | $5,675B | baseline
Restructured base | $1.49 | $1,874B | −67% (3.0× lower)
Restructured bull | $2.50–3.00 | $3,150–3,780B | −33 to −45%
Restructured bear | $0.80–1.00 | $1,000–1,260B | −78 to −82%

The single largest delta vs the current model is the removal of the 30 → 40 → 50% Frontier-share rise. That assumption was doing most of the compounding work in the original $5,675B number. Removing it (or flipping it to flat/declining, as the empirical evidence supports) compresses 2030 revenue by 2–3× before changing any other input.
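The revenue column in the scenario table is volume times blended price with a unit conversion. A minimal sketch, assuming "T" means trillions of tokens and revenue is reported in billions of dollars (function and variable names are illustrative):

```python
EFFECTIVE_TOKENS_2030_T = 1_260_972  # trillions of effective tokens, from the text above


def token_revenue_billion(tokens_trillion: float, price_per_mtok: float) -> float:
    """Revenue in $B. One trillion tokens is 1e6 MTok, and dollars to $B divides
    by 1e9, so the net conversion factor is 1e-3."""
    return tokens_trillion * price_per_mtok / 1_000.0


SCENARIOS = {
    "Current model": 4.50,
    "Restructured base": 1.49,
    "Restructured bull (low)": 2.50,
    "Restructured bull (high)": 3.00,
    "Restructured bear (low)": 0.80,
    "Restructured bear (high)": 1.00,
}

baseline = token_revenue_billion(EFFECTIVE_TOKENS_2030_T, SCENARIOS["Current model"])
for name, price in SCENARIOS.items():
    revenue = token_revenue_billion(EFFECTIVE_TOKENS_2030_T, price)
    change = (revenue / baseline - 1.0) * 100.0
    print(f"{name}: ${revenue:,.0f}B ({change:+.0f}% vs current)")
```

Under these assumptions the $1.49 base case works out to roughly $1,879B, close to the $1,874B reported; the gap is consistent with the blended price having been rounded to two decimals. The 2030 blended figure in the §5.3 table, $1.55, would instead give roughly $1,955B, so the two sections may need reconciling in the rebuild.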

Derived 2026 revenue-share split (cross-check against §4.6 empirical):

Tier | Token share | $/MTok | Revenue share (derived) | Empirical (§4.6)
Frontier | 7.5% | $9.00 | 30.7% | ~30%
Mid | 25% | $5.00 | 56.9% | ~40%
Commodity | 67.5% | $0.40 | 12.3% | ~30%

The model-derived 2026 revenue split lands close to the empirical for Frontier (30.7% vs ~30%), high for Mid (56.9% vs ~40%), and low for Commodity (12.3% vs ~30%). Two interpretations: either (a) the mid-tier blended price is overstated (Sonnet over-weights the weighted average), or (b) the commodity blended price is understated (Bedrock-served Llama, DeepSeek, and Together-served Llama are charged above the $0.40 Gemini-Flash floor). The reconciliation between derived and empirical revenue share is worth a follow-up tab in the rebuild.
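The derived revenue shares follow directly from token share times price, normalized across tiers. A short sketch using the 2026 inputs from the table (names are illustrative):

```python
# (token share, blended $/MTok) per tier for 2026, copied from the table above.
TIERS_2026 = {
    "frontier": (0.075, 9.00),
    "mid": (0.25, 5.00),
    "commodity": (0.675, 0.40),
}


def revenue_shares(tiers: dict) -> dict:
    """Each tier's share of revenue: (token share * price) / sum over all tiers."""
    contribution = {name: share * price for name, (share, price) in tiers.items()}
    total = sum(contribution.values())
    return {name: value / total for name, value in contribution.items()}


shares_2026 = revenue_shares(TIERS_2026)
for name, share in shares_2026.items():
    print(f"{name}: {share * 100:.2f}% of revenue")
```

Frontier comes out at about 30.75%, which the table shows as 30.7%. Re-running the function with a lower Mid price or a higher Commodity price is a quick way to test interpretations (a) and (b) against the empirical split.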

7. Sources to add to Z_Sources

New ID | Type | Publisher | Title | Date
S039 | Pricing | Anthropic | Claude Opus 4.5 launch ($5/$25) | 2025-11-24
S040 | Analyst | a16z | Welcome to LLMflation | 2024-11-12
S041 | Research | Epoch AI | LLM inference price trends | 2025-03-12
S042 | Research | Stanford HAI | AI Index 2025 Ch.1 | 2025-04
S043 | Academic | NBER (Demirer et al.) | Emerging Market for Intelligence (w34608) | 2025-12
S044 | Academic | MIT FutureTech (Gundlach et al.) | Price of Progress (arxiv 2511.23455) | 2026-03
S045 | Industry Data | OpenRouter / a16z | State of AI 2025: 100T-token study | 2025-12-04
S046 | Industry Data | DigitalApplied | OpenRouter Rankings April 2026 | 2026-04
S047 | Industry | OpenAI / Altman | Three Observations | 2025-02
S048 | Industry Data | Anthropic | Economic Index March 2026 | 2026-03-24
S049 | Pricing | OpenAI | GPT-5.4 pricing [VERIFY: live URL returns 403 to curl; dated snapshot pending via Playwright/manual] | 2026-Q1
S050 | Pricing | Google | Gemini 3.1 Pro pricing [VERIFY: live URL infinite-redirects on curl; dated snapshot pending via Playwright/manual] | 2026-Q1
S051 | Margin Analysis | SemiAnalysis | OpenAI/Anthropic GM analysis (Substack, paywalled) | 2026
S052 | Pricing Page | the-decoder | GPT-4o price cut history | 2024-08-07
S053 | Pricing Page | Google Dev Blog | Gemini 1.5 Pro 64%/52% cut | 2024-09-24

8. Sources cited

  1. Anthropic, "Claude Opus 4.5 launch" (2025-11-24). https://www.anthropic.com/news/claude-opus-4-5
  2. Anthropic, "Claude 3 family" (2024-03). https://www.anthropic.com/news/claude-3-family
  3. Anthropic, "Economic Index — March 2026 Report" (2026-03-24). https://www.anthropic.com/research/economic-index-march-2026-report
  4. a16z, Guido Appenzeller, "Welcome to LLMflation" (2024-11-12). https://a16z.com/llmflation-llm-inference-cost/
  5. Sam Altman, "Three Observations" (2025-02). https://blog.samaltman.com/three-observations
  6. DigitalApplied, "OpenRouter Rankings April 2026" (2026-04). https://www.digitalapplied.com/blog/openrouter-rankings-april-2026-top-ai-models-data
  7. Epoch AI, "LLM inference price trends" (2025-03-12). https://epoch.ai/data-insights/llm-inference-price-trends
  8. Demirer, Fradkin, Tadelis, Peng, "The Emerging Market for Intelligence" — NBER w34608 (2025-12). https://www.nber.org/papers/w34608
  9. Google Developers Blog, "Updated Gemini Models — Gemini 1.5 Pro price cut" (2024-09-24). https://developers.googleblog.com/en/updated-gemini-models-reduced-15-pro-pricing-increased-rate-limits-and-more/
  10. Gundlach, Lynch, Mertens, Thompson, "The Price of Progress" — arXiv 2511.23455 (2026-03). https://arxiv.org/abs/2511.23455
  11. OpenRouter / a16z, "State of AI 2025 — 100T Token Study" (2025-12-04). https://openrouter.ai/state-of-ai
  12. SemiAnalysis, "DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts" (2025-01-31). https://newsletter.semianalysis.com/p/deepseek-debates
  13. Stanford HAI, "AI Index Report 2025 — Chapter 1" (2025-04). https://hai.stanford.edu/assets/files/hai_ai-index-report-2025_chapter1_final.pdf
  14. the-decoder, "OpenAI cuts GPT-4o prices and quadruples output tokens" (2024-08-07). https://the-decoder.com/openai-cuts-gpt-4o-prices-and-quadruples-output-tokens/
  15. Full synthesis with all 40+ source files: research/2026-05-18-token-pricing-validation/00-synthesis.md