BOM Token Model — Pricing Assumptions Validation

Stress-test of the four load-bearing pricing inputs in bom-token-model-2026-05-18.xlsx

Date: 2026-05-18 · Status: Validation findings; pre–model-rebuild

Source synthesis: research/2026-05-18-token-pricing-validation/00-synthesis.md (full per-thread findings, citations, and 40+ saved PDFs/HTMLs)

Style: Every technical term defined the first time it appears.

Contents

1. The question, and why it matters
2. Terms (define-first-use)
3. Audit of current model assumptions
4. Empirical findings — seven research threads
5. Recommended 3-tier restructure
6. Impact on the 2030 forecast
7. Sources to add to Z_Sources
8. Sources cited

1. The question, and why it matters

The BOM token model rolls up to a 2030 token-revenue forecast of $5,675B. That figure is the product of two things: an effective-token volume model (knobs for agentic intensity, agentic share, adoption depth, induced demand) and a pricing model. The pricing side has four load-bearing inputs:

1. Frontier-tier price: $9.00/MTok blended, held flat through 2030 (0%/yr deflation)
2. Commodity-tier price: $0.40/MTok blended, deflating at 80%/yr at iso-capability
3. Frontier share of tokens: rising 30% → 40% → 50% across 2026/28/30
4. Current frontier share baseline: 30% in 2026

Inputs 1 and 2 had thin source attribution in Z_Sources. Inputs 3 and 4 had no source attribution at all. Before the model can be defended in front of an LP or used as an anchor for downstream deliverables, each assumption needs an empirical check. This brief reports the findings of seven research threads run 2026-05-18 and proposes a 3-tier restructure that the empirical evidence supports.

2. Terms (define-first-use)

MTok — million tokens. Standard LLM billing unit.
Token — sub-word fragment of text; the actual unit LLMs read and write. ~0.75 words per token in English.
Blended 80/20 — weighted-average price assuming 80% input + 20% output tokens, the standard convention for forecasting workloads.
Frontier-tier — top-of-stack flagship: Claude Opus 4.x, OpenAI GPT-5.5 flagship, OpenAI o3-Pro reasoning premium, Google Gemini 3.x Pro >200K context. Sticker $5–30/MTok input, $25–180/MTok output.
Mid-tier — workhorse premium: Claude Sonnet 4.x, OpenAI GPT-5.4, Google Gemini 3.1 Pro. Sticker $2–3 input / $12–15 output. Analogy: a ChatGPT Plus ($20/mo) user lives here; a ChatGPT Pro ($200/mo) user lives in Frontier.
Commodity-tier — high-volume cheap: Claude Haiku, Gemini Flash, GPT-Nano, Meta Llama 3/4, DeepSeek, Qwen, Kimi, MiniMax. Generally <$1/MTok.
Iso-capability deflation — the price decline of the cheapest model meeting a fixed capability benchmark over time. Not the same model getting cheaper; a moving frontier of cheaper models hitting the same bar.
MMLU / GPQA Diamond / SWE-bench — standard LLM benchmarks: general knowledge, PhD-level science, real-world software-engineering tasks.
Reasoning model — an LLM that allocates test-time compute (OpenAI o1/o3, DeepSeek R1, Alibaba QwQ, Gemini Thinking). Token consumption per answer is 10–150× higher than for non-reasoning models at the same final capability.
Token share vs revenue share — token share = % of total tokens served; revenue share = % of dollars paid. Frontier prices are 10–30× commodity, so revenue share is much higher than token share at the top.
OpenRouter — third-party aggregator routing API calls across 300+ models from 60+ providers. The only public dataset with cross-vendor token-volume visibility.
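The blended 80/20 convention defined above is a two-term weighted average. A minimal sketch in Python follows; the function name `blended_price` and its default weight are illustrative choices, not names taken from the workbook.

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_share: float = 0.80) -> float:
    """Weighted-average $/MTok for a workload that is `input_share` input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_per_mtok + (1.0 - input_share) * output_per_mtok


# Sticker prices quoted in this brief, as (input, output) in $/MTok.
for label, (inp, out) in {
    "Opus-class $5/$25": (5.00, 25.00),
    "Sonnet-class $3/$15": (3.00, 15.00),
}.items():
    print(f"{label}: ${blended_price(inp, out):.2f}/MTok blended")
```

The same function reproduces the $9.00 frontier and $5.40 Sonnet figures used later in the brief; changing `input_share` shows how sensitive a blended price is to the assumed input/output mix.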

3. Audit of current model assumptions

Assumption	Cell	Current value	Cited source in `Z_Sources`	What the cite actually supports
Frontier $/MTok blended	`Inputs!C9`	$9.00	S001 (Anthropic pricing)	Opus 4.7 sticker today; says nothing about 2030
Frontier deflation	`Inputs!C12`	0%/yr	S001 ("Anthropic Opus invariant")	14 months of flat Opus pricing 2024–2025; not a 5-year forecast
Commodity $/MTok blended	`Inputs!C10`	$0.40	S003 (Gemini Flash)	Anchored to Gemini Flash floor; defensible
Commodity iso-capability deflation	`Inputs!C11`	80%/yr	S007, S008 (DeepLearning.AI, Demirer)	Two-year-old blog; Demirer et al. paper supports ~85% compounded, not specifically 80%
Frontier share 2026/28/30	`Inputs!C25–C27`	30 / 40 / 50%	None	Uncited assumption

The structural finding from this audit: two of the four assumptions had no source attribution, and a third (frontier deflation 0%/yr) extrapolates from a 14-month observation to a 5-year forecast. These were the threads to pull.

4. Empirical findings — seven research threads

4.1 — Frontier price flatness ($9 flat through 2030)

Verdict: SOFT. The $9 blended price is correct today (matches Claude Opus 4.5 sticker $5/$25 → $9 blended exactly). But 0%/yr deflation through 2030 is a workload-mix bet, not a pricing bet. Honest read: 10–20%/yr blended frontier deflation from workload re-tiering as buyers swap to cheaper SKUs.

The "frontier price stays flat" claim has two parts that need separate testing:

(a) Top-SKU launch price is sticky generation-to-generation. PARTIALLY TRUE. Anthropic held Claude Opus at $15 input / $75 output from Claude 3 Opus (March 2024) through Claude Opus 4 (May 2025) — 14 months at identical pricing (Anthropic news, March 2024; the Opus 4.5 launch page cited in §8 corroborates the prior pricing via the explicit switch from $15/$75 to $5/$25). a16z's Guido Appenzeller observed in November 2024: "OpenAI's leading model today, o1, has the same cost per output token as GPT-3 had at launch ($60 per million)" (a16z, "Welcome to LLMflation").

(b) Within a model's own life, price stays flat. FALSE. Documented intra-life cuts include GPT-4o $5/$15 → $2.50/$10 in ~3 months (the-decoder, Aug 7 2024) and Gemini 1.5 Pro's 64% input / 52% output cut in October 2024: "64% price reduction on input tokens, a 52% price reduction on output tokens... for our strongest 1.5 series model" (Google Developers Blog). Most importantly, Anthropic broke its own Opus pattern in November 2025: Opus 4.5 launched at $5/$25, two-thirds cheaper than Opus 4, with the stated aim of "making Opus-level capabilities accessible to even more users, teams, and enterprises" (Anthropic news, Nov 24 2025).

Critically, independent measurement of frontier-equivalent capability shows price decline, not flatness. Epoch AI's frontier-tier measurement (Epoch AI, March 2025) found: "the price to achieve GPT-4's performance on a set of PhD-level science questions fell by 40x per year" — that is ~97.5%/yr at the frontier-equivalent capability bar. MIT FutureTech / Gundlach et al. (March 2026) measured the broader Pareto frontier of price-vs-capability: "the price for a given level of benchmark performance has decreased remarkably fast, around 5× to 10× per year" (arxiv 2511.23455, "The Price of Progress").

4.2 — Commodity deflation 80%/yr

Verdict: CONSERVATIVE TAIL OF THE RANGE, but the methodology pressure-test (§4.7 below) lowers the defensible rate further. The 80% figure sits near the bottom of the published empirical range (74–99%/yr). After accounting for strategic subsidies and reasoning-model exclusion, the structural defensible rate drops to 60–67%/yr.

Empirical range across credible sources, converted to comparable %/yr units:

Source	Methodology	%/yr	Window
Stanford AI Index 2025, Ch.1	$/MTok at MMLU 64.8%	~98%	Nov 2022 – Oct 2024 (18mo)
Epoch AI median	$/MTok at fixed benchmarks	~98% (50× median)	2022–2025
a16z LLMflation	$/MTok at MMLU 42 & 83	90% (10×/yr)	2021–2024
Altman, "Three Observations"	Public claim — "cost to use a given level of AI"	90% (10×/yr)	n/a (claim)
Demirer et al., NBER w34608	"2023 SOTA models" price decline	~85% (compounded)	2023 – late 2025
MIT FutureTech / Gundlach et al.	Pareto-frontier benchmark-anchored	80–90% (5–10×/yr)	Multi-year
SemiAnalysis ("DeepSeek Debates")	Algorithmic progress (compute-per-capability) — 4×/yr per SemiAnalysis	~75% (cmpd, 1−1/4)	Jan 2025 estimate

Five of seven credible sources cluster at 85% or higher. Only SemiAnalysis's algorithmic-progress estimate (~75%, from "4× less compute per year for the same capability") lands below the model's 80%. The Demirer et al. paper's anchor quote: "Models that were state-of-the-art in 2023 have experienced a price decline of approximately 1000 times, with similarly pronounced deflationary trends at other intelligence levels" (NBER w34608, p.1).
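The unit conversion behind the table above (an "N× cheaper per year" multiple expressed as a %/yr decline) can be sketched as follows. The helper names are hypothetical, and `annualized_fold` assumes a constant compounding rate over the observation window, which the sources do not guarantee.

```python
def fold_to_annual_pct(fold_per_year: float) -> float:
    """Convert an 'N x cheaper per year' multiple into a %/yr price decline."""
    if fold_per_year <= 0:
        raise ValueError("fold_per_year must be positive")
    return (1.0 - 1.0 / fold_per_year) * 100.0


def annualized_fold(total_fold: float, months: float) -> float:
    """Annualize a total fold decline observed over `months` months,
    assuming the decline compounds at a constant rate."""
    return total_fold ** (12.0 / months)


for fold in (40, 10, 5, 4, 3):
    print(f"{fold}x/yr -> {fold_to_annual_pct(fold):.1f}%/yr")
```

Because the mapping is nonlinear, a large gap in fold terms (10× vs 40×) is a small gap in percentage terms (90% vs 97.5%), which is worth keeping in mind when comparing sources on a %/yr axis.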

Epoch AI explicitly flags that "the fastest trends (e.g. 900× per year) start after January 2024" — i.e., 2024–2025 was a deflation peak driven by competitive pressure (DeepSeek, Gemini Flash). Both MIT and a16z warn the rate may slow. But §4.7 below is the more substantive critique: the headline rates may be overstated even within their measurement window.

4.3 — Frontier token-share trajectory (30% → 40% → 50%)

Verdict: DIRECTIONALLY WRONG on a token-volume basis. Frontier token share is falling toward commodity, not rising. The 30/40/50% glide path was uncited and not defensible. The basket-definition issue (next section) explains how the model can preserve the spirit of the assumption with a 3-tier restructure.

The OpenRouter "State of AI 2025" study, a joint a16z–OpenRouter project covering 100 trillion tokens of usage from Nov 2024 to Nov 2025 (local PDF, 104 pp; live landing: openrouter.ai/state-of-ai), is the largest public cross-provider usage dataset identified in this research. Headline findings:

Open-source models rose from <2% to ~one-third of OpenRouter traffic by late 2025; Chinese open-source models (DeepSeek, Qwen, MiniMax, GLM) averaged ~13% of weekly token volume, peaking near 30% in some weeks (State of AI 2025).
The platform's segmentation puts true frontier ("Premium specialists" — OpenAI GPT-5 Pro, Anthropic Opus) at ~3,500× lower volume than mid-tier Sonnet-class (State of AI 2025).

April 2026 OpenRouter rankings confirm the trajectory (DigitalApplied, April 2026):

Pure-flagship models (Opus 4.6 + GPT-5.4 + Gemini 3.1 Pro) account for ~13–18% of top-10 weekly tokens
Anthropic provider share fell from prior 30%+ to 15.4%
Xiaomi MiMo + MiniMax + DeepSeek + Qwen + Moonshot ≈ 51% (Chinese commodity)

The counter-evidence: Anthropic's own Economic Index, March 2026, reports "51% of overall usage is Opus" on paid Claude.ai accounts (Anthropic Economic Index). But this measures Anthropic's product surface only — paid users who self-selected into the frontier vendor's premium product. It is not generalizable to the broader market.

4.4 — Current frontier share baseline (30%)

Verdict: BASKET-DEFINITION DEPENDENT. 30% is defensible if Sonnet sits inside the frontier basket; falls to 10–15% if "frontier" means Opus-only. The basket ambiguity is the real diagnosis — fixed by the 3-tier restructure below.

Empirical token-share by basket definition (Q1–Q2 2026 OpenRouter rankings; see DigitalApplied April 2026 summary and OpenRouter State of AI 2025 PDF):

Wide basket (Opus + Sonnet + GPT-5 flagship + GPT-5.4 + Gemini Pro): 25–35% token share. 30% lands in-band.
Narrow basket (Opus + GPT-5.5 flagship + Gemini Pro >200K only — excluding Sonnet and mid-tier): 10–15% token share.

The model's Token_Pricing_Matrix tab classifies Sonnet as "Mid" tier ($5.40/MTok blended), but the frontier_share_* inputs may have been intended to capture the entire premium-priced segment. The ambiguity is real and warrants a structural fix.

4.5 — Mid-tier pricing and share (the new tier)

Verdict: MID IS A STRUCTURALLY DISTINCT STICKY LAYER. Not a moving boundary between Frontier and Commodity. ~$5.00/MTok blended, 22–28% token share, low price elasticity, gap to commodity widening. Justifies a 3-tier model.

Volume-weighted average across the three mid-tier flagships, using April 2026 OpenRouter weekly volumes as weights:

Model	Sticker (input / output)	Blended 80/20	Weekly tokens (T)
Claude Sonnet 4.6	$3 / $15	$5.40	2.18
OpenAI GPT-5.4	$2.50 / $15	$5.00	0.98
Google Gemini 3.1 Pro	$2 / $12	$4.00	0.87
Weighted average	—	$5.00	4.03
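The weighted-average row can be reproduced with a few lines of Python. This is a sketch using only the blended prices and weekly volumes from the table above; the variable and function names are illustrative.

```python
# (model, blended $/MTok, weekly tokens in trillions), copied from the table above.
mid_tier = [
    ("Claude Sonnet 4.6", 5.40, 2.18),
    ("OpenAI GPT-5.4", 5.00, 0.98),
    ("Google Gemini 3.1 Pro", 4.00, 0.87),
]


def volume_weighted_price(rows):
    """Return (volume-weighted average price, total volume)."""
    total_volume = sum(volume for _, _, volume in rows)
    weighted_sum = sum(price * volume for _, price, volume in rows)
    return weighted_sum / total_volume, total_volume


avg_price, total_volume = volume_weighted_price(mid_tier)
print(f"${avg_price:.2f}/MTok across {total_volume:.2f}T weekly tokens")
```

Sonnet carries a little over half the weight, so the average moves mostly with Sonnet's price and volume.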

Mid-tier behavior the data revealed (OpenRouter / a16z State of AI 2025 (local PDF); DigitalApplied April 2026):

Price is sticky. Sonnet held flat across versions 4 → 4.5 → 4.6. Gemini 3.1 Pro actually raised price vs Gemini 2.5 Pro.
Price elasticity is very low. OpenRouter measured (§8 of the State of AI 2025 PDF): a 10% price cut produces only 0.5–0.7% incremental usage.
Gap to commodity is widening, not converging. Mid-tier sits at a 6–15× multiple over commodity rates and the spread is increasing.
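To make the elasticity finding concrete, the sketch below turns "a 10% price cut produces 0.5–0.7% incremental usage" into an implied elasticity and a revenue effect. It assumes a simple point estimate (percentage change in usage divided by percentage change in price); the function names are hypothetical and OpenRouter's own estimation method may differ.

```python
def implied_elasticity(price_change_pct: float, usage_change_pct: float) -> float:
    """Point estimate: % change in usage per 1% change in price."""
    return usage_change_pct / price_change_pct


def revenue_change_pct(price_change_pct: float, usage_change_pct: float) -> float:
    """Revenue = price x usage, so the two changes compound multiplicatively."""
    return ((1 + price_change_pct / 100) * (1 + usage_change_pct / 100) - 1) * 100


for usage_gain in (0.5, 0.7):
    e = implied_elasticity(-10.0, usage_gain)
    r = revenue_change_pct(-10.0, usage_gain)
    print(f"usage +{usage_gain}%: elasticity {e:.2f}, revenue {r:.2f}%")
```

With elasticity this close to zero, a price cut loses almost its full face value in revenue, which is consistent with vendors holding mid-tier prices flat.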

4.6 — Agentic revenue thesis

Verdict: PARTIALLY SUPPORTED, STRUCTURALLY FRAGILE. Today's revenue mix is Frontier+Mid heavy (~70%). But the trajectory through 2030 is diffusion, not concentration. Load-bearing question: how fast does routing infrastructure mature?

The hypothesis was: even if frontier token share falls, the price premium means frontier revenue share rises with agentic adoption. Two pieces of evidence support the static snapshot:

Anthropic took 66.3% of OpenRouter top-app revenue ($50.4M of $76.0M) in April 2026 despite running roughly half the token volume of commodity rivals (CodeSOTA, OpenRouter Models tracker). The price premium is real.
Qwen 3.6 Plus did 3.25T tokens for $2.5M vs Anthropic Sonnet's 3.09T for $19.7M on the same platform — same approximate workload, 8× less revenue at commodity tier.
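The Qwen-versus-Sonnet comparison above can be restated as realized revenue per million tokens. The sketch below simply divides the quoted revenue by the quoted volume; it assumes both figures cover the same period, which the source summary does not state explicitly, and the function name is hypothetical.

```python
def realized_price_per_mtok(revenue_usd: float, tokens_trillions: float) -> float:
    """Realized $/MTok given revenue in dollars and volume in trillions of tokens."""
    mtok = tokens_trillions * 1_000_000  # 1T tokens = 1,000,000 MTok
    return revenue_usd / mtok


qwen = realized_price_per_mtok(2.5e6, 3.25)      # Qwen 3.6 Plus
sonnet = realized_price_per_mtok(19.7e6, 3.09)   # Anthropic Sonnet
print(f"Qwen ${qwen:.2f}/MTok, Sonnet ${sonnet:.2f}/MTok, ratio {sonnet / qwen:.1f}x")
```

Normalizing by volume gives a slightly larger gap than the raw revenue ratio because Sonnet earned its revenue on marginally fewer tokens.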

The diffusion forces:

DeepSeek V3.2 + R1.1 reach 8.7% of AI Router tokens, already ahead of Claude Opus 4.7's 7.9% on the same platform.
CodeRouter and other intelligent-routing infrastructure ship a "93% cost reduction vs Opus-for-everything" message (coderouter.io, Apr 2026), and every major coding agent (Cursor, GitHub Copilot, Cognition's Devin) now ships some form of model routing.
Anthropic itself cut Opus from $15/$75 to $5/$25 in Nov 2025, collapsing the Opus-to-Sonnet price ratio (and with it revenue per token) from 5× to 1.67× and removing most of the structural Frontier revenue premium.

Cross-tier unit economics on agentic tasks (saved locally, multiple sources):

Workflow	Tier	$/agentic task	Source
Devin (autonomous coding agent)	Frontier-heavy	$9.80 raw / $47.60 all-in	AgentMarketCap, Apr 2026
Cline (open coding agent)	Frontier-heavy	$34.20 / bug fix	[VERIFY — source search pending]
Cursor (mid via Sonnet)	Mid	$0.10–0.15	iamraghuveer.com, Apr 2026
CodeRouter (intelligent routing)	Mixed	~$2.30 blended (vs $33 Opus-only)	coderouter.io, Apr 2026
DeepSeek-R1 self-hosted	Commodity reasoning	$7/MTok (9× cheaper than o1)	Together AI, Feb 2025

The frontier-revenue premium survives where (a) frontier is strictly necessary AND (b) buyers don't route. Both conditions are eroding as routing matures.

4.7 — Methodology pressure-test (the most consequential thread)

Verdict: THE 80–99%/YR DEFLATION HEADLINE IS OVERSTATED, BUT THE INFERENCE-MARGIN PICTURE IS CONTESTED. Structural algorithmic-efficiency rate is 3×/yr ≈ 67%/yr — the one number MIT and Epoch both agree on; drop the model's commodity deflation from 80%/yr to 60–67%/yr base case. Two credible sources disagree on the inference-margin trajectory: The Information's mid-2025 reporting shows margins under pressure (33% / 40% actuals vs 46% / 50% targets); SemiAnalysis's May 2026 post shows Anthropic inference-infra margin moving from 38% to 70%+ in twelve months. The BOM model treats the deflation rate at the lower end of the spread; if SemiAnalysis is right, that rate compresses further.

Four substantive critiques of the iso-capability methodology that survive scrutiny:

Token-level pricing systematically overstates iso-capability deflation when reasoning models exist. Epoch's own notebook excludes reasoning models, admitting "the price per token is not a good proxy for the cost to achieve a benchmark score" (epoch-research/llm-benchmark-efficiency, llm_price_trends.ipynb). The headline 280× deflation is real for $/MTok at fixed scores, but irrelevant if achieving the score now takes 100× more tokens per query.
Frontier $/answer is actually RISING 3–18×/yr at fixed real-world capability. MIT's Figure 9 (arxiv 2511.23455) measures $/benchmark-run, not $/MTok: GPQA-Diamond frontier cost +17.9×/yr, SWE-bench Verified (SWE-V) +7.7×/yr, AIME (a competition-math benchmark) +3.0×/yr. Reasoning models burn 20–150× more tokens per answer than non-reasoning models at a similar final score; the "deflation" disappears when measured per useful output.
Two credible sources disagree on whether published prices are running below cost.
- The pressure view (The Information). OpenAI 2025 gross margin 33% vs 46% target; Anthropic gross margin 40% vs 50% target (was −94% in 2024) — figures sourced via secondary citation in the audit corpus, originally The Information's reporting on OpenAI's and Anthropic's mid-2025 financial disclosures [VERIFY — The Information paywall capture pending; figures not yet anchored to the primary article]. SemiAnalysis on the commodity end: DeepSeek is "providing inference at cost to gain market share, and not actually making any money" (SemiAnalysis, "DeepSeek Debates," Jan 31 2025; local snapshot: research/2026-05-18-token-pricing-validation/02-commodity-deflation-rate/2026-05-18-semianalysis-deepseek-debates.html). On this view, the published deflation curve is running against deteriorating unit economics — consistent with strategic loss-leader pricing that has a structural floor.
- The counter-thesis (SemiAnalysis, May 2026). SemiAnalysis's "AI Value Capture — The Shift To Model Labs" (Daniel Nishball et al., 2026-05-01; URL; local snapshot: research/2026-05-18-token-pricing-validation/07-methodology-pressure-test/2026-05-01-semianalysis-ai-value-capture-shift-to-model-labs.html) argues the opposite — that Anthropic's inference-infrastructure gross margin moved from 38% to over 70% in roughly twelve months as agentic workloads (300:1 input:output, 90%+ cache hit rates) blended Opus 4.7's realized rate down to $0.99/MTok against $5/$25 sticker, and Anthropic introduced premium SKUs (Opus Fast at 6×, Mythos at 5×) absorbing willing demand. SemiAnalysis's thesis: "The age of low gross margins for frontier model providers is over. Real agentic AI has permanently increased the market-clearing price per token, and there's no going back." Full engagement and reconciliation in the counter-thesis writeup.
- Reconciliation. The two can be partially compatible if they measure different snapshots (Information mid-2025 vs SemiAnalysis YTD-2026; inflection happened between them), different scopes (inference-infra margin vs all-in company margin), or different SKU mixes (agentic Opus 4.7 vs full Anthropic API book). They directly conflict on direction: pressure-down vs structural-up. The BOM treats the deflation rate at the lower end of the published spread (60–67%/yr, per the first two critiques above); SemiAnalysis's counter-thesis, if right, would compress that rate further on the frontier side via SKU mix-shift and price-up, while leaving the commodity-side iso-capability rate roughly intact.
Benchmark contamination inflates the iso-capability bar itself. MMLU-CF (contamination-free reformulation) shows GPT-4o is 14.6 points lower on the cleaned benchmark (MMLU-CF paper, Dec 2024). SWE-bench Verified has a documented 10.6% leak rate; OpenAI stopped reporting on it in late 2025 (tianpan.co, Apr 2026). If the bar is contaminated, "iso-capability" measurements drift over time.
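The first two critiques rest on one identity: cost per answer equals price per token times tokens per answer. The sketch below works through the 280× price decline and 100× token increase mentioned above. The baseline price and token count are hypothetical values chosen only to make the arithmetic concrete, and the function names are illustrative.

```python
def cost_per_answer(price_per_mtok: float, tokens_per_answer: float) -> float:
    """Dollar cost of one answer at a given $/MTok price."""
    return price_per_mtok * tokens_per_answer / 1_000_000


def net_fold_cheaper(price_fold_decline: float, token_fold_increase: float) -> float:
    """Net factor by which $/answer falls (values below 1 mean it rose)."""
    return price_fold_decline / token_fold_increase


# Hypothetical baseline, not taken from any cited source.
base_price, base_tokens = 20.00, 1_000   # $/MTok, tokens per answer
later_price = base_price / 280           # 280x cheaper per token
later_tokens = base_tokens * 100         # 100x more tokens per answer

before = cost_per_answer(base_price, base_tokens)
after = cost_per_answer(later_price, later_tokens)
print(f"$/answer before {before:.4f}, after {after:.4f}, net {before / after:.1f}x cheaper")
```

Under these assumptions a 280× token-price decline shrinks to a 2.8× decline per answer, and `net_fold_cheaper` drops below 1 whenever token growth outpaces the price decline.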

The structural-efficiency-only rate. MIT and Epoch both agree the algorithmic-only component of deflation runs at 3×/yr ≈ 67%/yr, separating it from strategic pricing and reasoning-token effects. This is the number defensible in front of an LP without flinching — and remains defensible even under the SemiAnalysis margin-inflection view, because algorithmic efficiency and realized-margin trajectory are independent properties.

5. Recommended 3-tier restructure

5.1 Tier definitions (confirmed)

Tier	Members	Blended $/MTok today
Frontier	Claude Opus 4.x, OpenAI GPT-5.5 flagship + reasoning Pro, Google Gemini 3.x Pro >200K context	$9.00
Mid	Claude Sonnet 4.x, OpenAI GPT-5.4, Google Gemini 3.1 Pro	$5.00
Commodity	Claude Haiku, Gemini Flash, GPT-Nano, Meta Llama 3/4, DeepSeek, Qwen, Kimi, MiniMax	$0.40

5.2 Empirical token-share baseline (2026)

Tier	Token share (mid)	Range	Anchor
Frontier	7.5%	5–10%	OpenRouter Apr 2026, pure-flagship subset
Mid	25%	22–28%	OpenRouter, weighted Sonnet+GPT-5.4+Gemini Pro
Commodity	67.5%	62–75%	OpenRouter OSS + cheap-proprietary residual

5.3 Recommended trajectory through 2030 (base case)

This is the model's token-share input. The model multiplies these against per-tier $/MTok prices to compute revenue. Revenue share is a derived output — not an input — and is reported below for cross-check against empirical revenue data.

Year	Frontier $/MTok	Mid $/MTok	Commodity $/MTok	F share	M share	C share	Blended
2026	$9.00	$5.00	$0.400	7.5%	25%	67.5%	$2.20
2028	$7.29	$5.00	$0.044	6%	25%	69%	$1.72
2030	$5.90	$5.00	$0.0047	5%	25%	70%	$1.55

Reasoning per tier:

Frontier deflation 10%/yr — captures workload re-tiering (§4.1). Top SKU may stay nominally flat, but the blended frontier price drops as buyers swap toward cheaper-but-still-frontier SKUs.
Mid sticky (0%/yr) — per §4.5, Sonnet held flat across three versions; Gemini Pro raised price; OpenRouter measured 10% price cut → 0.5–0.7% usage gain. Low elasticity.
Commodity deflation 67%/yr — MIT/Epoch structural-efficiency floor (§4.7). Drops from current model's 80%.
Token shares — Frontier falls modestly (routing pressure); Mid sticky (low elasticity); Commodity gains the residual.
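The base-case table in this subsection follows mechanically from the 2026 prices, the per-tier deflation rates, and the token shares. A minimal sketch, assuming constant annual compounding from a 2026 base year (the dictionary layout and function names are illustrative, not the workbook's):

```python
BASE_YEAR = 2026

# tier: (2026 blended $/MTok, annual deflation rate) from the base case above.
PRICES = {"frontier": (9.00, 0.10), "mid": (5.00, 0.00), "commodity": (0.40, 0.67)}

# year: token share per tier, from the base-case table.
SHARES = {
    2026: {"frontier": 0.075, "mid": 0.25, "commodity": 0.675},
    2028: {"frontier": 0.06, "mid": 0.25, "commodity": 0.69},
    2030: {"frontier": 0.05, "mid": 0.25, "commodity": 0.70},
}


def tier_price(tier: str, year: int) -> float:
    """Tier price after compounding its deflation rate from BASE_YEAR."""
    start, deflation = PRICES[tier]
    return start * (1 - deflation) ** (year - BASE_YEAR)


def blended_price(year: int) -> float:
    """Token-share-weighted average $/MTok across the three tiers."""
    return sum(share * tier_price(tier, year) for tier, share in SHARES[year].items())


for year in SHARES:
    tiers = ", ".join(f"{t} ${tier_price(t, year):.4f}" for t in PRICES)
    print(f"{year}: {tiers} | blended ${blended_price(year):.3f}")
```

By 2030 the commodity tier contributes well under one cent to the blended price, so the blended figure is driven almost entirely by the Mid tier's flat $5.00 at a 25% share.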

5.4 Bull and bear cases

Bull case — frontier capability gap persists, agentic adoption outruns deflation (Huang's "million-x" world):

Frontier deflation 0%/yr, Mid 0%/yr, Commodity 50%/yr
Token shares 7.5 → 10 → 12% Frontier; 25 → 28 → 30% Mid; 67.5 → 62 → 58% Commodity

Bear case — routing matures fast, commodity reasoning models (DeepSeek-R1, Llama 4 Reasoning) absorb agentic work:

Frontier deflation 20%/yr, Mid 5%/yr, Commodity 80%/yr
Token shares 7.5 → 5 → 3% Frontier; 25 → 22 → 20% Mid; 67.5 → 73 → 77% Commodity
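As a cross-check, the bull and bear inputs above can be pushed through the same price-times-share arithmetic to see where the 2030 blended price lands. This sketch assumes four years of constant compounding from the 2026 tier prices ($9.00 / $5.00 / $0.40); the structure and names are illustrative.

```python
PRICES_2026 = {"frontier": 9.00, "mid": 5.00, "commodity": 0.40}

SCENARIOS = {
    "bull": {
        "deflation": {"frontier": 0.00, "mid": 0.00, "commodity": 0.50},
        "shares_2030": {"frontier": 0.12, "mid": 0.30, "commodity": 0.58},
    },
    "bear": {
        "deflation": {"frontier": 0.20, "mid": 0.05, "commodity": 0.80},
        "shares_2030": {"frontier": 0.03, "mid": 0.20, "commodity": 0.77},
    },
}


def blended_2030(scenario: dict, years: int = 4) -> float:
    """Share-weighted 2030 $/MTok after `years` of compounding deflation."""
    total = 0.0
    for tier, start_price in PRICES_2026.items():
        price = start_price * (1 - scenario["deflation"][tier]) ** years
        total += scenario["shares_2030"][tier] * price
    return total


for name, scenario in SCENARIOS.items():
    print(f"{name}: ${blended_2030(scenario):.2f}/MTok blended in 2030")
```

Both results fall inside the scenario ranges reported in §6 ($2.50–3.00 bull, $0.80–1.00 bear).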

6. Impact on the 2030 forecast

The effective-token volume model (chat + agentic with intensity and induced-demand multipliers) is unchanged at 1,260,972T effective tokens in 2030. Only the pricing model changes.

Scenario	2030 blended $/MTok	2030 token revenue	vs current ($5,675B)
Current model	$4.50	$5,675B	baseline
Restructured base	$1.55	$1,953B	−66% (2.9× lower)
Restructured bull	$2.50–3.00	$3,150–3,780B	−33 to −45%
Restructured bear	$0.80–1.00	$1,000–1,260B	−78 to −82%

The single largest delta vs the current model is the removal of the 30 → 40 → 50% Frontier-share rise. That assumption was doing most of the compounding work in the original $5,675B number. Removing it (or flipping it to flat/declining, as the empirical evidence supports) compresses 2030 revenue by 2–3× before changing any other input.
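The revenue column is the blended price multiplied by the fixed 2030 effective-token volume. A short sketch of that conversion, using the 1,260,972T volume and the $5,675B current-model figure stated in this section (function names are hypothetical):

```python
EFFECTIVE_TOKENS_2030_T = 1_260_972   # trillions of effective tokens in 2030
CURRENT_MODEL_REVENUE_B = 5_675       # $B, current model


def token_revenue_billions(blended_per_mtok: float,
                           tokens_trillions: float = EFFECTIVE_TOKENS_2030_T) -> float:
    """Revenue in $B. 1T tokens = 1e6 MTok, and $1B = 1e9 dollars."""
    return blended_per_mtok * tokens_trillions * 1e6 / 1e9


def change_vs_current_pct(revenue_b: float) -> float:
    """Percentage change relative to the current model's 2030 revenue."""
    return (revenue_b / CURRENT_MODEL_REVENUE_B - 1) * 100


for label, price in [("current", 4.50), ("bull low", 2.50), ("bull high", 3.00),
                     ("bear low", 0.80), ("bear high", 1.00)]:
    revenue = token_revenue_billions(price)
    print(f"{label}: ${price:.2f}/MTok -> ${revenue:,.0f}B ({change_vs_current_pct(revenue):+.0f}%)")
```

Since volume is held constant, every $0.10 of blended price is worth roughly $126B of 2030 revenue, so the forecast scales linearly with whatever blended price the pricing model produces.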

Derived 2026 revenue-share split (cross-check against §4.6 empirical):

Tier	Token share	$/MTok	Revenue share (derived)	Empirical (§4.6)
Frontier	7.5%	$9.00	30.8%	~30%
Mid	25%	$5.00	56.9%	~40%
Commodity	67.5%	$0.40	12.3%	~30%

The model-derived 2026 revenue split lands close to the empirical for Frontier (30.8% vs ~30%), high for Mid (56.9% vs ~40%), and low for Commodity (12.3% vs ~30%). Two interpretations: either (a) the mid-tier blended price is overstated (Sonnet over-weights the weighted average), or (b) the commodity blended price is understated (Bedrock-served Llama, DeepSeek, and Together-served Llama are charged above the $0.40 Gemini-Flash floor). The reconciliation between derived and empirical revenue share is worth a follow-up tab in the rebuild.
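The derived revenue shares come from weighting each tier's token share by its price and normalizing. A minimal sketch using the 2026 inputs from the table above (names are illustrative):

```python
# tier: (2026 token share, 2026 blended $/MTok)
TIERS_2026 = {
    "Frontier": (0.075, 9.00),
    "Mid": (0.25, 5.00),
    "Commodity": (0.675, 0.40),
}


def revenue_shares(tiers: dict) -> dict:
    """Each tier's share of revenue: (token share x price) over the blended price."""
    contribution = {name: share * price for name, (share, price) in tiers.items()}
    blended = sum(contribution.values())
    return {name: value / blended for name, value in contribution.items()}


for name, share in revenue_shares(TIERS_2026).items():
    print(f"{name}: {share:.1%} of revenue")
```

Rerunning the function with an alternative commodity or mid-tier price is a quick way to test interpretations (a) and (b) against the empirical split.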

7. Sources to add to `Z_Sources`

New ID	Type	Publisher	Title	Date
S039	Pricing	Anthropic	Claude Opus 4.5 launch ($5/$25)	2025-11-24
S040	Analyst	a16z	Welcome to LLMflation	2024-11-12
S041	Research	Epoch AI	LLM inference price trends	2025-03-12
S042	Research	Stanford HAI	AI Index 2025 Ch.1	2025-04
S043	Academic	NBER (Demirer et al.)	Emerging Market for Intelligence (w34608)	2025-12
S044	Academic	MIT FutureTech (Gundlach et al.)	Price of Progress (arxiv 2511.23455)	2026-03
S045	Industry Data	OpenRouter / a16z	State of AI 2025 — 100T-token study	2025-12-04
S046	Industry Data	DigitalApplied	OpenRouter Rankings April 2026	2026-04
S047	Industry	OpenAI / Altman	Three Observations	2025-02
S048	Industry Data	Anthropic	Economic Index March 2026	2026-03-24
S049	Pricing	OpenAI	GPT-5.4 pricing [VERIFY — live URL returns 403 to curl; dated snapshot pending via Playwright/manual]	2026-Q1
S050	Pricing	Google	Gemini 3.1 Pro pricing [VERIFY — live URL infinite-redirects on curl; dated snapshot pending via Playwright/manual]	2026-Q1
S051	Margin Analysis	SemiAnalysis	OpenAI/Anthropic GM analysis (Substack, paywalled)	2026
S052	Pricing Page	the-decoder	GPT-4o price cut history	2024-08-07
S053	Pricing Page	Google Dev Blog	Gemini 1.5 Pro 64%/52% cut	2024-09-24

8. Sources cited

Anthropic, "Claude Opus 4.5 launch" (2025-11-24). https://www.anthropic.com/news/claude-opus-4-5
Anthropic, "Claude 3 family" (2024-03). https://www.anthropic.com/news/claude-3-family
Anthropic, "Economic Index — March 2026 Report" (2026-03-24). https://www.anthropic.com/research/economic-index-march-2026-report
a16z, Guido Appenzeller, "Welcome to LLMflation" (2024-11-12). https://a16z.com/llmflation-llm-inference-cost/
Sam Altman, "Three Observations" (2025-02). https://blog.samaltman.com/three-observations
DigitalApplied, "OpenRouter Rankings April 2026" (2026-04). https://www.digitalapplied.com/blog/openrouter-rankings-april-2026-top-ai-models-data
Epoch AI, "LLM inference price trends" (2025-03-12). https://epoch.ai/data-insights/llm-inference-price-trends
Demirer, Fradkin, Tadelis, Peng, "The Emerging Market for Intelligence" — NBER w34608 (2025-12). https://www.nber.org/papers/w34608
Google Developers Blog, "Updated Gemini Models — Gemini 1.5 Pro price cut" (2024-09-24). https://developers.googleblog.com/en/updated-gemini-models-reduced-15-pro-pricing-increased-rate-limits-and-more/
Gundlach, Lynch, Mertens, Thompson, "The Price of Progress" — arXiv 2511.23455 (2026-03). https://arxiv.org/abs/2511.23455
OpenRouter / a16z, "State of AI 2025 — 100T Token Study" (2025-12-04). https://openrouter.ai/state-of-ai
SemiAnalysis, "DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts" (2025-01-31). https://newsletter.semianalysis.com/p/deepseek-debates
Stanford HAI, "AI Index Report 2025 — Chapter 1" (2025-04). https://hai.stanford.edu/assets/files/hai_ai-index-report-2025_chapter1_final.pdf
the-decoder, "OpenAI cuts GPT-4o prices and quadruples output tokens" (2024-08-07). https://the-decoder.com/openai-cuts-gpt-4o-prices-and-quadruples-output-tokens/
Full synthesis with all 40+ source files: research/2026-05-18-token-pricing-validation/00-synthesis.md

Saved underlying PDFs and HTML snapshots archived locally (subdirectories 01- through 07-). ~40 files, ~50 MB total. Every URL above was visited and content verified during the 2026-05-18 research run.