Audience: Finance professional, no transformer background assumed; technical terms defined first use. Length: ~1,300 words. Purpose: Engage the most prominent paywalled counter-thesis to the consensus that frontier-AI inference margins are deteriorating. SemiAnalysis's May 2026 post argues margins have already inflected sharply up and that agentic AI has permanently raised the market-clearing token price. The BOM token-pricing validation brief §4.7 should engage that disagreement rather than cite SemiAnalysis as if it corroborates the bearish view.
Sister docs: Token primer (2026-05-15-token-primer.md), agentic primer (2026-05-15-agentic-ai-primer.md), BOM memo (2026-05-13-dc-archetype-bom-memo.md), validation brief (2026-05-18-token-pricing-validation-brief.html).
SemiAnalysis's May 1, 2026 post "AI Value Capture - The Shift To Model Labs" argues that Anthropic's inference-infrastructure gross margins moved from 38% to over 70% in roughly twelve months as agentic AI workloads — multi-turn, high-cache-hit, input-heavy — permanently re-priced the market-clearing per-token rate. This is directly opposed to The Information's mid-2025 reporting that the BOM validation brief §4.7 currently leans on (OpenAI 33% GM vs 46% target; Anthropic 40% GM vs 50% target; Anthropic -94% in 2024). Both can be partially true if they measure different things at different snapshots — but at minimum, the consensus "deteriorating inference economics" framing is now contested by the highest-quality industry-research voice on the subject.
The post (Daniel Nishball, Dylan Patel, et al., 2026-05-01; paid tier; local snapshot at research/2026-05-18-token-pricing-validation/07-methodology-pressure-test/2026-05-01-semianalysis-ai-value-capture-shift-to-model-labs.html) makes five load-bearing claims.
(1) The margin inflection is recent and large.
"This year Anthropic's ARR has exploded from $9B to over $44B today, their gross margins on their inference infrastructure have increased from 38% to over 70% over the same period."
(2) Agentic AI is the mechanism. Two technical terms first: cache hit rate is the share of input tokens served from a previously-cached prompt rather than re-processed fresh — Anthropic prices cached input at 10% of standard rate, so high cache-hit workloads pay a fraction of sticker. Input-to-output ratio is the ratio of prompt tokens to generated tokens — multi-turn agentic loops re-feed every prior step's tool calls and observations back into the next prompt, producing input-heavy traces.
"We estimate that the true blended price per million tokens for running Opus 4.7 on agentic tasks at $0.99 despite the sticker price being $5/$25 per MTok. Agentic workloads have extremely high input-to-output ratios (our Claude Code usage has a ratio of about 300:1) and high cache hit rates (90%+). Because cached input tokens only cost $0.50/MTok, most of the tokens end up in the cheapest tier."
The arithmetic: at 300:1 input-to-output and 90% cache hits, a workload nominally priced at $5 input / $25 output blends down to roughly $0.99/MTok. The realized rate sits about 5× below sticker, but it is the realized rate that drives revenue and gross margin in production.
(3) Cost-per-token has fallen even faster than realized price. New silicon — Blackwell (NVIDIA's current-gen GPU), and ASIC alternatives like Google TPUv7 and AWS Trainium 3 — generates 30× more tokens per second per chip than the prior generation at frontier workloads. Software optimization layered on top (NVIDIA's wide-EP, disaggregated-prefill, multi-token-prediction) yields up to 14× higher throughput on identical hardware. The margin gap widens.
(4) The "market-clearing token price" has permanently moved up. SemiAnalysis's thesis statement:
"The age of low gross margins for frontier model providers is over. Real agentic AI has permanently increased the market-clearing price per token, and there's no going back."
The argument: the productivity gain per token from agentic workloads is large enough — SemiAnalysis cites its own internal usage of "$10.95 million dollar annual spend rate" on Anthropic tokens, with productivity gains that "allows us to outcompete all our competitors" — that buyers will pay materially higher prices for frontier-quality tokens. Anthropic has already tested this with Opus Fast (6× regular Opus pricing) and Mythos ($25/$125, 5× regular Opus). Both are higher-margin SKUs and "the most AI-pilled businesses are still more than happy to pay."
(5) Competition will not compete margins away. Two reasons: (i) open-source models — Kimi K2.6 at $0.95/$4 — are still measurably worse on real knowledge work and "exert very little downward pressure on Opus pricing"; (ii) compute supply remains structurally constrained, so no single frontier lab can serve the entire market — "any lab capable of providing true frontier quality will be able to charge based on the economic value delivered by the token rather than competing away each other's margins."
The BOM validation brief §4.7 cites "OpenAI 2025 gross margin 33% vs 46% target; Anthropic gross margin 40% vs 50% target (was −94% in 2024)" to support the claim that published prices are running below cost. The audit capture (docs/audits/2026-05-18-row-72-semianalysis-margin-capture.md) confirms those figures do not appear in the SemiAnalysis post and most likely originate from The Information's reporting on OpenAI's mid-2025 financial disclosures and Anthropic's 2024–2025 trajectory.
The two sources can both be partially right, in three ways:
Where they directly conflict: the directional claim about whether inference economics are deteriorating or improving. Both cannot simultaneously be the right frame for forecasting the next 24–36 months. SemiAnalysis is forecasting up. The Information's reporting reads as a snapshot consistent with under pressure.
Three concrete implications for the BOM token model:
Frontier deflation rate. The model holds frontier $/MTok flat through 2030 (0%/yr) under base case; §4.7 of the validation brief recommends dropping that to 10%/yr to capture workload re-tiering. SemiAnalysis's view, taken at face value, says the opposite — frontier realized rates may rise as buyers swap toward premium SKUs (Opus Fast, Mythos) where productivity gains justify higher prices. The model's frontier-deflation knob may need a wider range: -10%/yr (SemiAnalysis world, SKU mix-shift up) to +20%/yr (validation brief Bear world, competition compresses headline rates).
Frontier token share trajectory. The model assumes frontier tokens move 30% → 40% → 50% of total volume across 2026/28/30. SemiAnalysis's logic supports the higher end: if frontier-quality maintains pricing power and agentic workloads dominate the new-demand layer, the share should rise faster. If the model is right on the trajectory, SemiAnalysis's pricing view amplifies the revenue impact.
Agentic uplift in the sensitivity model. The BOM bottom-up token model (models/bom-token-model-2026-05-14.xlsx, Sensitivity_3Knob sheet) currently runs agentic uplift with uplift_scaling = 0.05. SemiAnalysis's framing — "real agentic AI has permanently increased the market-clearing price per token" — argues that this is an order of magnitude too low. Worth sensitizing against uplift_scaling = 0.20 or higher to bound the SemiAnalysis case.
Validation-brief §4.7 verdict. The current verdict — "the published deflation curve is running against deteriorating unit economics" — assumes the bearish camp is correct. The honest reframing is: two credible sources disagree. The BOM treats the deflation rate at the lower end of the published spread (60–67%/yr from MIT + Epoch); if SemiAnalysis is right that realized agentic rates have moved up and cost-per-token has fallen further, even that lower-end rate compresses.
Is the 38%→70% Anthropic inference-margin progression replicable at OpenAI? SemiAnalysis says OpenAI's revenue mix is more consumer-heavy (less API), so the per-token economics may not have moved as far. The brief should not generalize Anthropic-specific margin moves to the frontier-lab category.
Does the agentic premium-SKU dynamic survive a single major frontier-quality OSS release? SemiAnalysis dismisses Kimi K2.6 today; a meaningfully better open-source release (Llama 5 at frontier? a Chinese lab clearing GPQA-Diamond at $1/$4?) would test the pricing-power argument. [VERIFY] timing.
What share of Anthropic's $44B ARR is durable enterprise contract vs surge consumption? SemiAnalysis's pricing-power thesis is strongest if the buy-side is locked into multi-year contracts at premium rates. If most of the $44B is variable-consumption Claude Code usage, the realized rate is more exposed to cost-of-deflection (cheaper-quality models doing the same job).
https://newsletter.semianalysis.com/p/ai-value-capture-the-shift-to-model. Local snapshot: vault/raw/research-2026-05-18-token-pricing-validation/07-methodology-pressure-test/2026-05-01-semianalysis-ai-value-capture-shift-to-model-labs.html.https://newsletter.semianalysis.com/p/the-coding-assistant-breakdown-more. Local snapshot: vault/raw/research-2026-05-18-token-pricing-validation/07-methodology-pressure-test/2026-04-24-semianalysis-coding-assistant-breakdown.html. Useful for the wrapper-economics framing — coding wrappers (Cursor, Cognition, Windsurf, Replit, Vercel V0, Lovable) running negative gross margins while the model labs widen theirs.docs/audits/2026-05-18-row-72-semianalysis-margin-capture.md — quote-level verification that the 33%/46%/40%/-94% figures are not in the SemiAnalysis post and that SemiAnalysis is on the opposite directional side of the brief's framing.Cross-references
docs/briefs/2026-05-13-dc-archetype-bom-memo.mddocs/briefs/2026-05-15-token-primer.mddocs/briefs/2026-05-15-agentic-ai-primer.mddocs/briefs/2026-05-18-token-pricing-validation-brief.html