SemiAnalysis Counter-Thesis — AI Value Capture Shift

TL;DR

SemiAnalysis's May 1, 2026 post "AI Value Capture - The Shift To Model Labs" argues that Anthropic's inference-infrastructure gross margins moved from 38% to over 70% in roughly twelve months as agentic AI workloads — multi-turn, high-cache-hit, input-heavy — permanently re-priced the market-clearing per-token rate. This is directly opposed to The Information's mid-2025 reporting that the bearish pricing case currently leans on (OpenAI 33% GM vs 46% target; Anthropic 40% GM vs 50% target; Anthropic -94% in 2024). Both can be partially true if they measure different things at different snapshots — but at minimum, the consensus "deteriorating inference economics" framing is now contested by the highest-quality industry-research voice on the subject.

What SemiAnalysis says

The post (Daniel Nishball, Dylan Patel, et al., 2026-05-01; paid tier) makes five load-bearing claims.

(1) The margin inflection is recent and large.

"This year Anthropic's ARR has exploded from $9B to over $44B today, their gross margins on their inference infrastructure have increased from 38% to over 70% over the same period."¹

(2) Agentic AI is the mechanism. Two technical terms first: cache hit rate is the share of input tokens served from a previously-cached prompt rather than re-processed fresh — Anthropic prices cached input at 10% of standard rate, so high cache-hit workloads pay a fraction of sticker. Input-to-output ratio is the ratio of prompt tokens to generated tokens — multi-turn agentic loops re-feed every prior step's tool calls and observations back into the next prompt, producing input-heavy traces.

"We estimate that the true blended price per million tokens for running Opus 4.7 on agentic tasks at $0.99 despite the sticker price being $5/$25 per MTok. Agentic workloads have extremely high input-to-output ratios (our Claude Code usage has a ratio of about 300:1) and high cache hit rates (90%+). Because cached input tokens only cost $0.50/MTok, most of the tokens end up in the cheapest tier."¹

The arithmetic: at 300:1 input-to-output and 90%+ cache hits, at least 270 of every 301 tokens bill at the $0.50 cached-input rate, so a workload nominally priced at $5 input / $25 output blends down to roughly $0.99/MTok. The realized rate sits about 5× below the input sticker price, and it is the realized rate, not the sticker, that drives revenue and gross margin in production.
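The blend can be reproduced with a simple token-weighted average. The sketch below is an assumption about how the figure is built, not SemiAnalysis's published method: it weights cached input, uncached input, and output tokens by their share of the trace and ignores any cache-write surcharge. The function name and parameters are illustrative.

```python
def blended_price_per_mtok(input_price, cached_price, output_price,
                           input_to_output, cache_hit_rate):
    """Token-weighted blended $/MTok for one unit of output.

    Assumption: every input token is either a cache hit (billed at
    cached_price) or a miss (billed at input_price); cache-write
    surcharges are ignored.
    """
    cached = input_to_output * cache_hit_rate
    uncached = input_to_output * (1.0 - cache_hit_rate)
    output = 1.0
    total_cost = (cached * cached_price
                  + uncached * input_price
                  + output * output_price)
    return total_cost / (input_to_output + output)


# Sticker prices and ratio quoted in the post: $5 / $0.50 cached / $25, 300:1.
for hit in (0.90, 0.91, 0.95):
    price = blended_price_per_mtok(5.00, 0.50, 25.00, 300, hit)
    print(f"cache hit {hit:.0%}: ${price:.2f}/MTok")
# cache hit 90%: $1.03/MTok
# cache hit 91%: $0.99/MTok
# cache hit 95%: $0.81/MTok
```

Under these assumptions the quoted $0.99 corresponds to a cache-hit rate of about 91%, which is consistent with the post's "90%+". The result is sensitive to that input: each additional point of cache hits removes roughly $0.045/MTok.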

(3) Cost-per-token has fallen even faster than realized price. New silicon — Blackwell (NVIDIA's current-gen GPU), and ASIC alternatives like Google TPUv7 and AWS Trainium 3 — generates 30× more tokens per second per chip than the prior generation at frontier workloads. Software optimization layered on top (NVIDIA's wide-EP, disaggregated-prefill, multi-token-prediction) yields up to 14× higher throughput on identical hardware. The margin gap widens.
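To see how far cost and realized price must have diverged, the identity gross margin = 1 − cost / revenue is enough. The sketch below applies it to the 38% and 70% figures quoted in claim (1). It does not say which of price or cost moved, and the helper names are illustrative rather than taken from either source.

```python
def cost_share(gross_margin):
    """Serving cost per dollar of revenue, from margin = 1 - cost/revenue."""
    return 1.0 - gross_margin


def margin_after(margin_before, price_mult, cost_mult):
    """Gross margin after realized price and unit cost are each rescaled."""
    return 1.0 - cost_share(margin_before) * cost_mult / price_mult


before = cost_share(0.38)  # 0.62 of each revenue dollar goes to serving cost
after = cost_share(0.70)   # 0.30 of each revenue dollar goes to serving cost
ratio = after / before     # cost-to-price ratio roughly halves

print(f"cost/price ratio multiplier: {ratio:.3f}")
# Two bounding readings of the same move:
print(f"price flat  -> unit cost falls {1 - ratio:.0%}")
print(f"cost flat   -> realized price rises {1 / ratio:.2f}x")
print(f"check: {margin_after(0.38, 1.0, ratio):.2f}")
```

Either a roughly 52% drop in cost per token at constant realized price, or a roughly 2.07× rise in realized price at constant cost, would produce the quoted move; the post's argument is that both levers moved in the labs' favor.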

(4) The "market-clearing token price" has permanently moved up. SemiAnalysis's thesis statement:

"The age of low gross margins for frontier model providers is over. Real agentic AI has permanently increased the market-clearing price per token, and there's no going back."¹

The argument: the productivity gain per token from agentic workloads is large enough — SemiAnalysis cites its own internal usage of "$10.95 million dollar annual spend rate" on Anthropic tokens, with productivity gains that "allows us to outcompete all our competitors" — that buyers will pay materially higher prices for frontier-quality tokens. Anthropic has already tested this with Opus Fast (6× regular Opus pricing) and Mythos ($25/$125, 5× regular Opus). Both are higher-margin SKUs and "the most AI-pilled businesses are still more than happy to pay."¹

(5) Competition will not erode margins. Two reasons: (i) open-source models, such as Kimi K2.6 at $0.95/$4, are still measurably worse on real knowledge work and "exert very little downward pressure on Opus pricing"¹; (ii) compute supply remains structurally constrained, so no single frontier lab can serve the entire market: "any lab capable of providing true frontier quality will be able to charge based on the economic value delivered by the token rather than competing away each other's margins."¹

How this disagrees with The Information

The bearish pricing case cites "OpenAI 2025 gross margin 33% vs 46% target; Anthropic gross margin 40% vs 50% target (was −94% in 2024)" to support the claim that published prices are running below cost. A quote-level audit confirms those figures do not appear in the SemiAnalysis post and most likely originate from The Information's reporting on OpenAI's mid-2025 financial disclosures and Anthropic's 2024–2025 trajectory.

The two sources can both be partially right, in three ways:

Different snapshot. The Information's 33%/40% figures are calendar-2025; SemiAnalysis's 38%→70% trajectory ends "today" (May 2026). The disagreement may be calendar 2025 vs YTD 2026, i.e., the inflection happened between the two measurements.
Different definition. SemiAnalysis's 70%+ figure is "inference infrastructure" gross margin (per the post's own phrasing). The Information's reported margins are all-in company-level. Inference-infra margin can be 70%+ while all-in is lower if training spend, headcount, and R&D dominate the cost base.
Different SKU mix. SemiAnalysis's $0.99 blended is agentic-workload Opus 4.7, not the average API call. Across the full Anthropic book (chat, lower-tier SKUs, retail consumer products) the realized rate plausibly sits closer to sticker and the margin closer to The Information's figure.

Where they directly conflict: the directional claim about whether inference economics are deteriorating or improving. Both cannot simultaneously be the right frame for forecasting the next 24–36 months. SemiAnalysis is forecasting improvement. The Information's reporting reads as a snapshot consistent with margins under pressure.

What it means for the BOM thesis

Three concrete implications for the BOM token model:

Frontier deflation rate. An earlier version of the model held frontier $/MTok flat through 2030 (0%/yr deflation), and the pricing analysis recommended raising that to 10–20%/yr deflation to capture workload re-tiering (the standalone frontier-deflation knob no longer exists; the canonical model expresses tier dynamics through chain mix-shift rather than a single deflation rate). SemiAnalysis's view, taken at face value, says the opposite: frontier realized rates may rise as buyers swap toward premium SKUs (Opus Fast, Mythos) where productivity gains justify higher prices. Expressed as an annual deflation rate, the honest range to pressure-test spans −10%/yr (SemiAnalysis world: realized rates rise about 10%/yr as SKU mix shifts up) to +20%/yr (Bear world: competition compresses headline rates).
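To make the width of that range concrete, the sketch below compounds each rate over 2026–2030. It assumes a 2026 realized rate normalized to 1.0, a constant annual rate, and the sign convention that a negative deflation rate means rising prices; none of these are cells in the canonical model.

```python
def price_index(annual_deflation, years):
    """Realized-price index after compounding a constant deflation rate.

    Convention (assumed): +0.20 means prices fall 20%/yr,
    -0.10 means prices rise 10%/yr.
    """
    return (1.0 - annual_deflation) ** years


scenarios = {
    "SemiAnalysis world (-10%/yr)": -0.10,
    "Earlier model (flat, 0%/yr)": 0.00,
    "Bear world (+20%/yr)": 0.20,
}

for name, rate in scenarios.items():
    # Index path for 2026..2030, with 2026 = 1.0.
    path = [round(price_index(rate, y), 3) for y in range(5)]
    print(f"{name}: {path}")
# SemiAnalysis world (-10%/yr): [1.0, 1.1, 1.21, 1.331, 1.464]
# Earlier model (flat, 0%/yr): [1.0, 1.0, 1.0, 1.0, 1.0]
# Bear world (+20%/yr): [1.0, 0.8, 0.64, 0.512, 0.41]
```

By 2030 the two ends of the range differ by about 3.6× in realized frontier $/MTok (1.464 vs 0.410), which is why the choice of frame matters more than any single-year estimate.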

High-value token share trajectory. The earlier model carried an uncited "frontier tokens 30% → 40% → 50% of volume across 2026/28/30" glide path; the pricing analysis already adjudicated that as directionally wrong on a token-volume basis ("frontier token share is falling toward commodity, not rising") and recommended preserving the spirit via a chain restructure. The canonical model executed exactly that restructure: frontier-chat share declines (≈ 15.6% → 5.7% → 1.3% across 2026/28/30) while agentic share rises sharply (≈ 24% → 64% → 91%). SemiAnalysis's economic logic — that high-value, frontier-quality, agentic workloads dominate the new-demand layer and keep pricing power — maps onto the rising agentic share, not onto frontier-chat. So the direction the counter-thesis needs (high-value tokens growing as a share of the workload) is supported by the current model; what changes is the label (agentic, not frontier-chat) and the realization that bulk frontier chat is itself commoditizing. SemiAnalysis's pricing view amplifies the revenue impact of that rising agentic layer.
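The share figures quoted above can be tabulated to show how quickly the mix rotates. The sketch assumes both series are shares of the same total token volume, so the remainder is whatever other chains the canonical model carries; the "other" label is an assumption, not a model field.

```python
years = (2026, 2028, 2030)
frontier_chat_pct = (15.6, 5.7, 1.3)   # approximate, from the canonical model
agentic_pct = (24.0, 64.0, 91.0)       # approximate, from the canonical model

rows = []
for year, chat, agentic in zip(years, frontier_chat_pct, agentic_pct):
    other = round(100.0 - chat - agentic, 1)   # assumed residual chains
    ratio = round(agentic / chat, 1)           # agentic tokens per frontier-chat token
    rows.append((year, chat, agentic, other, ratio))

for year, chat, agentic, other, ratio in rows:
    print(f"{year}: frontier-chat {chat}%  agentic {agentic}%  "
          f"other {other}%  agentic/chat {ratio}x")
# 2026: frontier-chat 15.6%  agentic 24.0%  other 60.4%  agentic/chat 1.5x
# 2028: frontier-chat 5.7%  agentic 64.0%  other 30.3%  agentic/chat 11.2x
# 2030: frontier-chat 1.3%  agentic 91.0%  other 7.7%  agentic/chat 70.0x
```

On these numbers the agentic-to-frontier-chat ratio moves from about 1.5× to 70× in four years, so any pricing-power assumption applied to the agentic chain dominates the revenue outcome by 2030.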

Agentic uplift in the model. An earlier bottom-up version expressed agentic uplift through a single uplift-scaling knob on one sensitivity sheet — neither survives the restructure. The canonical model instead carries a dedicated agentic chain driven by an agentic-mix share series (rising to ≈91% by 2030) and a set of agentic-intensity knobs (conservative / base / mid / high / active). SemiAnalysis's framing — "real agentic AI has permanently increased the market-clearing price per token" — argues for sensitizing toward the high/active end of the agentic-intensity range (and the upper agentic-mix path) to bound the SemiAnalysis case, rather than the conservative default.

The deflation-curve verdict. The current verdict — "the published deflation curve is running against deteriorating unit economics" — assumes the bearish camp is correct. The honest reframing is: two credible sources disagree. The pricing analysis places the defensible commodity deflation rate at the lower end of the published spread (60–67%/yr, derived from Stanford / Epoch / MIT FutureTech, after stripping strategic subsidies and reasoning-model effects — an analytical recommendation, not a live model cell); if SemiAnalysis is right that realized agentic rates have moved up and cost-per-token has fallen further, even that lower-end rate compresses.

Open questions

Is the 38%→70% Anthropic inference-margin progression replicable at OpenAI? SemiAnalysis says OpenAI's revenue mix is more consumer-heavy (less API), so the per-token economics may not have moved as far. The brief should not generalize Anthropic-specific margin moves to the frontier-lab category.
Does the agentic premium-SKU dynamic survive a single major frontier-quality OSS release? SemiAnalysis dismisses Kimi K2.6 today; a meaningfully better open-source release (Llama 5 at frontier? a Chinese lab clearing GPQA-Diamond at $1/$4?) would test the pricing-power argument.
What share of Anthropic's $44B ARR is durable enterprise contract vs surge consumption? SemiAnalysis's pricing-power thesis is strongest if the buy-side is locked into multi-year contracts at premium rates. If most of the $44B is variable-consumption Claude Code usage, the realized rate is more exposed to deflection toward cheaper, lower-quality models doing the same job.

Sources

Primary: SemiAnalysis, "AI Value Capture - The Shift To Model Labs" (Daniel Nishball et al., 2026-05-01; paid tier). newsletter.semianalysis.com/p/ai-value-capture-the-shift-to-model.
Related: SemiAnalysis, "The Coding Assistant Breakdown: More Tokens Please" (Max Kan, 2026-04-24; paid tier). newsletter.semianalysis.com/p/the-coding-assistant-breakdown-more. Useful for the wrapper-economics framing — coding wrappers (Cursor, Cognition, Windsurf, Replit, Vercel V0, Lovable) running negative gross margins while the model labs widen theirs.²
Counter-source: The Information, OpenAI/Anthropic 2025 margin reporting (33%/46%, 40%/50%, −94%/2024) — figures via secondary citation; primary article capture pending, so the specific margin figures should be treated as indicative rather than confirmed.
Note: A quote-level check confirms the 33%/46%/40%/−94% figures are not in the SemiAnalysis post, and that SemiAnalysis sits on the opposite directional side of the bearish framing.

Cross-references

BOM memo §5 — the agentic-shift scenarios: dc-archetype-bom memo
Token primer §7 — the deflation curve, frontier-flat vs commodity-collapse: token primer
Agentic primer §6 — thesis tie-back, token-volume drivers: agentic-AI primer

Sources

¹ SemiAnalysis, "AI Value Capture - The Shift To Model Labs", 2026-05-01 · https://newsletter.semianalysis.com/p/ai-value-capture-the-shift-to-model · "We estimate that the true blended price per million tokens for running Opus 4.7 on agentic tasks at $0.99 despite the sticker price being $5/$25 per MTok. Agentic workloads have extremely high input-to-output ratios (our Claude Code usage has a ratio of about 300:1) and high cache hit rates (90%+). Because cached input tokens only cost $0.50/MTok, most of the tokens end up in the cheapest tier."
² SemiAnalysis, "The Coding Assistant Breakdown: More Tokens Please", 2026-04-24 · https://newsletter.semianalysis.com/p/the-coding-assistant-breakdown-more · "These companies often have negative gross margins, and their status as wrappers seems clearer than ever after the rise of Claude Code and Codex."