Three-doc walkthrough — punchlines & investible themes

Token primer = unit-level mechanics. Agentic primer = how those mechanics multiply at cluster level. SemiAnalysis = where value capture is landing. All three corroborate the same direction; they disagree on who wins the margin.
2026-05-19 · walkthrough

The three punchlines

01

Token primer: The unit-level setup

Punchline: Volume × intensity is outrunning per-token deflation. Total token spend grows even as $/MTok falls.

02

Agentic primer: What it does to the BOM

Punchline: Consensus underprices on two axes: bottom-up 2030 capex pool runs ~$2.8T Base vs Goldman's $1.86T (1.5×; 1.9× Bull). BofA already lifted 2030 to $1.7T, called it "additive."

03

SemiAnalysis: Value capture shifted to model labs

Punchline: "Real agentic AI has permanently increased the market-clearing price per token, and there's no going back." Inverts The Information's deteriorating-margin narrative.
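The first punchline is an arithmetic claim: total spend is task volume × tokens per task × price per token, so spend keeps growing whenever the combined growth in volume and intensity exceeds the price decline. A minimal Python sketch of that relationship follows. The function names and the example growth rates are hypothetical placeholders chosen for illustration, not figures from any of the three documents.

```python
def spend_growth(volume_growth: float, intensity_growth: float, price_change: float) -> float:
    """Year-over-year change in total token spend.

    spend = tasks * tokens_per_task * price_per_token, so the growth
    factors multiply. Arguments are fractional changes (0.5 = +50%,
    -0.5 = -50%).
    """
    return (1 + volume_growth) * (1 + intensity_growth) * (1 + price_change) - 1


def breakeven_price_change(volume_growth: float, intensity_growth: float) -> float:
    """Price change at which total spend would be exactly flat."""
    return 1 / ((1 + volume_growth) * (1 + intensity_growth)) - 1


# Hypothetical inputs, for illustration only: task volume doubles,
# tokens per task rise 50%, and $/MTok halves.
example = spend_growth(volume_growth=1.0, intensity_growth=0.5, price_change=-0.5)
print(f"spend growth: {example:+.0%}")

# How far $/MTok would have to fall to hold spend flat under the same inputs.
flat = breakeven_price_change(volume_growth=1.0, intensity_growth=0.5)
print(f"break-even price change: {flat:+.1%}")
```

Under these made-up inputs spend still rises even though the unit price halves. The break-even function shows how much steeper the deflation would need to be to offset the volume and intensity growth.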

Investible themes — where the dollars land

| Infrastructure layer | Thesis | Names to watch |
| --- | --- | --- |
| Networking: east-west fabric | Agentic shifts traffic inside the DC. Backend NIC + switch silicon scales with port count, not GPU FLOPS. | ANET (Tomahawk/Jericho merchant silicon), AVGO, MRVL (custom-ASIC tailwind) |
| Networking: "scale-across" (DC-to-DC) | Brand-new category NVIDIA just named. Long-haul + metro fiber moves from optional to mandatory once GPU clusters exceed single-site power. | CIEN (TD Cowen "No DC Is An Island"), DY (JPM fiber-build read), GLW as fiber substrate |
| Networking: optical in-rack | 400G → 800G → 1.6T transceivers; co-packaged optics is the credible next step when copper runs out of room. | COHR, LITE, FN; Celestial AI (private; optical interconnect for new memory tier) |
| KV-cache offload tier (CMX) | Brand-new line item. Bluefield-4 + DRAM + NAND SSDs holding "warm context" between HBM and bulk storage. Didn't exist nine months ago. | NVIDIA Bluefield; MU / SK Hynix / Samsung (HBM + GDDR7 + NAND); Solidigm, PSTG for SSD layer |
| CPU pull-through | Per-token economics are GPU-led; per-cluster capex is not. Agents pull CPU for orchestration, RAG, KV-spill management. | AMD (server CPU), INTC (Xeon 6+); Arm-custom angle via MRVL (AWS Graviton silicon), ARM licensing |
| Disaggregated prefill/decode | Rubin CPX puts GDDR7 onto AI-server bills alongside HBM4: the first time commodity memory enters the inference TAM. | Same memory names, but GDDR7 is a new line: incremental volume for MU / SK Hynix not in prior DC mix |
| Voice/agent application stack | Voice agents reroute call audio to 3 new endpoints per turn (STT/LLM/TTS) on a sub-200 ms latency budget. | ElevenLabs, Deepgram (private); TWLO ConversationRelay as orchestration layer |
| Vector DB + indexing | RAG and codebase indexing are persistent new infrastructure layers; Cursor indexes every customer codebase. | Pinecone, Weaviate, Qdrant (all private); MDB Atlas Vector as public adjacency |
| Agent security | OWASP LLM06 ("Excessive Agency") is a real category; agent-driven egress + supply-chain risk through MCP servers. | PANW, ZS, CRWD; Protect AI, Lasso private-side |
| Model labs themselves | If SemiAnalysis is right, the labs are the highest-margin link, not the wrappers (Cursor/Devin run negative GM today). | Anthropic, OpenAI secondaries; wrapper exposure is the wrong end of the barbell |

The asymmetry to press on next call

The two bull cases stack rather than cancel:

If bottom-up is right (agentic primer §6)

Capex pool is 1.5–1.9× consensus → networking, fiber, CPU, KV-tier memory all under-counted in published TAMs.

If SemiAnalysis is right

Model-lab margins keep widening → value capture concentrates with Anthropic / OpenAI; wrapper/app layer is a trap.
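The 1.5–1.9× range in the bottom-up case can be checked from the figures quoted in the walkthrough. The sketch below recomputes the Base multiple against Goldman's number and backs out the Bull-case pool implied by the 1.9× multiple. That implied Bull figure is a derived value, not one the source states, and the helper name `capex_multiple` is hypothetical.

```python
GOLDMAN_2030_T = 1.86    # Goldman 2030 capex figure, $T (from the walkthrough)
BOTTOM_UP_BASE_T = 2.8   # bottom-up Base case, $T (quoted as "~$2.8T")
BULL_MULTIPLE = 1.9      # Bull-case multiple quoted in the walkthrough


def capex_multiple(bottom_up_t: float, consensus_t: float) -> float:
    """Ratio of a bottom-up capex pool to a consensus estimate."""
    return bottom_up_t / consensus_t


base_multiple = capex_multiple(BOTTOM_UP_BASE_T, GOLDMAN_2030_T)

# Derived, not stated in the source: the pool size the 1.9x Bull multiple implies.
implied_bull_t = BULL_MULTIPLE * GOLDMAN_2030_T

# Dollar gap between the bottom-up Base case and consensus.
base_gap_t = BOTTOM_UP_BASE_T - GOLDMAN_2030_T

print(f"Base multiple vs Goldman: {base_multiple:.2f}x")
print(f"Implied Bull pool:        ${implied_bull_t:.2f}T")
print(f"Base gap to consensus:    ${base_gap_t:.2f}T")
```

The Base gap is the amount that, on this thesis, published TAMs leave uncounted across networking, fiber, CPU, and KV-tier memory.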

The least-priced corner is networking — specifically the "scale-across" DC-to-DC fiber + optics layer, because (a) it didn't exist as a named category nine months ago, (b) it shows up in every agentic capex shock scenario, and (c) the public-comp set (CIEN, DY, COHR, LITE) trades at infrastructure multiples, not AI multiples.