An agent is a language model wrapped in a loop. Where a chatbot answers one question and stops, an agent reads a task, decides on an action, observes the result, updates what it knows, and loops — taking dozens or hundreds of steps before producing a single user-visible answer. The model itself does not run code or call services; it emits text. The harness — the program around the model — parses that text, executes the requested action (read a file, query a database, send an email, run a test), and feeds the result back into the next prompt. The loop continues until the agent reports done or the harness stops it.
The formal academic framing is the ReAct paper (Yao et al., October 2022). ReAct stands for "Reasoning and Acting" and its central move is to interleave the two in a single language-model trace rather than treat them as separate problems (Yao et al., 2022, ReAct: Synergizing Reasoning and Acting in Language Models). Every step is one trip through reason → act → observe.
That structural difference vs. a chatbot — looping, calling external services, accumulating state — is what makes the system agentic. It also re-shapes the underlying infrastructure: more tokens consumed per task, different network paths, new memory tiers, a different mix of GPU and CPU silicon at the cluster level. The rest of this primer walks through those shifts.
Three layers stacked on top of each other. The bottom layer is the loop; the middle is how the model emits an action; the top is how the application talks to third-party tools.
The loop (ReAct). The harness sends a prompt; the model writes a short rationale ("I need today's weather for Paris") and emits a structured request to call a tool. The harness runs the tool, appends the result to the conversation, and re-prompts the model. The model is stateless between turns; the harness carries state forward by appending to the conversation each time. This is the underlying reason agentic systems are so token-hungry — every loop iteration re-reads everything that came before it.
Tool-use schemas. Both Anthropic and OpenAI ship a structured protocol for the action step. The developer provides a JSON Schema that declares each tool's name, description, and arguments; the model emits a typed call that the harness can execute programmatically. Anthropic's framing: "Tool use lets Claude call functions you define or that Anthropic provides. Claude decides when to call a tool based on the user's request and the tool's description, then returns a structured call that your application executes (client tools) or that Anthropic executes (server tools)" (Anthropic, "Tool use with Claude"). Claude emits a tool_use block; OpenAI's equivalent is tool_calls in the Responses API (OpenAI, "Function calling"). Both vendors offer a strict: true mode that guarantees the call matches the schema exactly — closing the gap where earlier models would sometimes emit malformed JSON. Structurally, this is remote procedure call (RPC, a standard way for one program to invoke a function in another) embedded inside the conversation.
The Model Context Protocol (MCP). Function calling solved how one model talks to one developer's own tools. MCP solves what happens when every AI application wants to talk to every tool. Anthropic launched MCP on November 25, 2024 with the launch-post framing: "Yet even the most sophisticated models are constrained by their isolation from data—trapped behind information silos and legacy systems. Every new data source requires its own custom implementation" (Anthropic, "Introducing the Model Context Protocol"). The official spec uses a USB-C analogy — "MCP provides a standardized way to connect AI applications to external systems" (modelcontextprotocol.io). Without MCP, connecting M AI applications to N data sources requires M×N bespoke integrations. With MCP, each side speaks the protocol; M+N implementations cover the same ground.
The adoption timeline is the load-bearing signal: launched Nov 25, 2024; OpenAI adopted MCP in March 2025, Google DeepMind in April 2025, Microsoft via Semantic Kernel through 2025; in December 2025 Anthropic donated MCP to the new Agentic AI Foundation under the Linux Foundation, co-founded with Block and OpenAI (Wikipedia consolidating dated record; primary corroboration in each vendor's developer changelog). All three major foundation-model labs converged on a shared three-piece protocol stack — ReAct-style loop, schema-validated tool calls, MCP as the cross-vendor integration layer — within five months.
Anchor products. Every modern agent product runs some version of this stack. Claude Code (Anthropic, launched Feb 24 2025) — "reads your codebase, edits files, runs commands, and integrates with your development tools" (Anthropic Claude Code overview). Cursor — multi-model agent inside a VS Code fork; supports Anthropic, OpenAI, Google, xAI, and its own model Composer 2. Devin (Cognition, March 12 2024) — "a tireless, skilled teammate, equally ready to build alongside you or independently complete tasks" — first to ship the marketing of "autonomous software engineer," scored 13.86% on SWE-bench end-to-end at launch (Cognition); Cognition has not publicly disclosed Devin's internal architecture. Anthropic computer use (Oct 22 2024) — gives the model a screenshot tool and mouse/keyboard actions, scored 14.9% on OSWorld at launch (Anthropic). OpenAI Agents SDK + Responses API (March 11 2025) — OpenAI's consolidated agent stack with built-in web search, file search, and computer-use tools.
Per the token primer §4, every LLM call stresses four GPU resources: compute (FP16/FP8 TFLOPS), HBM bandwidth, HBM capacity, and inter-GPU NVLink. Agent steps stress those resources differently from chatbot calls, in three concrete ways.
Multi-turn KV-cache pressure. Quick refresher on the hardware terms used below — covered in detail in the token primer §4. A modern AI accelerator is a GPU (NVIDIA's H100, H200, or Blackwell B200) carrying two on-package memories: HBM (high-bandwidth memory, the fast scratchpad where the model's weights and active state live during a call) measured in gigabytes of capacity and terabytes-per-second of bandwidth. Inter-GPU links — NVLink — let several GPUs talk to each other when a model is too big for one. Compute throughput is measured in TFLOPS (trillions of floating-point operations per second). The current Hopper / Hopper-200 / Blackwell generations carry 80 / 141 / 192 GB of HBM per GPU respectively (NVIDIA H100, H200, DGX B200). The KV-cache is the model's running record of attention computations — "what I've already looked at in this conversation" — and it lives in HBM during a call. NVIDIA's own inference guide notes the cache grows "linearly with batch size and sequence length" and "can have a large memory footprint" (NVIDIA Developer Blog). NVIDIA's worked example: Llama 2 7B at 16-bit precision, batch size 1, the KV-cache alone is ~2 GB. Agentic workloads compound this on both axes. Sequence length grows per task because every step appends its tool calls, observations, and reasoning to the same context window — Stanford's measurement of agentic coding tasks: "consuming 1000x more tokens than code reasoning and code chat" (Stanford Digital Economy Lab). And as context grows, the GPU holds fewer concurrent conversations in HBM — a direct linear tradeoff. At 4K context, a 7B model fits ~278 users per GPU; at 32K, ~35.
The KV-cache offload tier — NVIDIA CMX. Long-context, multi-turn agents now routinely push the KV-cache past on-package HBM's 80 / 141 / 192 GB ceiling on Hopper / Hopper-200 / Blackwell. NVIDIA's response, announced at GTC 2026, is CMX (the Context Memory Storage Platform): a system that lets the KV-cache spill from HBM to a cheaper, larger tier of conventional DRAM and NAND SSDs, controlled by Bluefield-4 — NVIDIA's data-processing unit (DPU, a network-attached processor that handles I/O and storage). SemiAnalysis's coverage of the GTC 2026 announcement: "CMX addresses a growing bottleneck in modern inference infrastructure: the rapid expansion of KV Cache required to support long-context and agentic workloads. KV cache grows linearly with input sequence length and number of users and is the primary tier of memory expansion that inference must address" (SemiAnalysis, "GTC 2026: The Inference Kingdom Expands," Mar 24 2026; full coverage indexed in Noldor). The architectural meaning is a new memory tier in every agent-serving cluster — a "warm context" layer between HBM and bulk storage that did not exist as a distinct line item before. Vendors building into that tier include NVIDIA (Bluefield-4 + CMX), Celestial AI (optical-interconnected DRAM), and the NAND/SSD vendors (Solidigm, Micron) supplying the underlying storage capacity.
Cluster-level CPU pull-through. Per-token economics are GPU-led — covered in the token primer §4 — but per-cluster capex pulls CPU back in. Each agent step still runs as a normal LLM forward pass on the GPU, but the agent loop's surrounding work — running tool calls, parsing JSON, retrieving from vector stores, managing the KV-cache spill across the cluster — runs on CPU. Microsoft's "Fairwater" AI campus for OpenAI is architected with a separate air-cooled CPU-and-storage building alongside the dense GPU building; Microsoft has publicly stated the network includes "millions of CPU cores for operational compute tasks" (Microsoft Source, Nov 12 2025). SemiAnalysis's satellite-imagery analysis estimates the split at 48 MW CPU vs 295 MW GPU — roughly 1:6 — and projects that ratio rises as GPU performance-per-watt improves faster than CPU performance-per-watt (SemiAnalysis, "CPUs Are Back," Feb 2026). The IR confirmation lives in both major server-CPU vendors' Q4 2025 calls: AMD's Lisa Su flagged "these AI processes or AI agents that are spinning off a lot of work, in an enterprise, they're actually going to a lot of traditional CPU tasks." as a driver of 2026 server-CPU TAM growth (AMD Q4 2025 earnings call, Feb 3 2026); Intel CFO David Zinsner framed the same dynamic as "The world is shifting from human-prompted requests to persistent and recursive commands driven by computer-to-computer interactions." (Intel Q4 2025 earnings call, Jan 22 2026); Meta's April 2026 AWS Graviton partnership commits "tens of millions of Graviton cores" explicitly for "agentic AI — autonomous systems that reason, plan, and execute complex tasks" (Meta, April 2026).
A workflow's "agentic" upgrade is not just about productivity. The data path, the artifacts produced, and the infrastructure footprint all change. Three documented cases.
Old flow. Engineer types code into an IDE. Local files, local compile, local tests. Network traffic from the developer's machine: git pulls/pushes, package-manager fetches, occasional doc lookups. The only AI bytes on the wire are short autocomplete suggestions (original GitHub Copilot: snippets in, single-line completions out).
New flow. Engineer types a natural-language task into Claude Code, Cursor, GitHub Copilot agent mode, or Devin. The agent reads files via a Read tool, grep/glob to locate symbols, edits files via an Edit tool, runs shell commands via a Bash tool — compiling, testing, opening branches, drafting PRs. Per Anthropic, Claude Code can write tests for untested code, fix lint errors across a project, resolve merge conflicts, update dependencies, and write release notes (Anthropic Claude Code overview). GitHub's Copilot coding agent runs in "its own ephemeral development environment, powered by GitHub Actions," meaning execution moves off the developer's laptop into GitHub's hosted infrastructure (GitHub docs).
What changed. Per Bai et al. (Stanford/MIT/Microsoft Research/Anthropic), "agentic tasks are uniquely expensive, consuming 1000x more tokens than code reasoning and code chat"; "runs on the same task can differ by up to 30x in total tokens"; and "Kimi-K2 and Claude-Sonnet-4.5, on average, consume over 1.5 million more tokens than GPT-5" on identical SWE-bench Verified tasks (arXiv 2604.22750, 24 Apr 2026). Input tokens dominate cost — the agent re-reads the full conversation history every turn. Cursor adds operational color: "In a focused sprint earlier this year, we drove all tool calls to at least 2 or often 3 9s of reliability" (Cursor engineering blog); semantic search across an indexed codebase is "one of the biggest drivers of agent performance," with cross-user index reuse made possible because clones within one organization "average 92% similarity" (Cursor, indexing). Codebase indexing is a new infrastructure layer that did not exist in the autocomplete era. Macro footprint: SemiAnalysis estimates "4% of GitHub public commits are being authored by Claude Code right now. At the current trajectory, we believe that Claude Code will be 20%+ of all daily commits by the end of 2026" (SemiAnalysis, "Claude Code is the Inflection Point," Feb 2026; indexed in Noldor).
Old flow. Caller dials a 1-800 number. The call rides telco signalling protocols (SS7 for traditional circuit-switched lines; SIP — Session Initiation Protocol — for modern IP-based ones) into a contact-center-as-a-service (CCaaS) provider's session border controllers, gets queued, and terminates on a human agent's softphone. CCaaS providers — Five9, NICE, Genesys, Twilio Flex — handle queuing, routing, IVR menus (interactive voice response — "press 1 for billing"), CRM integration. Voice bytes themselves ride RTP (real-time transport protocol, the standard for streaming audio) between the telco edge and the CCaaS data center; no LLM is in the loop.
New flow. The call lands at the telco / CCaaS, but now flows over a WebSocket — a long-lived two-way HTTP connection that keeps a streaming session open — to a streaming speech-to-text (STT) endpoint. The transcript is streamed token-by-token into an LLM. The LLM's response tokens are streamed into a text-to-speech (TTS) endpoint. Synthesized audio is streamed back to the caller. The LLM can invoke tool calls mid-conversation — "transfer to human," "look up order," "process refund." Twilio's ConversationRelay documentation describes the architecture: a WebSocket session between the caller's call leg and ConversationRelay handles the speech-text conversions and orchestrates LLM calls in real time (Twilio docs), with documented STT/TTS providers Deepgram, Google, Amazon Polly, and ElevenLabs.
What changed. Voice bytes that used to terminate at a CCaaS data center near a telco peering point now mirror over WebSocket to three new endpoints per turn: a streaming STT provider's cluster (Deepgram, Google), an LLM provider's cluster (Anthropic, OpenAI), and a streaming TTS provider's cluster (ElevenLabs, Polly). Each on different infrastructure. Latency is unforgiving: Deepgram publishes "transcription latency in 300 milliseconds or less for streaming workloads" (Deepgram streaming latency), and its Aura-2 TTS targets sub-200 ms time-to-first-byte on the audio-output side (Deepgram Aura-2). Total wall-clock budget for a human-feeling voice agent is roughly 800 ms end-to-end. Deepgram prices its Voice Agent API (STT + LLM + TTS + orchestration bundled) at "$4.50 per hour" (Deepgram Voice Agent) — a different unit economics from per-license CCaaS subscription pricing. New artifacts: a real-time transcript (text), an LLM trace with reasoning and tool calls (JSON), TTS audio chunks (binary), and end-of-call summaries — where the old flow generated one audio file plus a CRM entry.
Old flow. A ticket arrives via email, chat widget, or web form. Help-desk software (Zendesk, Intercom, Salesforce Service Cloud, ServiceNow) routes it to a queue based on rule-based triggers. A human agent picks it up, looks up the customer, writes a reply. SLAs measured in hours or days. Pre-LLM bots used deterministic decision trees, keyword routing, scripted FAQs.
New flow. An AI agent (Intercom Fin, Zendesk AI, Salesforce Agentforce, Forethought) reads the ticket, retrieves relevant context from the company's knowledge base via RAG (retrieval-augmented generation: searching a vector database to fetch the right context, rather than stuffing everything into the prompt), calls tools to look up order status / customer history / billing, generates a reply, and either sends it or hands off to a human with a draft. Intercom defines the operational metric formally: "Automation rate = Involvement rate × Resolution rate. If Fin is involved in 50% of eligible conversations and resolves 60% of those, your automation rate is 30%" (Intercom).
What changed. Fin.ai publishes "Fin averages a 67% resolution rate across its customer base, with top-performing customers reaching 80-84%" (Fin.ai KPIs framework) at "$0.99 per resolution" — vendor self-disclosure, but the price is observable. Klarna is the most cited deployment, with the most complete public record on both sides of the question. Klarna's Feb 27 2024 press release: the AI assistant "has had 2.3 million conversations, two-thirds of Klarna's customer service chats," doing "the equivalent work of 700 full-time agents," with customers resolving "in less than 2 mins compared to 11 mins previously" and "a 25% drop in repeat inquiries," estimated to drive "$40 million USD in profit improvement" (Klarna press release). Fifteen months later, Klarna was "turning back to people to help with customer service work" (CX Dive, May 2025) — the deflection and unit-economics numbers survive; the framing that AI permanently replaced 700 customer-service agents did not. Where ticket data used to stay inside one help-desk platform, every turn now serializes ticket text + retrieved knowledge-base chunks + customer context into a prompt sent to an LLM endpoint — measurable per-interaction inference cost where before there was only platform-license cost.
Five second-order shifts, sized to what's currently disclosed in primary sources.
Network traffic — east-west dominance and DC-to-DC pull-through. A data-center network has two directions: north-south is traffic between the facility and end users on the internet; east-west is everything that stays inside the facility, between servers. For a chatbot Q&A the work is mostly north-south. For an agent it's mostly east-west: each tool call, vector-store lookup, code execution, and follow-up LLM step terminates on a different server inside the same facility, and the agent loops through many such steps per user-visible answer. Industry coverage puts east-west at 70–90% of total flows inside AI-driven data centers (commentary-grade; the agentic-specific fraction is not yet primary-disclosed). Arista's FY26 10-K: "Modern AI applications need high-bandwidth, lossless, low-latency, scalable, multi-tenant networks that interconnect hundreds or thousands of accelerators at high speed from 100Gbps to 400Gbps, evolving to 800Gbps and beyond" (Arista AI networking). Google's Jupiter fabric supports "more than 6Pb/sec of datacenter bandwidth" — Google's own engineering disclosure notes Jupiter has evolved beyond its early non-blocking-Clos design (a hierarchical multi-stage topology where every server can reach every other server through bounded hops) toward an optical-circuit-switched direct-mesh architecture (Google Cloud blog) — the kind of fabric needed when every agent step generates new east-west flows. And the long-haul story: NVIDIA's Spectrum-XGS Ethernet, launched at Hot Chips Aug 22 2025, extends NVIDIA's Spectrum-X backend networking "to interconnect multiple, distributed data centers to form massive AI super-factories capable of giga-scale intelligence," framed against the fact that "individual data centers are reaching the limits of power and capacity within a single facility" (NVIDIA press release). NVIDIA calls this new fabric category "scale-across," distinct from scale-up (more GPUs in one rack) and scale-out (more racks in one building) — and it drives the metro and long-haul fiber capex thesis carried by TD Cowen on Ciena ("As AI clusters scale, it is no longer sufficient to optimize only the backend fabric inside a single facility") and JPMorgan on Dycom ("A critical component of these infrastructure builds is the metro and long-haul fiber").
Memory hierarchy — vector databases and persistent state. Agents lean on vector databases — Pinecone, Weaviate, Qdrant — to keep their working knowledge bounded. A vector database stores text, images, or other modalities as numerical fingerprints called embeddings (high-dimensional vectors), so a query can retrieve the semantically closest matches rather than exact word matches. That's what makes RAG cheap: rather than stuffing a million tokens of company documents into every prompt, the agent embeds the user's question, queries the vector DB, retrieves the 5–10 most relevant chunks, and includes only those in the LLM call. Most production vector DBs use HNSW (Hierarchical Navigable Small World, a multi-layer graph index with logarithmic lookup time) and/or IVF (Inverted File Index, partitions vectors into clusters and searches the relevant one). Weaviate's own framing: HNSW gives "logarithmic time complexity" vs a flat index's linear cost (Weaviate docs). Persistent memory across sessions is the other new layer. Anthropic shipped Memory as a Claude product feature: launched to Team and Enterprise on Sep 11 2025, expanded to Pro and Max on Oct 23 2025 (Anthropic Memory blog). The architecture: Claude generates a project-scoped text summary that persists across conversations and gets re-injected into the system prompt of new ones; users can view and edit it directly.
Security surface. Agents enlarge the attack surface in three ways. Prompt injection is the foundational risk — per OWASP, "A Prompt Injection Vulnerability occurs when user prompts alter the LLM's behavior or output in unintended ways," with the indirect variant — malicious instructions hidden in third-party content the agent retrieves (a webpage, a PDF, a vector-DB record) — being especially insidious for agents that fetch outside data (OWASP LLM01). Excessive Agency is the agent-specific category in OWASP's 2025 LLM Top 10: "An LLM-based system is often granted a degree of agency by its developer – the ability to call functions or interface with other systems via extensions" — the vulnerability arises from excessive functionality, excessive permissions, or excessive autonomy (OWASP LLM06). Recommended mitigations are essentially principle-of-least-privilege adapted to agents — minimize tools, restrict permissions, require human approval for high-impact actions. Supply-chain risk rides on top of MCP: an agent connecting to a third-party MCP server is trusting that server's code, the data it serves, and any tool calls it offers. Anthropic's own 2025 "agentic misalignment" research stress-tested 16 frontier models and found that "In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors." (Anthropic research) — those behaviors emerged under forced-binary stress tests, not in normal operation, but the finding is that current safety training does not reliably prevent them. New infrastructure spend follows the new attack surface — Zscaler, Palo Alto, and emerging vendors like Lasso Security and Protect AI are scaling into the agent-monitoring category.
Work-product shift. Agents produce more structured outputs (markdown, JSON, code, tool-call payloads) and fewer rendered binaries (PDFs, PowerPoint, images) than human creative work. Every major agent framework — LangGraph, OpenAI Agents SDK, Anthropic tool use, MCP — standardizes on JSON-serialized state. The downstream effects are plausible but unsized: storage type-mix shifts toward text and structured data; rendering overhead falls; artifacts are diff-friendly. This is a hypothesis, not a measured effect — no hyperscaler or analyst report yet quantifies the shift in storage-tier or rendering-load terms. Worth tracking, not yet a sized investment claim.
Two structural reasons agentic AI matters for the AI-infrastructure capex thesis.
Modeling AI-data-center capex on a per-archetype basis — training-core, inference, legacy enterprise, edge, agentic — surfaces that agentic clusters carry the highest network + interconnect share (10.4% of cluster all-in, vs 6.5–7.9% in every other archetype) AND the highest fiber + optics share (3.7% vs 1.3–2.9%) AND the lowest land/shell/EPC share (10.4% vs 15.4–37.2%) at a frontier-tier $34.6M/MW. Three reasons it shows up that way: agent tasks execute across many accelerators with persistent state passed between them, so the NVLink + Spectrum-X scale-up fabric runs larger per dollar of compute; tool-call and DC-to-DC traffic is super-linear and symmetrical — unlike one-way CDN traffic (where content flows out to viewers in a single direction), every agent step round-trips, and a 100 ms delay breaks the reasoning loop, pulling 400G/800G long-haul interconnect from optional to standard; and disaggregated-prefill architectures — splitting the prompt-ingestion (prefill) and answer-generation (decode) steps of every LLM call onto different GPU SKUs — push fast-but-cheap GDDR7 memory (commodity gaming-GPU memory used for prefill in NVIDIA's new Rubin CPX racks) onto server-side bills for the first time alongside HBM4 (the latest premium on-package GPU memory) and the new NVMe-SSD KV-cache tier. The investment-relevant fact: a capex pool sized at the unified-DC level under-prices the network, fiber, and CPU lines that an agentic-heavy build-out actually consumes.
A bottom-up token-economics model — one that sizes the AI-infra capex pool as industry token volume × deflated $/MTok × hyperscaler capex-to-revenue ratio rather than backing into a number from analyst-led top-down forecasts — produces a 2030 capex pool of ~$2,837B under a Base scenario (1,000× agentic-intensity multiplier, 30% agentic share of workload, mainstream adoption), against the corresponding Goldman Sachs top-down anchor of $1,860B (extrapolated from Goldman's published 2029 figure of $1,570B). That is a ratio of 1.53× (Base), with the bottom-up vs top-down overshoot widening to 1.86× under Bull-case knob settings and contracting to 0.93× under Bear — bottom-up never reaches top-down in the Bear case. The crossover happens between 2028 and 2030: bottom-up runs at 8.5% of top-down in 2026, 39.7% in 2028, then leaps above by 2030. Three structural drivers: ~80%/yr commodity $/MTok deflation (token primer §7); ~80%/yr industry token-volume CAGR; and Stanford's 1,000× per-task token-consumption multiplier for agentic vs single-call workloads. Caveats worth flagging: (i) Goldman's 2030 figure is a model-internal extrapolation from the published 2029 anchor — against BofA's stated $2,500B 2030 anchor instead, the Base ratio compresses to ~1.13×; (ii) the 80% token-volume growth rate is the single most load-bearing assumption; (iii) an outer-edge "1M× continuous-agents" scenario produces non-physical numbers and should be read as bound-testing, not forecasting.
External corroboration of the additive view. Bank of America raised its 2030 AI data-center forecast from $1.4T to $1.7T on May 13 2026, explicitly framed as "additive to the overall market" citing "diversification of compute and memory components" — the same CPU-pull-through and SRAM/specialty-memory categories the BOM archetype model carries (BofA, May 13 2026). Microsoft's Satya Nadella, on the Q3 FY2026 earnings call (April 29 2026), described the shift in enterprise software unit economics from a per-user business model to a per-user and usage model in which agents working on behalf of users (or with them) create the value — the enterprise-side statement of the same additive framing (Microsoft Q3 FY26 earnings call transcript).
In one sentence: consensus underprices agentic AI in two ways at once — it under-models the cluster-level bill of materials, and it under-counts the token-volume × intensity multiplier that drives demand for the underlying compute. The bottom-up math compounds into a capex pool that runs 1.5× to 1.9× the published 2030 anchors by Base / Bull cases.
vault/process/research-notes/2026-05-15-token-primer/SEMIANALYSIS-CPU-ASSESSMENT.mdvault/kb/outputs/models/ai-infra-capex/dc-capex-archetype-2026-05-25.xlsx (sheets 12_Agentic, 03_Archetype_Mix, 20_Average_DC_View)vault/kb/outputs/models/ai-infra-capex/bom-token-model-2026-05-21.xlsx (sheets Demand, Inputs, Triangulation; live recalc at vault/process/research-notes/2026-05-15-agentic-ai-primer/05-token-model-recalc.md)