Conclusion First: LLM Companies Will Own Inference, and With It the Rights to the Silicon
OpenAI’s monopoly over its model weights has turned the tables on NVIDIA.
Processors do not own LLMs; LLMs own the silicon!

The most important investor takeaway is this: Large Language Model (LLM) companies will not just dominate inference; they will own the rights to the silicon that powers it. Just as Intel and ARM controlled instruction sets for decades, model owners like OpenAI, Anthropic, and Google now control the “instruction set” of AI: the closed model weights. No chip can run their models without their approval. This creates a monopoly dynamic with far greater lock-in than anything seen in past hardware eras.
NVIDIA’s Blackwell GPUs are world-class for training, but they are the wrong long-term solution for inference. Inference already accounts for the overwhelming majority of AI compute, and it can be executed more cheaply and efficiently on custom chips tied directly to model architectures. Blackwell is overbuilt for this role, leaving an opening that the model owners themselves are best positioned to fill.
Blackwell vs. Inference Chips: Efficiency at the Core of AI Deployment
NVIDIA’s Blackwell architecture (B200/B100) is unmatched for training and flexible enough for both training and inference. But in inference-only use cases, it wastes silicon and power on unused capabilities. In contrast, dedicated inference chips—like Tesla’s FSD chip, Apple’s Neural Engine, Google’s TPUv4i, or AWS’s Inferentia—are optimized for low-bit precision, simpler interconnect, and streamlined memory pipelines. The result is 3–10× higher performance per watt and significantly lower cost per inference.
This efficiency gap demonstrates why inference will migrate away from general-purpose GPUs toward custom silicon.
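To make the efficiency gap concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it (hardware cost, power draw, throughput, utilization, electricity price) is an illustrative assumption rather than a vendor specification; the point is only how performance per watt and amortized hardware cost combine into cost per inference.

```python
# Back-of-envelope cost-per-inference comparison.
# All figures below are illustrative assumptions, not measured vendor specs.

ELECTRICITY_USD_PER_KWH = 0.08
HOURS_PER_YEAR = 8760
AMORTIZATION_YEARS = 3

def cost_per_million_inferences(hw_cost_usd, power_watts, inferences_per_sec, utilization=0.6):
    """Amortized hardware cost plus energy cost, per one million inferences."""
    inferences_per_year = inferences_per_sec * utilization * HOURS_PER_YEAR * 3600
    hw_cost_per_year = hw_cost_usd / AMORTIZATION_YEARS
    energy_cost_per_year = (power_watts / 1000) * HOURS_PER_YEAR * ELECTRICITY_USD_PER_KWH
    return (hw_cost_per_year + energy_cost_per_year) / inferences_per_year * 1e6

# Hypothetical general-purpose training GPU repurposed for inference.
gpu = cost_per_million_inferences(hw_cost_usd=30_000, power_watts=1000, inferences_per_sec=500)

# Hypothetical dedicated inference ASIC on a mature node: cheaper hardware,
# lower power, and assumed ~3x the throughput per watt claimed in the text.
asic = cost_per_million_inferences(hw_cost_usd=5_000, power_watts=300, inferences_per_sec=450)

print(f"GPU:  ${gpu:.2f} per million inferences")
print(f"ASIC: ${asic:.2f} per million inferences")
print(f"ASIC advantage: {gpu / asic:.1f}x cheaper")
```

Under these assumed numbers the dedicated chip comes out roughly 5× cheaper per inference, squarely inside the 3–10× range cited above.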
Inference Monopolies: How Model Owners Will Control the Future of AI Compute
Why Inference Will Dominate
- Training is rare and centralized.
- Inference is constant and scales with usage.
- Economically, inference drives the overwhelming majority of AI market value (a back-of-envelope crossover sketch follows this list).
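For a sense of scale, a quick hypothetical arithmetic sketch shows how usage-driven inference compute overtakes a one-off training run. The FLOP counts and query volume below are stand-in assumptions, not measurements of any real model.

```python
# Illustrative crossover between a one-off training run and ongoing
# inference demand. All numbers are stand-in assumptions, not measurements.

TRAIN_FLOPS = 2e25          # assumed compute for one frontier training run
FLOPS_PER_QUERY = 4e14      # assumed: ~200B active params x ~1,000 tokens x 2 FLOPs
QUERIES_PER_DAY = 1e9       # assumed daily query volume at scale

inference_flops_per_day = FLOPS_PER_QUERY * QUERIES_PER_DAY
days_to_match_training = TRAIN_FLOPS / inference_flops_per_day

print(f"Inference compute per day: {inference_flops_per_day:.1e} FLOPs")
print(f"Days of serving to equal the training run: {days_to_match_training:.0f}")
# Every day of serving beyond that point is new compute demand that the
# training run never sees again -- which is why usage drives long-run demand.
```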
Cost-Effective Silicon for Inference
- Inference chips can be built on 5nm or 7nm, avoiding the costs of bleeding-edge nodes.
- Apple, Tesla, and Samsung have proven custom inference silicon can be built for $150M–$300M.
- Quantization and model compression make inference even more efficient at lower geometries (a minimal quantization sketch follows this list).
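To illustrate the quantization point, here is a minimal NumPy sketch of symmetric int8 weight quantization. The layer size is arbitrary, and production schemes (per-channel scales, calibration, 4-bit formats) are more involved; the sketch only shows the basic memory and accuracy trade.

```python
import numpy as np

# Minimal sketch of symmetric int8 weight quantization.
# Sizes and data are arbitrary; real deployments use per-channel scales,
# calibration, and often 4-bit or mixed-precision schemes.

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)   # fp32 weights
x = rng.standard_normal((1, 4096)).astype(np.float32)      # one activation row

scale = np.abs(W).max() / 127.0                  # one scale for the whole tensor
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

y_fp32 = x @ W                                    # reference result
y_int8 = (x @ W_int8.astype(np.float32)) * scale  # dequantized int8 result

print(f"Memory: {W.nbytes / 1e6:.0f} MB fp32 -> {W_int8.nbytes / 1e6:.0f} MB int8")
rel_err = np.linalg.norm(y_fp32 - y_int8) / np.linalg.norm(y_fp32)
print(f"Relative output error: {rel_err:.4f}")
```

Cutting bytes per weight also cuts the memory bandwidth needed per token, which is why quantization pays off most on bandwidth-bound inference hardware.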
The Strategic Shift: Models as the New Instruction Set
- Historically, Intel and ARM monopolized by controlling instruction sets.
- In AI, model weights are the instruction set: closed, proprietary, and protected by license agreements (a toy sketch after this list shows why generic hardware is inert without them).
- This means the model owner alone controls the right to design inference hardware compatible with their models.
- Competitors cannot copy or replicate without permission, creating a legal and technical lock-out.
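The analogy can be made concrete with a toy sketch: the “hardware” below is nothing but generic matrix arithmetic, and everything model-specific lives in the weights it is handed. Function names, shapes, and the random weights are hypothetical; the point is that without the licensed weights, the silicon computes nothing of value.

```python
import numpy as np

# Toy illustration: the "hardware" is a fixed loop of generic matrix math.
# Everything model-specific lives in the weights, which the model owner
# controls. Shapes and values here are hypothetical stand-ins.

def inference_engine(x, weights):
    """Generic silicon: runs matmuls and nonlinearities in a fixed loop."""
    h = x
    for W, b in weights:              # the weights are the "program"
        h = np.maximum(h @ W + b, 0)  # ReLU layer; nothing model-specific here
    return h

rng = np.random.default_rng(0)
licensed_weights = [(rng.standard_normal((16, 16)), np.zeros(16)) for _ in range(3)]

x = rng.standard_normal((1, 16))
print(inference_engine(x, licensed_weights))   # useful only with the weights in hand
```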
Deployment Path
- Centralized inference: Run in lab-controlled data centers (today’s norm).
- Enterprise inference: Licensed chips deployed inside corporate data centers.
- Edge inference: Personal Intelligence Engines (PIEs) embedded in user devices (a small routing sketch across the three tiers follows).
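To show how those three tiers might fit together operationally, here is a small hypothetical routing sketch. The tier names, fields, and latency thresholds are invented for illustration and do not describe any vendor’s actual deployment stack.

```python
from dataclasses import dataclass

# Hypothetical sketch of the three deployment tiers described above.
# Tier names, fields, and thresholds are illustrative assumptions.

@dataclass
class InferenceTier:
    name: str
    location: str
    licensed_silicon: bool   # does the model owner license the chip out?

TIERS = [
    InferenceTier("centralized", "lab-controlled data center", licensed_silicon=False),
    InferenceTier("enterprise", "corporate data center", licensed_silicon=True),
    InferenceTier("edge", "user device (PIE)", licensed_silicon=True),
]

def route(request_is_private: bool, latency_budget_ms: float) -> InferenceTier:
    """Pick a tier: private or latency-critical requests stay local."""
    if request_is_private or latency_budget_ms < 50:
        return TIERS[2]          # edge PIE on the device
    if latency_budget_ms < 200:
        return TIERS[1]          # licensed enterprise cluster
    return TIERS[0]              # centralized data center

print(route(request_is_private=False, latency_budget_ms=500).name)  # -> centralized
```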
Investor Implications
- Model owners (OpenAI, Anthropic, Google): Gain not only software monopoly but also a hardware monopoly, by controlling both the model and the silicon rights.
- Chip vendors (NVIDIA, AMD): Training remains important but is a smaller, less scalable market. Their dominance weakens as inference shifts to model-specific chips.
- Enterprises: Must license inference hardware, paying rent to model owners.
- Consumers: AI engines run locally but still locked to the model maker’s ecosystem.
Final Takeaway for Investors
Training may still generate headlines, but inference is where the money and the control lie. And inference belongs to the model makers, who own the weights and thus own the rights to the silicon. This is a stronger monopoly than Intel or ARM ever achieved—legal, architectural, and economic. For investors, the key is clear: back the companies that own the models, because they will own the hardware market that runs them.