OpenAI Could Become the World’s Biggest Chipmaker — And Dethrone NVIDIA

When the OpenAI chips are down

The real leverage in AI doesn’t come from GPUs. It comes from who controls the models — and therefore who dictates the silicon those models must run on. That leverage belongs to the foundation modellers: they alone decide the chips.

OpenAI owns the most widely used closed models in the world. If it builds its own inference chips, those models will only run on OpenAI’s hardware. That flips the supply chain upside down. NVIDIA doesn’t set the rules anymore — the model company does. Developers and enterprises don’t get a choice of hardware; they’re locked into whatever silicon OpenAI decides to serve its APIs from.

That’s the second-order game. Training runs grab headlines, but they’re episodic. Inference is continuous, and it scales with every query to ChatGPT, every copilot suggestion, every agent embedded in a business workflow. If OpenAI locks inference to its own silicon, it captures not just the software margins but the hardware margins too — a vertical integration tighter than CUDA ever created. CUDA was sticky, but models could still be ported. Closed models bound to closed chips? That’s total captivity.

This is Apple’s chip strategy multiplied by orders of magnitude. By 2030, inference demand could support a $50–100 billion model API market. If OpenAI captures the silicon layer as well, it could seize an additional $15–25 billion annually that currently flows to NVIDIA and the cloud providers. In effect, the world’s most valuable AI company could also become its largest silicon vendor.
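As a rough sanity check on those figures, here is a minimal back-of-envelope sketch in Python. The market range comes from the estimate above; the share of API revenue assumed to flow out as silicon and cloud infrastructure spend is an illustrative assumption, not a sourced number.

```python
# Back-of-envelope sketch of the silicon capture opportunity.
# The API market range is the estimate quoted above; the
# infrastructure-share parameter is an illustrative assumption.

api_market_2030 = (50e9, 100e9)  # projected model API revenue, USD

# Assumed fraction of API revenue that currently flows out to
# NVIDIA and the cloud providers as inference infrastructure spend.
infra_share = 0.25  # hypothetical ~25%

low, high = (revenue * infra_share / 1e9 for revenue in api_market_2030)
print(f"Implied annual silicon/cloud capture: ${low:.1f}B to ${high:.1f}B")
# At a 25% share this lands at roughly $12.5B to $25B per year,
# in the same ballpark as the $15–25 billion figure cited above.
```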

The first phase of the AI boom was about building models. The second phase is about defining the hardware those models run on. And because the model owners set the silicon requirements, companies like OpenAI have a clear path to unseat NVIDIA — not just as the leader in AI software, but as the dominant force in AI silicon.


AI’s Next Lock-In: Moving Inference to the Edge

The first phase of the silicon play is about model companies running their models on proprietary chips inside their own cloud clusters. That locks the cloud side of inference to their hardware.

The second phase is more ambitious: push inference out of the data center and into the edge.

Right now, every ChatGPT query or Claude response runs on centralized infrastructure. That creates two problems: latency, because every request travels across networks, and cost, because the model company is footing the compute bill. At global scale, those limits become critical.

Offloading inference to the edge solves both. If inference runs on enterprise servers or local devices — but only on silicon defined by the model owner — then three things happen:

  • Latency improves, because compute is local.
  • Costs shift, with enterprises and device vendors funding the hardware.
  • Lock-in deepens, since only certified chips can run the models (see the sketch after this list).
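
To make the certification point concrete, here is a minimal sketch of how a model runtime could refuse to load weights on uncertified hardware. Everything in it is hypothetical: the chip identifiers, the HardwareAttestation structure, and the can_load_model check are illustrative stand-ins, not any vendor’s actual API.

```python
# Hypothetical sketch of a runtime gate that only serves a closed model
# on certified silicon. All names are illustrative; no real vendor API
# is being described.

from dataclasses import dataclass

# Chip identifiers the model owner has certified for edge inference (assumed).
CERTIFIED_CHIP_IDS = {"inference-asic-v1", "inference-asic-v2"}


@dataclass
class HardwareAttestation:
    """What the local accelerator reports about itself (hypothetical)."""
    chip_id: str
    firmware_signature: str  # a real system would verify this cryptographically


def can_load_model(attestation: HardwareAttestation) -> bool:
    """Allow weight provisioning only if the chip is on the certified list."""
    # A production gate would also check the firmware signature against
    # the model owner's signing key; this sketch checks only the chip ID.
    return attestation.chip_id in CERTIFIED_CHIP_IDS


# Usage: an edge server reports its hardware before weights are provisioned.
edge_device = HardwareAttestation(chip_id="inference-asic-v1", firmware_signature="...")
if can_load_model(edge_device):
    print("Certified silicon detected: model weights may be provisioned.")
else:
    print("Uncertified hardware: inference refused.")
```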

This is the second-order business model. The model company still defines the stack, but now enterprises carry the capex. The provider keeps the margins while embedding its hardware into every environment where inference runs.

The bigger picture is that the cloud is only the start. The real opportunity is ensuring that wherever inference happens — in hyperscale farms, in corporate data centers, or at the edge — it runs only on silicon tied to the model owner.

That’s the next lock-in. And it could prove even bigger than the first.

