3 min read | For investors tracking NVDA, hyperscalers, and AI infrastructure

Emerging AI architectures — particularly Mamba and State Space Models (SSMs) — threaten to disrupt the current AI landscape.
Both hardware and software leaders face risk, though hardware companies are more exposed: custom silicon takes 3-5 years to design, while model companies can retrain and pivot in 12-18 months.
The Game-Changer: Mamba and State Space Models
The transformer problem: Today’s AI models (GPT, Claude, Gemini) use “attention” — comparing every word to every other word. Double the context length, quadruple the compute.
That’s why long-context models are so expensive and why hyperscalers are spending hundreds of billions on GPU clusters.
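To make the scaling concrete, here is a rough back-of-the-envelope sketch in Python. The operation counts and the 4,096-dimension model width are illustrative assumptions, not any vendor's actual figures:

```python
# Back-of-the-envelope scaling sketch (illustrative numbers only).
# Attention compares every token with every other token, so the score
# matrix alone costs roughly n^2 * d operations per layer; an SSM-style
# scan touches each token once, roughly n * d_state * d operations.

def attention_ops(n_tokens: int, d_model: int = 4096) -> int:
    """Approximate operations for one attention layer's score matrix."""
    return n_tokens * n_tokens * d_model

def ssm_ops(n_tokens: int, d_state: int = 16, d_model: int = 4096) -> int:
    """Approximate operations for one SSM scan over the same sequence."""
    return n_tokens * d_state * d_model

for n in (8_000, 16_000, 32_000):  # doubling the context each step
    print(f"{n:>6} tokens | attention ~{attention_ops(n):.1e} ops | ssm ~{ssm_ops(n):.1e} ops")

# Doubling the context roughly quadruples the attention term,
# while the SSM term merely doubles.
```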
The SSM solution: State Space Models process sequences like a running summary rather than looking at everything simultaneously.
Mamba — the leading SSM architecture, developed by researchers at Carnegie Mellon and Princeton — scales linearly with context length.
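A minimal sketch of that "running summary" idea, assuming a plain discretised linear SSM rather than Mamba's selective variant (matrix sizes and values are placeholders for illustration):

```python
import numpy as np

# Minimal "running summary" sketch: a plain discretised linear SSM.
# Mamba's selective mechanism is more sophisticated; this only shows why
# the cost of each new token is constant, however long the context is.

d_state, d_model = 16, 64                    # illustrative sizes only
A = 0.95 * np.eye(d_state)                   # placeholder state dynamics
B = 0.01 * np.random.randn(d_state, d_model)
C = 0.01 * np.random.randn(d_model, d_state)

def step(h, x):
    """Process one token: fold it into the fixed-size state, read an output."""
    h = A @ h + B @ x
    y = C @ h
    return h, y

h = np.zeros(d_state)
for x in np.random.randn(100_000, d_model):  # 100k tokens, constant memory
    h, y = step(h, x)
```

Each new token touches only a fixed-size state, whereas attention must revisit (or cache) everything that came before.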
The impact:
✓ 5x faster on long sequences
✓ Fraction of the memory footprint
✓ Dramatically lower compute per token
Who’s backing it: Mistral (Codestral Mamba), AI21 Labs (Jamba), Together AI. Hybrid transformer-Mamba models are already in production.
The catch: Pure Mamba hasn’t matched transformers for complex reasoning at frontier scale — yet. But hybrids are proliferating, and the trajectory is clear.
What This Means for GPU Demand
If SSMs gain traction, the compute economics of AI change fundamentally.
Transformer world:
- Compute scales quadratically with context
- High memory per token
- GPU hours: baseline
- Long context: expensive, constrained
SSM world:
- Compute scales linearly with context
- Low memory per token
- GPU hours: potentially 50-80% less
- Long context: cheap, abundant
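For intuition on the memory lines above, a rough sketch of why the gap widens with context length. The layer counts, head dimensions, and state sizes are assumptions for illustration, not any specific model's configuration:

```python
# Rough memory comparison in fp16 (2 bytes per value).
# A transformer caches keys and values for every past token (the KV cache);
# an SSM carries a fixed-size state per layer, regardless of context length.
# All parameter values below are assumed for illustration.

BYTES = 2                                  # fp16
layers, heads, head_dim = 32, 32, 128      # assumed GPT-style mid-size model
d_state, d_inner = 16, 8192                # assumed SSM state dimensions

def kv_cache_bytes(context_len: int) -> int:
    return 2 * layers * heads * head_dim * context_len * BYTES   # 2x: keys + values

def ssm_state_bytes() -> int:
    return layers * d_state * d_inner * BYTES                    # independent of context

for n in (8_000, 128_000):
    print(f"{n:>7} tokens | KV cache ~{kv_cache_bytes(n) / 1e9:.1f} GB"
          f" | SSM state ~{ssm_state_bytes() / 1e9:.3f} GB")
```

The transformer's cache grows with every token of context; the SSM state does not, which is what "low memory per token" is pointing at.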
The demand question: Efficiency gains could reduce absolute GPU demand — or unlock new use cases that consume the savings. History suggests the latter, but the transition creates uncertainty.
The NVIDIA risk: SSMs still run on CUDA — but if you need 50-80% fewer GPUs per workload, that’s a volume problem even if NVIDIA keeps 100% share.
The $600 billion hyperscaler capex trajectory assumes insatiable compute demand. SSMs challenge that assumption.
The custom silicon risk: TPUs, Trainium, and MTIA are optimised for transformer attention patterns. SSMs have different compute profiles.
Billions in chips could become sub-optimal — and reduced compute demand means less need for any custom silicon.
Who’s Exposed, Who’s Protected
NVIDIA — Moderate exposure
Keeps architectural flexibility, but volume is at risk if compute demand drops. Pricing power may erode.
Google, Amazon — High exposure
Custom silicon optimised for the wrong architecture and potentially overbuilt for reduced demand. Double hit.
Microsoft — Hedged
NVIDIA dependency means less stranded silicon, but Azure capex still at risk.
Neo-Clouds (CRWV, NBIS) — High exposure
Leveraged to GPU demand. If demand drops 50%, the debt doesn’t.
OpenAI, Anthropic — Lower exposure
Compute costs drop. They’ll adopt what works. Net beneficiaries.
The Bottom Line
SSMs and Mamba may not kill transformers outright — hybrids are more likely.
But they rewrite the compute economics and challenge the “insatiable demand” narrative underlying current valuations.
NVIDIA keeps architectural flexibility but faces volume risk. Hyperscalers face both wrong-architecture risk and overbuilding risk. Neo-clouds are leveraged to a demand curve that may flatten.
The model companies may be the cleanest winners — their compute bills go down while their capabilities go up.
When efficiency improves dramatically, the biggest spenders have the most to lose.