
Broadly Intelligent AI: Why Hardware Matters as Much as Algorithms
Artificial Intelligence has surged forward, but the next phase is less about clever algorithms and more about the hardware that makes true intelligence possible. To move from narrow, task-specific models to Broadly Intelligent AI, the industry must solve a scaling challenge that is both technical and economic.
How AI Inference Works Today
Every AI interaction follows the same loop, sketched in code after this list:
- Inputs → tokens — Speech, documents, and images are broken into tokens, the numerical fragments a model can process. Text tokenization is cheap CPU work, but encoding audio and images into tokens already consumes GPU compute.
- GPU inference — Tokens are run through the model on GPUs, where billions of parallel calculations predict the next token.
- Context window — Tokens are temporarily stored in a context window — the model’s short-term working memory.
- GPU output — The model’s prediction is converted back into text, responses, or structured outputs. CPUs format results, but GPUs provide the intelligence.
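A minimal sketch of this loop in Python, using the Hugging Face transformers library; the gpt2 model name and the generation settings are illustrative stand-ins, not a claim about any particular production stack:

```python
# Minimal sketch of the inference loop with Hugging Face transformers.
# "gpt2" is an illustrative stand-in; any causal language model works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # add .to("cuda") for GPU inference

# 1. Inputs -> tokens: text becomes numerical IDs the model can process.
input_ids = tokenizer("Broadly intelligent AI will require", return_tensors="pt").input_ids

# 2-3. Inference over the context window: the model predicts tokens one at a
# time, attending to everything currently held in its short-term working memory.
output_ids = model.generate(input_ids, max_new_tokens=20)

# 4. Output: detokenization back to text is cheap formatting work.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```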
The Context Window Problem
- Current limit: Models handle 4k–32k tokens — a few pages of text or a short conversation.
- Need for Broad AI: Broadly Intelligent AI requires millions of tokens — the ability to reason across books, sustained conversations, or months of data.
- The failure mode: Once the window fills, older tokens are dropped. The AI forgets, which prevents broad reasoning across long timelines (see the toy example after this list).
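A toy illustration of that forgetting, assuming a hypothetical eight-token window; real windows hold thousands of tokens, but the truncation mechanics are identical:

```python
# Toy illustration of context-window truncation (the window size is a made-up number).
from collections import deque

WINDOW = 8                      # hypothetical context limit, in tokens
context = deque(maxlen=WINDOW)  # a deque silently drops the oldest item once full

for token in "the quick brown fox jumps over the lazy dog again".split():
    context.append(token)

print(list(context))
# ['brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', 'again']
# 'the' and 'quick' have been evicted: the model can no longer reason about them.
```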
Scaling Is Quadratic
Expanding the context window isn't linear; the cost grows quadratically with length (a back-of-envelope sketch follows this list):
- Attention compares every token with every other → O(n²) scaling.
- 10× bigger context → 100× more compute.
- 1,000× bigger context → 1,000,000× more compute.
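The arithmetic behind those ratios, as a quick sketch; the 32k-token baseline is just an illustrative figure:

```python
# Back-of-envelope: attention cost grows with the square of context length.
def attention_cost(n_tokens: int) -> int:
    """Pairwise token comparisons in one attention pass: O(n^2)."""
    return n_tokens * n_tokens

base = attention_cost(32_000)               # a 32k-token window as the baseline
for factor in (10, 1_000):
    scaled = attention_cost(32_000 * factor)
    print(f"{factor:,}x more context -> {scaled // base:,}x more compute")
# 10x more context -> 100x more compute
# 1,000x more context -> 1,000,000x more compute
```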
This strains every part of the stack, as the memory estimate after this list makes concrete:
- Memory: from GBs today → TBs tomorrow.
- Bandwidth: from ~1 TB/s → multi-TB/s.
- Compute: requires thousands of times more parallelism.
- Interconnects: GPUs must share data at unprecedented speed.
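To put numbers on the memory line above, here is a rough KV-cache estimate; the layer count, head count, and FP16 precision are illustrative assumptions for a large transformer, not the specification of any particular model:

```python
# Rough KV-cache size for one sequence. All architecture numbers are
# illustrative assumptions for a 70B-class transformer, not a real model's specs.
def kv_cache_gb(seq_len: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Keys and values cached for every token in every layer, in FP16."""
    total = 2 * layers * kv_heads * head_dim * bytes_per_value * seq_len
    return total / 1e9

for n in (32_000, 1_000_000, 10_000_000):
    print(f"{n:>12,} tokens -> ~{kv_cache_gb(n):,.0f} GB of KV cache")
# Prints roughly 10 GB, 328 GB, and 3,277 GB: gigabytes today, terabytes tomorrow.
```

Under these assumptions the cache alone costs about 0.3 MB per token, before counting the model weights, which is why million-token windows push straight into terabyte territory.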
Possible Compromises
Researchers are testing fuzzy memory architectures, sketched in toy form below:
- Recent tokens stored with precision.
- Older tokens compressed or approximated.
- Benefit: reduced compute and memory cost.
- Risk: weaker accuracy, introducing human-like "misremembering."
- Verdict: more research is needed before this approach can underpin Broad AI.
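To picture the idea, here is a toy sketch in which recent token vectors stay exact while older ones are mean-pooled into coarse summaries; the split point and 4-to-1 pooling ratio are arbitrary assumptions, and published proposals use learned compressors rather than simple averaging:

```python
import numpy as np

# Toy "fuzzy memory": keep recent token vectors exact, average older ones
# into coarse summary vectors (a stand-in for real learned compression).
def compress_history(token_vecs: np.ndarray, keep_recent: int = 256,
                     pool: int = 4) -> np.ndarray:
    old, recent = token_vecs[:-keep_recent], token_vecs[-keep_recent:]
    if len(old) == 0:
        return recent
    old = old[len(old) % pool:]                   # trim so pooling divides evenly
    summaries = old.reshape(-1, pool, old.shape[-1]).mean(axis=1)
    return np.concatenate([summaries, recent])    # lossy past + exact present

history = np.random.randn(10_000, 64)             # 10k cached token vectors
print(compress_history(history).shape)            # (2692, 64): ~3.7x smaller
```

The saving is real, but so is the loss: the averaged past can only be recalled approximately, which is exactly the "misremembering" risk noted above.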
Market Landscape
- NVIDIA: Iterative GPU scaling, faster interconnects, denser memory.
- Cerebras: Wafer-scale processors with massive on-chip compute and memory.
- Startups: Groq, Tenstorrent, Graphcore — each targeting efficiency and latency.
- Memory innovators: HBM, stacked DRAM, photonics — essential to breaking bandwidth ceilings.
Investor Takeaways
- Context scaling is the bottleneck: it is the single gating factor on the path to Broad AI.
- Quadratic cost curve: compute and memory grow faster than most forecasts assume.
- Winners bend the curve: whoever delivers massive context without quadratic cost will dominate.
- Opportunities extend beyond models: hardware, semiconductors, and memory will define the landscape as much as software.
Looking Forward
The road to Broadly Intelligent AI demands:
- Larger, affordable context windows.
- Smarter memory architectures balancing accuracy and efficiency.
- Hardware leaps — wafer-scale, photonics, stacked memory — not just incremental GPU upgrades.
The open question: will NVIDIA’s scaling, Cerebras’ radical design, or a yet-unseen breakthrough define the future? What is certain is that Broadly Intelligent AI will not emerge from software alone. Hardware evolution is equally critical.