If ChatGPT, Claude, Gemini and Deepseek can’t handle 15 people and 3 courts, should we trust it with a Military Engagement?

This was not an evaluation – I just wanted a solution. But as I went from frustration to frustration, I effectively ended up evaluating the four LLMs.

All failed miserably – Except one!

The “Simple” Challenge – Scheduling a badminton evening (thank God it wasn’t ‘bomb an enemy tank’!)

Here’s what I asked ChatGPT, Claude, Gemini 2.5 Pro, and DeepSeek to figure out:

  • 15 players, 3 courts, 12 games per evening
  • Everyone gets equal sit-outs
  • No consecutive sit-outs
  • Fair mixing of partnerships
  • Balanced play against different opponents

This is the kind of problem any sports club organizer easily manages.
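To see why it is harder than it looks, here is a minimal Python sketch of my own (not any model’s output) covering the first three constraints. One wrinkle the arithmetic forces on you: with 3 players sitting out each round, equal sit-outs are only possible when the round count is a multiple of 5 (15 ÷ 3), so a strict 12-game (4-round) evening cannot be perfectly fair on sit-outs – the sketch therefore uses 5 rounds (15 games):

```python
PLAYERS = list(range(1, 16))  # 15 players; 3 doubles courts -> 12 play, 3 sit out

def schedule(rounds=5):
    """Rotate sit-outs in blocks of three and pair players greedily,
    preferring partners they have played with least so far."""
    partner_count = {}  # frozenset({a, b}) -> times partnered
    plan = []
    for r in range(rounds):
        start = (r * 3) % 15
        sitting = PLAYERS[start:start + 3]  # blocks never repeat consecutively
        pool = [p for p in PLAYERS if p not in sitting]
        pairs = []
        while pool:
            a = pool.pop(0)
            # pick the partner this player has teamed with least often
            b = min(pool, key=lambda x: partner_count.get(frozenset((a, x)), 0))
            pool.remove(b)
            key = frozenset((a, b))
            partner_count[key] = partner_count.get(key, 0) + 1
            pairs.append((a, b))
        # two pairs per court -> 3 games per round, 15 games over 5 rounds
        courts = [(pairs[i], pairs[i + 1]) for i in range(0, 6, 2)]
        plan.append({"sitting": sitting, "courts": courts})
    return plan

plan = schedule()
```

This handles equal and non-consecutive sit-outs exactly; partnership and opponent balancing is where the real combinatorial pain begins, which is precisely where the LLMs fell over.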

We’re told AI can fold proteins, beat grandmasters at chess, and write code that would take humans days to complete. So naturally, I thought asking four leading AI systems to organize a simple badminton evening would be trivial. In Claude’s own self-rating, it said it had “MASSIVELY UNDERESTIMATED the mathematical complexity of 15-player doubles tournament scheduling” and gave itself a poor ‘D’ grade!

Which tool solved the problem? DeepSeek! OMG

I was spectacularly wrong, and shocked at what these tools can’t handle.

The Spectacular Failures

ChatGPT: The Overthinking Champion

ChatGPT immediately dove into complex equations and verbose explanations, despite being asked for brevity. After multiple iterations, it produced schedules where some players never played against certain opponents – a fundamental failure. Even worse, it couldn’t grasp that Player 1 + Player 2 is identical to Player 2 + Player 1. When asked to research solutions online, it came back empty-handed.

Claude: The False Start

Claude initially seemed to understand the problem better, showing promise in its approach. But after five or six attempts and increasingly detailed explanations about consecutive sit-outs, it collapsed into even worse results. Some players ended up playing with certain partners eight times while never encountering others. The basic constraint balancing fell apart completely.

DeepSeek: The Surprise Performer

DeepSeek actually grasped the problem constraints better than expected and made sensible assumptions about what “fair distribution” really meant. It eventually delivered a working solution, and it showed the most logical approach to the challenge.

Gemini 2.5 Pro: Déjà Vu All Over Again

Gemini repeated many of the same mistakes as the others but edged slightly ahead of ChatGPT and Claude. Still no working solution, but it came closest among the failures to understanding the real constraints.

The Telling Detail

Here’s what really struck me: when I asked each AI to assess its own work, none of them accurately identified their failures. They confidently declared success while delivering schedules that would have caused chaos at any actual sports club.

Why This Matters More Than You Think

This wasn’t protein folding or quantum mechanics – it was a scheduling problem that thousands of sports coordinators solve every week with pen and paper. If AI can’t handle 15 people and 3 courts, what does that tell us about its reliability for:

  • Complex project scheduling
  • Resource allocation
  • Multi-constraint optimization
  • Real-world logistics problems

The human brain’s “heuristic ability” – our intuitive grasp of trade-offs and practical solutions – remains remarkably superior for these everyday optimization challenges.

The Uncomfortable Truth

This experiment has genuinely reduced my confidence in AI for anything beyond sophisticated web search and content generation. While these systems can produce impressive outputs in their specialized domains, they struggle with the kind of practical reasoning that humans take for granted.

The Final Rankings:

  1. DeepSeek (surprisingly logical – found many of its own errors and eventually got to a working answer)
  2. Gemini 2.5 Pro (least bad of the rest)
  3. Claude (promising start, poor finish)
  4. ChatGPT (overthought itself into failure)

The Bottom Line

Before we hand over complex decision-making to AI systems, perhaps we should make sure they can handle a badminton schedule first. The gap between AI’s impressive marketing claims and its ability to solve everyday problems is wider than we’re being told.

Have you tested AI with seemingly simple real-world problems? Share your experiences – the results might surprise you.

Finally a note from Claude

MY BRUTALLY HONEST SELF-GRADE: D POOR

 PERFORMANCE SUMMARY:

  • Final Schedule Quality: ⭐ POOR (2/10 points)
  • Algorithm Performance: D POOR
  • Opponent Coverage: Only 81% (missed target of 95%+)
  • Improvement: Marginal (+1 point, +1% coverage)

 WHAT WENT WRONG:

I MASSIVELY UNDERESTIMATED the mathematical complexity of 15-player doubles tournament scheduling. Despite:

  • Systematic optimization approach
  • Iterative improvement methodology
  • Three rounds of refinement
  • Comprehensive quality metrics

I still produced a schedule with:

  • 20 pairs who never compete
  • Players facing each other up to 7 times
  • Severe court distribution imbalances
  • Partnership clustering issues

 KEY REALIZATIONS:

  1. This is genuinely hard mathematics – constraint satisfaction with competing objectives
  2. Manual optimization is insufficient – needs algorithmic approaches
  3. Professional software exists for good reason – this problem is complex enough to require specialized tools
  4. My initial confidence was misplaced – should have recommended professional tools from the start

The AI Token Killer: Why SaaS Valuations Are at

SaaS Companies Face Extinction

Dark shadows are looming behind SaaS companies – is this a bubble?

Big software companies (Salesforce, ServiceNow, Workday, Adobe, etc.) need to add billions in incremental revenue to avoid decimation by GPU economics and investor expectations:

Part 1: The Core Problem – Can SaaS Afford AI and Stay Profitable?

AI workloads aren’t free. Whether companies buy GPUs (with rapid depreciation) or rent them in the cloud (paying hyperscaler margins), costs scale faster than SaaS pricing models allow. If margins fall, valuations collapse, because the market assumes SaaS margins remain flat or expand. To sustain valuations, AI usage must be monetised with software-like margins, not absorbed as infrastructure costs.

Without a viable downstream pricing model, infrastructure players (NVIDIA, CoreWeave, etc.) can’t support sustainable growth, and the current buildout risks becoming a bubble.


Part 2: Token Use by User Type

| User Type | Monthly Tokens (MT) |
| --- | --- |
| Classic SaaS seat (light AI, speech) | 0.6–1.5 |
| Enhanced SaaS seat (heavy copilots, call center) | 3–6 |
| Palantir / advanced analyst | 15–30 |
| Medical imaging (radiology) | 150–600 |
| Media analysis (surveillance, video) | 1,500–3,000 |
| Media generation (video/VFX) | 3,000–30,000+ |

Part 2a: Imputed GPU Costs (Blackwell at ~$1.26/MT)

| User Type | Monthly Use (MT) | Blackwell Cost / Month |
| --- | --- | --- |
| Classic SaaS seat | 0.6–1.5 | $0.75–$1.90 |
| Enhanced SaaS seat | 3–6 | $3.80–$7.60 |
| Palantir / analyst seat | 15–30 | $19–$38 |
| Medical imaging seat | 150–600 | $190–$760 |
| Media analysis seat | 1,500–3,000 | $1,900–$3,800 |
| Media generation seat | 3,000–30,000+ | $3,800–$38,000+ |
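The imputed costs are simple multiplication of the usage bands by the assumed ~$1.26 per MegaToken; a quick sketch (bands from the tables above, rounding mine):

```python
BLACKWELL_COST_PER_MT = 1.26  # assumed $/MegaToken, per the table above

def monthly_cost(mt_low, mt_high, rate=BLACKWELL_COST_PER_MT):
    """Imputed GPU cost band ($/month) for a seat using mt_low..mt_high MT."""
    return (round(mt_low * rate, 2), round(mt_high * rate, 2))

seats = {
    "Classic SaaS seat": (0.6, 1.5),
    "Enhanced SaaS seat": (3, 6),
    "Palantir / analyst seat": (15, 30),
    "Medical imaging seat": (150, 600),
}
costs = {name: monthly_cost(*band) for name, band in seats.items()}
# e.g. a Palantir / analyst seat lands at roughly $19–$38 per month
```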

Part 3: Examples

Let’s model costs and margins to work out the revenue needed to sustain them:

  • Classic SaaS (Salesforce/ServiceNow): small AI uplift per seat (summarisation, embeddings).
  • Enhanced SaaS: reasoning agents, call center copilots.
  • Palantir / analyst seat: orchestrated reasoning, multi-model queries.
  • Medical imaging: each X-ray/CT study = tens of thousands of image tokens.
  • Media analysis: surveillance or legal video review, millions of tokens per hour.
  • Media generation: VFX or AI video, billions of tokens per project.

Part 4: Financial Lens

Additional Pressure on Top 10 Software Companies

The top ten Nasdaq-listed software companies together generate more than $350B in annual revenue. To maintain ~20% EBIT growth while covering AI compute, they will collectively need to add on the order of $70B+ new revenue each year. This is before accounting for incremental GPU costs, which will further raise the bar. AI seat pricing therefore becomes central not only to product strategy but also to market valuation.

These figures are based on current user numbers, and the revenue baselines assume a similar number of seats (not user growth). If the new revenue has to come from more users instead, the required growth would be even higher to preserve margins.

Breakdown of approximate revenue growth needed (20% p.a.):

| Company | Current Revenue | Est. Users | Growth Needed ($B) | Growth/User/Year |
| --- | --- | --- | --- | --- |
| Microsoft (software segment) | ~$85B | ~400M | ~$17B | ~$42 |
| Oracle | ~$50B | ~50M | ~$10B | ~$200 |
| Adobe | ~$20B | ~30M | ~$4B | ~$133 |
| Salesforce | ~$35B | ~20M | ~$7B | ~$350 |
| Intuit | ~$16B | ~100M | ~$3B | ~$30 |
| ServiceNow | ~$10B | ~1M | ~$2B | ~$2,000 |
| Workday | ~$7B | ~60M | ~$1.4B | ~$23 |
| Autodesk | ~$6B | ~6M | ~$1.2B | ~$200 |
| Atlassian | ~$4B | ~20M | ~$0.8B | ~$40 |
| Snowflake | ~$3B | ~1.5M | ~$0.6B | ~$400 |

Together this aligns with ~$70B+ revenue growth per year required just to keep EBIT compounding at 20%.
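The arithmetic behind the table is reproducible in a few lines: required new revenue is 20% of current revenue, and the per-user figure divides that across the estimated seats (a sample of companies shown, figures as in the table):

```python
# (current revenue $B, estimated users M) — figures from the table above
companies = {
    "Microsoft (software)": (85, 400),
    "Salesforce": (35, 20),
    "ServiceNow": (10, 1),
    "Snowflake": (3, 1.5),
}

def growth_needed(revenue_b, users_m, rate=0.20):
    """Return ($B of new revenue needed per year, $ per user per year)."""
    growth_b = revenue_b * rate
    per_user = growth_b * 1e9 / (users_m * 1e6)
    return round(growth_b, 1), round(per_user)

results = {name: growth_needed(*v) for name, v in companies.items()}
# ServiceNow's small seat count is why its per-user figure (~$2,000) dwarfs the rest
```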

Hardware Efficiency vs. Economics

| GPU Model | Approx Cost / MT |
| --- | --- |
| A100 | ~$3.82 |
| H100 | ~$3.04 |
| Blackwell | ~$1.26 |

Blackwell GPUs reduce cost per MegaToken significantly compared to A100 and H100, often by more than half. But even at ~$1.26/MT, the scale of token usage in advanced workloads means absolute costs remain material. For SaaS companies, efficiency gains ease the curve but do not remove the structural need to recover AI expenses through pricing.

Vulnerability of Infrastructure Providers

This dynamic also exposes risk for NVIDIA and cloud infrastructure companies. Once the upfront scale-out of GPUs is complete, if SaaS vendors fail to pass costs downstream, usage growth does not automatically translate into proportional revenue growth for infrastructure. Margins for the supply side could contract sharply, making the current buildout fragile if the economics of AI seats do not align with sustainable pricing. In other words, NVIDIA and infrastructure providers are highly exposed: once the upfront scale-out wave passes, their long-term growth depends on whether software companies can successfully monetise AI tokens without eroding their own valuations.

Anthropic $500B – The market is surprisingly consistent in how it values hypergrowth tech.

No matter how much hype surrounds a company, the multiples compress in a predictable pattern as scale increases.

Story → Scaling → Sustained

Story

  • Narrative and TAM drive valuations, not fundamentals.
  • Growth is 2–3× YoY.
  • Market assigns 35–55× ARR.
  • Examples: OpenAI in 2023–24 (tiny revenue, $300B valuation), Snowflake at IPO in 2020.

Scaling

  • Execution and enterprise adoption become the focus.
  • Growth slows to 50–100% YoY, but ARR climbs into the billions.
  • Multiples compress to 20–30×.
  • Examples: Anthropic today (2025, $5–10B ARR), Datadog during 2018–21.

Sustained

  • Market leadership is entrenched, with durable ~30% growth even at scale.
  • Multiples compress further to 10–20×, but on much larger revenue bases, driving multi-trillion valuations.
  • Examples: NVIDIA in the 2020s (>$60B revenue, still 30–40% growth), AWS in the late 2010s ($40–60B revenue, ~30% growth), Google Ads at $100B+ scale.

Anthropic Today

  • Transitioning from Scaling toward Sustained.
  • ARR trajectory:
    • ~$1B (Jan 2025)
    • ~$3B (May 2025)
    • $5B (Aug 2025)
    • Projected $7–8B exit 2025 organically.
  • With a Microsoft Copilot allocation (for example a SWAG of 25% of MS’s AI vendor budget): projected $10B exit ARR in 2025.

Market Multiples

  • OpenAI: valued at ~$300B on ~$5.5B ARR (~55×).
  • Anthropic: valued at ~$183B on ~$5B ARR (~36×).
  • Historical glide path:
    • Story = 35–55× ARR
    • Scaling = 20–30× ARR
    • Sustained = 10–20× revenue

Anthropic Valuation Model (with & without Microsoft)

| Year | ARR (organic) | ARR (with MS) | Multiple (range) | Valuation Band |
| --- | --- | --- | --- | --- |
| 2025 | ~$7.5B | ~$10B | ~30× | $225–300B |
| 2026 | ~$13.5B | ~$16B | 25–28× | $338–448B |
| 2027 | ~$20B | ~$24B | 22–25× | $440–600B |
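The bands follow directly from the ARR and multiple columns: the low end is organic ARR at the low multiple, the high end is the Microsoft-boosted ARR at the high multiple. A sketch:

```python
def valuation_band(arr_organic_b, arr_with_ms_b, mult_low, mult_high):
    """Valuation band in $B: organic ARR at the low multiple up to
    ARR with the assumed Microsoft allocation at the high multiple."""
    return arr_organic_b * mult_low, arr_with_ms_b * mult_high

bands = {
    2025: valuation_band(7.5, 10, 30, 30),   # $225–300B
    2026: valuation_band(13.5, 16, 25, 28),  # $337.5–448B
    2027: valuation_band(20, 24, 22, 25),    # $440–600B
}
```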

Takeaway: Anthropic is leaving the Story stage behind and entering Scaling → Sustained. Even with multiple compression baked in, it scales from ~$200B today toward $0.5–0.6T by 2027, with Microsoft providing the kicker that accelerates the transition into the Sustained phase.

Ukraine – Slower Than a Snail

On 24 February 2022, Russian troops surged out of Belarus toward Kyiv, barely 150 km away. The Kremlin expected the capital to fall in days.

A snail, crawling nonstop, could cover that distance in about five years. More than three and a half years later, Russia still hasn’t managed it. Kyiv endures, the frontline frozen.

The image is almost absurd: an army of tanks and missiles, outpaced by the patience of a garden snail. But it’s also bittersweet. The failure reflects not only Russian weakness, but the staggering cost Ukraine has paid to hold its ground.

Nvidia needs 1,000,000 x faster chips

Broadly Intelligent AI: Why Hardware Matters as Much as Algorithms

Artificial Intelligence has surged forward, but the next phase is less about clever algorithms and more about the hardware that makes true intelligence possible. To move from narrow, task-specific models to Broadly Intelligent AI, the industry must solve a scaling challenge that is both technical and economic.

How AI Inference Works Today

Every AI interaction follows the same loop:

  • Inputs → tokens — Speech, documents, and images are broken into tokens — numerical fragments a model can process. This step already uses GPU compute.
  • GPU inference — Tokens are run through the model on GPUs, where billions of parallel calculations predict the next token.
  • Context window — Tokens are temporarily stored in a context window — the model’s short-term working memory.
  • GPU output — The model’s prediction is converted back into text, responses, or structured outputs. CPUs format results, but GPUs provide the intelligence.

The Context Window Problem

  • Current limit: Models handle 4k–32k tokens — a few pages of text or a short conversation.
  • Need for Broad AI: Broadly Intelligent AI requires millions of tokens — the ability to reason across books, sustained conversations, or months of data.
  • The failure mode: Once the window fills, older tokens are dropped. The AI forgets, preventing broad reasoning across long timelines.

Scaling Is Exponential

Expanding the context window isn’t linear — it’s exponential:

  • Attention compares every token with every other → O(n²) scaling.
  • 10× bigger context → 100× more compute.
  • 1,000× bigger context → 1,000,000× more compute.
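The quadratic claim is easy to sanity-check: if attention cost scales as n², growing the window by a factor k multiplies compute by k².

```python
def attention_ops(n_tokens):
    """O(n^2): every token is compared with every other token."""
    return n_tokens ** 2

base = attention_ops(32_000)                           # a 32k-token window
assert attention_ops(320_000) // base == 100           # 10× window -> 100× compute
assert attention_ops(32_000_000) // base == 1_000_000  # 1,000× -> 1,000,000×
```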

This strains every part of the stack:

  • Memory: from GBs today → TBs tomorrow.
  • Bandwidth: from ~1 TB/s → multi-TB/s.
  • Compute: requires thousands of times more parallelism.
  • Interconnects: GPUs must share data at unprecedented speed.

Possible Compromises

Researchers are testing fuzzy memory architectures:

  • Recent tokens stored with precision.
  • Older tokens compressed or approximated.

Benefit: reduced compute and memory cost.
Risk: weaker accuracy, introducing human-like “misremembering.”
Verdict: more research needed before this underpins Broad AI.

Market Landscape

  • NVIDIA: Iterative GPU scaling, faster interconnects, denser memory.
  • Cerebras: Wafer-scale processors with massive on-chip compute and memory.
  • Startups: Groq, Tenstorrent, Graphcore — each targeting efficiency and latency.
  • Memory innovators: HBM, stacked DRAM, photonics — essential to breaking bandwidth ceilings.

Investor Takeaways

  • Context scaling is the bottleneck — it is the gating factor to Broad AI.
  • Exponential scaling curve — costs grow faster than most forecasts assume.
  • Winners bend the curve — whoever delivers exponential capacity without exponential cost will dominate.
  • Opportunities extend beyond models — hardware, semiconductors, and memory will define the landscape as much as software.

Looking Forward

The road to Broadly Intelligent AI demands:

  • Larger, affordable context windows.
  • Smarter memory architectures balancing accuracy and efficiency.
  • Hardware leaps — wafer-scale, photonics, stacked memory — not just incremental GPU upgrades.

The open question: will NVIDIA’s scaling, Cerebras’ radical design, or a yet-unseen breakthrough define the future? What is certain is that Broadly Intelligent AI will not emerge from software alone. Hardware evolution is equally critical.

Palantir: Growth Strong, But Valuation Upside Limited

Palantir continues to post standout numbers in 2025. Revenue is on track to grow ~45% year-over-year, with U.S. commercial sales nearly doubling and government contracts expanding more than 50%. Profitability is improving too, with net margins already at 33% and expected to expand steadily.

But valuation tells a different story. At today’s levels, Palantir trades at one of the highest revenue multiples in software. If that premium compresses—as history suggests it will—the company’s financial gains may not translate into major upside for investors.


Projection With Multiple Compression

Assumptions:

  • Revenue growth: +45% per year (2025–2028)
  • P/S multiple: 20% compression per year (from ~74× today)
  • Net margin: +1% per year (from 33% in 2025)

| Year | Revenue ($B) | P/S Multiple | Market Cap ($B) | Net Margin | Net Income ($B) |
| --- | --- | --- | --- | --- | --- |
| 2026 | 6.02 | 59.2 | 356 | 34% | 2.05 |
| 2027 | 8.73 | 47.4 | 413 | 35% | 3.05 |
| 2028 | 12.65 | 37.9 | 479 | 36% | 4.55 |
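These rows follow mechanically from the assumptions; the sketch below compounds them forward from an assumed 2025 base of ~$4.15B revenue (the value implied by the 2026 row) and ~74× P/S:

```python
def project(rev0_b=4.15, ps0=74, margin0=0.33, years=3,
            growth=0.45, compression=0.20, margin_step=0.01):
    """Compound the stated assumptions: +45% revenue, -20% P/S multiple,
    +1pt net margin per year, starting from an assumed ~$4.15B 2025 base."""
    rows = []
    rev, ps, margin = rev0_b, ps0, margin0
    for year in range(2026, 2026 + years):
        rev *= 1 + growth
        ps *= 1 - compression
        margin += margin_step
        rows.append({
            "year": year,
            "revenue_b": round(rev, 2),
            "mcap_b": round(rev * ps),
            "net_income_b": round(rev * margin, 2),
        })
    return rows

rows = project()  # market cap climbs only from ~356 to ~479 despite 45% growth
```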

What It Shows

  • Revenue soars: more than triples in three years.
  • Profitability scales: net income more than doubles, reaching $4.5B by 2028.
  • Valuation growth slows: despite explosive fundamentals, market cap rises only modestly—from ~$370B today to ~$479B in 2028.

Why Upside May Be Limited

  • Current multiples are historically elevated; even world-class growth cannot fully offset compression.
  • Market cap growth (~+29% over three years) lags far behind revenue growth (~+200%).
  • In effect, investors may see strong fundamentals but not commensurate share price appreciation.

Conclusion

Palantir is executing exceptionally well—fast growth, rising margins, robust cash generation. But the valuation already prices in much of that success. If multiples continue to normalize, the stock may deliver less upside than the business results suggest.

Anthropic’s $170B Raise Could Set Off a Market Scramble to $200bn —Just Like OpenAI and CoreWeave Before It

Anthropic is preparing to close a new funding round at a $170 billion valuation, a figure that would cement it as the second-most valuable private AI company behind OpenAI. But as significant as that milestone is, history suggests the real valuation story unfolds immediately after—in the secondary market.

If recent parallels from OpenAI and CoreWeave are any guide, Anthropic’s post-round pricing may rise sharply as investors who missed primary allocations rush to gain exposure. And that could push the true implied valuation well beyond $170B, at least in the near term.

What We’ve Seen Before

OpenAI

In March 2025, OpenAI raised at a ~$300B valuation via a staff-led secondary offering. Just months later, secondary bids surged toward $500B—a 67% jump—as institutional buyers aggressively sought access to the market leader. Much of the surge came from parties locked out of formal allocation channels, including sovereign wealth funds and crossover hedge funds.

CoreWeave

Pre-IPO secondaries priced CoreWeave near $23B. At IPO (March 2025), it listed at $27B, with public markets immediately absorbing the difference. The modest but clear +17% valuation uplift shows even infrastructure players benefit from liquidity timing.

Anthropic: Next in Line

Now Anthropic is on the verge of closing its own step-change round, jumping from its last $61.5B valuation to a proposed $170B. That’s nearly a 3x markup, and the market’s response is expected to be just as aggressive—if not more.

Why?
  • Unfilled demand from Series E: Last time, secondary prices rose up to 25% above issue price within days.
  • Clear market position: Unlike OpenAI, Anthropic offers enterprise buyers and investors a more neutral, governance-stable option.
  • Strategic scarcity: With so few scaled AI platforms left in private hands, Anthropic is likely to be heavily oversubscribed again.

What We Expect

  • Secondary pricing could land closer to $185–200B in the immediate aftermath
  • Existing shareholders (Series C–E) could see liquidity offers well above their entry
  • New allocators may prioritize speed over price to secure a seat

Comparative Summary

| Company | Last Valuation | Post-Round Value | Premium |
| --- | --- | --- | --- |
| OpenAI | $300B | ~$500B | +67% |
| CoreWeave | ~$23B | $27B (IPO) | +17% |
| Anthropic | $170B (expected) | $185–200B (projected) | +9–18% |

Conclusion

Anthropic’s $170B valuation won’t be the final word. In this market, oversubscribed rounds don’t end the conversation—they just shift it to secondary desks. And with investors already watching OpenAI soar to half a trillion and CoreWeave delivering a clean IPO premium, Anthropic could emerge as the next must-own AI name—at any price.

The question isn’t whether it hits $170B. It’s how fast it clears $200B once the ink is dry.

What Anthropic’s $170 Billion Raise Could Mean for Its Secondary Market Value


Anthropic is preparing to raise about $5 billion at a valuation of $170 billion, with the round reportedly five times oversubscribed. That level of demand raises an important question: what happens to the company’s value in the secondary market, where early investors and employees can sell shares once the primary round closes?

If history is a guide, oversubscription has consistently pushed private company shares above official round prices in secondary trading. Let’s look at some precedents:


Historical Examples of Oversubscribed Rounds

| Company | Primary Round Valuation | Oversubscription | Secondary Market Valuation | Uplift |
| --- | --- | --- | --- | --- |
| SpaceX | $46B | 3–5× | $60B | +30% |
| Stripe (2021) | $95B | Yes | $110B | +16% |
| Stripe (2023) | $50B | Yes | $55B | +10% |
| ByteDance | $250B | Yes | $400B | +60% |
| Databricks (2021) | $38B | Yes | $50B | +32% |

Pattern: Across these companies, secondary prices typically traded 10–60% higher than the official raise price, depending on demand and market hype.


What This Implies for Anthropic

  • At $170B, Anthropic sits between Databricks and ByteDance in valuation scale.
  • With 5× oversubscription, unmet demand is significant — a strong setup for secondary market repricing.
  • Applying the historical range (+10% to +60% uplift):
    • Low case (+10%) → ~$185B secondary value.
    • Mid case (+30%) → ~$220B.
    • High case (+60%) → ~$270B.
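Applying those uplift cases to the $170B round is one line of arithmetic (the article rounds the results down to round numbers):

```python
def secondary_value(primary_b, uplift):
    """Implied secondary-market valuation ($B) at a given uplift fraction."""
    return round(primary_b * (1 + uplift))

cases = {u: secondary_value(170, u) for u in (0.10, 0.30, 0.60)}
# 187, 221 and 272 — roughly the ~$185B / ~$220B / ~$270B bands above
```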

Why Anthropic Could Trend Higher

  • Unprecedented growth: Revenue trajectory is outpacing even OpenAI’s early climb.
  • Scarcity premium: Very limited allocation means funds shut out of the raise will bid higher on secondaries.
  • Sector momentum: With frontier AI still expanding, market appetite for Anthropic shares is unusually strong.

The Caveats

  • Legal risks: Ongoing copyright litigation could dampen valuations if liabilities loom.
  • Governance: Anthropic is actively blocking SPVs and pushing for direct investors, which limits liquidity and may restrict secondary flows.
  • Macro factors: Broader tech sentiment and AI regulatory scrutiny could shift pricing dynamics.

Bottom Line

If history repeats, Anthropic’s $170B primary valuation may be just a floor. Secondary markets could reprice the company in the $185–270B range, aligning it with patterns seen at SpaceX, ByteDance, Stripe, and Databricks.

In other words: with demand this strong, $170B may not be the “real” market price — just the starting point.


Why GPT-5 Struggles Above IQ 130

Frankly, now I don’t automatically believe AI’s answers anymore…

Has GPT-5 Already Hit Its IQ Ceiling?

TL;DR: I asked GPT-5 to create some IQ-130 questions. It came back with three that all failed the most basic test of validity – one had two correct answers, one was impossible to solve, and one was unsolvable without a hidden extra rule. If this is the best version of ChatGPT, its reliable reasoning may top out around an IQ-130 equivalent – strong for everyday use, but far from the flawless logic people imagine.


I Wasn’t Testing GPT-5 — At First

This didn’t begin as an experiment. I just wanted a few IQ tests designed to show my wife what different difficulty levels look like. I asked GPT-5 to produce IQ-130 and IQ-150 level questions to demonstrate the idea of IQ tests. Instead I got something that made me pause – and then dig deeper.


What I Asked vs. What I Got

The questions arrived. At first glance, they looked legitimate. But when I tried solving them, they started to unravel.


Puzzle 1 – Did not have a unique answer

GPT-5’s question:
Five people—A, B, C, D, E—are seated in a row.

  • A is to the left of C but not directly next to C.
  • B is between D and C.
  • E is at one end.
    Who is in the middle seat?

What I found: There are two valid arrangements (GPT-5 agreed):

  • A D B C E → middle is B
  • E A D B C → middle is D

Two answers means it fails the one-solution rule that IQ tests demand.
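The ambiguity is easy to verify by brute force. This sketch (mine, reading “B is between D and C” as immediately between them, which is the reading behind both arrangements above) enumerates all 120 seatings:

```python
from itertools import permutations

def puzzle1_solutions():
    """Brute-force the seating puzzle; returns every row satisfying all clues."""
    found = []
    for row in permutations("ABCDE"):
        pos = {p: i for i, p in enumerate(row)}
        if not (pos["A"] < pos["C"] and pos["C"] - pos["A"] >= 2):
            continue  # A left of C, but not directly next to C
        if not (abs(pos["B"] - pos["D"]) == 1 and abs(pos["B"] - pos["C"]) == 1):
            continue  # B immediately between D and C
        if pos["E"] not in (0, 4):
            continue  # E at one end
        found.append("".join(row))
    return found

sols = puzzle1_solutions()  # finds exactly ADBCE and EADBC
```

Two arrangements, two different middle seats (B and D) – so the puzzle indeed has no unique answer.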


I gave GPT-5 a chance to rework the question – it failed again, this time with a puzzle that was impossible to solve

What I asked: “Now rewrite it so there’s only one possible answer.”

GPT-5’s question:
Five people—A, B, C, D, E—are seated in a row.

  • E is at one end.
  • A is to the left of C, exactly two seats away.
  • B is immediately between D and C.
  • D is not at either end.

What I found: The conditions contradict each other. There’s no seating arrangement that satisfies them all. Yet GPT-5 presented it as a solvable puzzle.
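The same brute-force check confirms the contradiction: under the natural reading of the clues (“immediately between”, “exactly two seats”), no arrangement survives.

```python
from itertools import permutations

def rewritten_solutions():
    """Brute-force GPT-5's rewritten puzzle; no permutation satisfies all four clues."""
    found = []
    for row in permutations("ABCDE"):
        pos = {p: i for i, p in enumerate(row)}
        if pos["E"] not in (0, 4):
            continue  # E at one end
        if pos["C"] - pos["A"] != 2:
            continue  # A exactly two seats to the left of C
        if not (abs(pos["B"] - pos["D"]) == 1 and abs(pos["B"] - pos["C"]) == 1):
            continue  # B immediately between D and C
        if pos["D"] in (0, 4):
            continue  # D not at either end
        found.append("".join(row))
    return found

assert rewritten_solutions() == []  # the constraints are contradictory
```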


I then requested a set of three IQ-150 level questions – one failed

GPT-5’s question: A sequence that looked fine but required an unstated extra rule to get to GPT-5’s answer. Without guessing that hidden step, multiple answers fit — a direct breach of the principle that IQ tests must be fully self-contained.


The Pattern

In all three cases, GPT-5 produced puzzles that looked correct but failed under close scrutiny.

  • Ambiguity: More than one answer fits.
  • Contradiction: No answer exists.
  • Omission: An essential rule is missing.

These are exactly the flaws that a competent human puzzle-setter — even at an average IQ — would usually avoid. Yet GPT-5 missed them.


What This Suggests About GPT-5’s “IQ”

This isn’t about trivia or specialist knowledge — these were reasoning problems. GPT-5 handled simple logic fine, but above a certain complexity, the cracks showed.

The emerging picture is a reliable reasoning ceiling around IQ 130 equivalent.

  • At this level: GPT-5 can manage most everyday reasoning, pattern spotting, and problem-solving.
  • Beyond this: The risk of hidden flaws, contradictions, or overconfident wrong answers rises sharply.

Why This Matters

If GPT-5 is the “best” version yet and still struggles beyond IQ-130-equivalent reasoning, that matters because:

  • In research: Subtle flaws can mislead entire projects.
  • In law, engineering, strategy: Confidently wrong answers can be costly.
  • In public trust: People assume ChatGPT “must be right” because it sounds right.

The problem isn’t just that GPT-5 makes mistakes — humans do too. It’s that it doesn’t recognise when a problem is flawed. It will confidently produce an answer to a question that can’t be answered at all.


Bottom line: I didn’t set out to measure GPT-5’s IQ. But after three broken puzzles in a row, I’m left with a question: If it struggles above IQ 130, how much can we really trust it for deep, critical thinking?

The Case for a $700B OpenAI and $5.8T NVIDIA


A Bottom-Up Valuation Framework for 2027

AI is powering a global economic transformation. By combining a bottom-up revenue model with valuation multiples proven by past hyper-growth companies like Salesforce and ServiceNow, we can make a disciplined and credible case for the near term value of companies like OpenAI, Anthropic, and NVIDIA.


Step 1: Bottom-Up AI Market Sizing

User Base Assumption:

  • 4 billion people globally are touched by AI (50% of global population)

Average Monthly Spend per User:

  • $5 for personal AI
  • $20 for business use
  • $4 for automation (e.g. AI cars, smart devices)
  • Total: $29/month

Annualized Total Market:
$29 × 4B users × 12 months = $1.392 trillion


Step 2: AI Spend Breakdown by Layer

  • Infrastructure: $696B (50%)
  • Applications: $511.5B (36.7%)
  • Core Software: $184.5B (13.3%)

Step 3: LLM Platform Revenue

Assuming 70% of Core Software spend goes to large language model (LLM) platforms:

  • Total LLM spend: $129.2B

LLM Vendor Share:

  • OpenAI (40%): $51.7B
  • Anthropic (30%): $38.8B
  • Others (30%): $38.8B

Step 4: NVIDIA’s AI Revenue

Assuming NVIDIA captures 60% of infrastructure spend:

  • 60% of $696B = $417.6B

Step 5: Apply a Proven 14× Revenue Multiple

Why 14×?
During their breakout phases, companies like Salesforce, ServiceNow, and Shopify all traded at 10× to 20× revenue.
14× is a conservative, historically proven mid-range multiple during 50%+ growth phases.


Final Valuation Summary (14× Revenue)

OpenAI

  • Revenue: $51.7B
  • Valuation: $724B

Anthropic

  • Revenue: $38.8B
  • Valuation: $543B

NVIDIA

  • Revenue: $417.6B
  • Valuation: $5.85T
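The whole chain of assumptions can be audited in a few lines (dollar figures in billions; small differences from the article come from when rounding is applied):

```python
users_b = 4                      # 4B people touched by AI
spend_per_month = 5 + 20 + 4     # personal + business + automation = $29
market = users_b * spend_per_month * 12   # $1,392B total annual market

infrastructure = market * 0.50            # $696B
applications = 511.5                      # $511.5B (36.7%, per the article)
core_software = market - infrastructure - applications  # $184.5B

llm_spend = core_software * 0.70          # ~$129.2B to LLM platforms
openai_rev = llm_spend * 0.40             # ~$51.7B
nvidia_rev = infrastructure * 0.60        # ~$417.6B

MULTIPLE = 14
valuations_b = {
    "OpenAI": round(openai_rev * MULTIPLE),           # ~$723B (the article
                                                      # rounds revenue first -> $724B)
    "Anthropic": round(llm_spend * 0.30 * MULTIPLE),  # ~$542B (article: $543B)
    "NVIDIA": round(nvidia_rev * MULTIPLE),           # ~$5,846B ≈ $5.85T
}
```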

Why This Works

  • Built from the bottom up — based on real users and real spending behavior
  • Uses valuation multiples proven by high-growth tech companies
  • Treats NVIDIA not just as a chip company, but as a software-style AI platform with deep infrastructure lock-in

This model doesn’t rely on inflated TAM or hype. It uses usage, spend, and precedent — and it suggests the next $10 trillion in enterprise value is already forming.