Frankly, I don’t automatically believe AI’s answers anymore…

Has GPT-5 Already Hit Its IQ Ceiling?
TL;DR: I asked GPT-5 to create some IQ-130 questions. It came back with three that all failed the most basic test of validity: one had two correct answers, one was impossible to solve, and one was unsolvable without an unstated hidden rule. If this is the best version of ChatGPT, its reliable reasoning may top out around an IQ-130 equivalent, strong for everyday use but far from the flawless logic people imagine.
I Wasn’t Testing GPT-5 — At First
This didn’t begin as an experiment. I just wanted a few IQ test questions to show my wife what different difficulty levels look like, so I asked GPT-5 to produce questions at the IQ-130 and IQ-150 levels. Instead I got something that made me pause, and then dig deeper.
What I Asked vs. What I Got
The questions arrived. At first glance, they looked legitimate. But when I tried solving them, they started to unravel.
Puzzle 1 – Did not have a unique answer
GPT-5’s question:
Five people—A, B, C, D, E—are seated in a row.
- A is to the left of C but not directly next to C.
- B is between D and C.
- E is at one end.
Who is in the middle seat?
What I found: There are two valid arrangements, which GPT-5 itself later confirmed:
- A D B C E → middle is B
- E A D B C → middle is D
Two answers mean it fails the unique-solution rule that IQ tests demand; a quick brute-force check, below, confirms these are the only two.
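The puzzle is small enough to check exhaustively. Here’s a minimal Python sketch (my own verification, not part of the original exchange), assuming the strict reading that “B is between D and C” means immediately between them:

```python
from itertools import permutations

PEOPLE = "ABCDE"

def valid(seats):
    """seats is a left-to-right tuple of the five people."""
    pos = {p: i for i, p in enumerate(seats)}
    return (
        pos["A"] < pos["C"]                       # A is to the left of C...
        and pos["C"] - pos["A"] > 1               # ...but not directly next to C
        and abs(pos["B"] - pos["D"]) == 1         # B is immediately between
        and abs(pos["B"] - pos["C"]) == 1         #   D and C
        and pos["E"] in (0, len(seats) - 1)       # E is at one end
    )

for seats in permutations(PEOPLE):
    if valid(seats):
        print("".join(seats), "-> middle seat:", seats[2])
# Output: ADBCE -> middle seat: B
#         EADBC -> middle seat: D
```

Exactly the two arrangements above come out, so the ambiguity is real, not a misreading on my part.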
I gave GPT-5 a chance to rework the question – it failed again, producing a puzzle that was impossible to solve
What I asked: “Now rewrite it so there’s only one possible answer.”
GPT-5’s question:
Five people—A, B, C, D, E—are seated in a row.
- E is at one end.
- A is to the left of C, exactly two seats away.
- B is immediately between D and C.
- D is not at either end.
What I found: The conditions contradict each other. There’s no seating arrangement that satisfies them all. Yet GPT-5 presented it as a solvable puzzle.
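The same brute-force check makes the contradiction concrete. Rerunning it with the reworked clues (again taking “immediately between” literally, and reading “exactly two seats away” as a positional gap of two) yields an empty list:

```python
from itertools import permutations

PEOPLE = "ABCDE"

def valid_rework(seats):
    pos = {p: i for i, p in enumerate(seats)}
    return (
        pos["E"] in (0, len(seats) - 1)           # E is at one end
        and pos["C"] - pos["A"] == 2              # A left of C, exactly two seats away
        and abs(pos["B"] - pos["D"]) == 1         # B is immediately between
        and abs(pos["B"] - pos["C"]) == 1         #   D and C
        and pos["D"] not in (0, len(seats) - 1)   # D is not at either end
    )

print([s for s in permutations(PEOPLE) if valid_rework(s)])  # prints []
```

No permutation of the five people satisfies all the clues at once, so the puzzle GPT-5 presented as solvable has no answer at all.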
I then requested a set of three IQ-150-level questions – one failed
GPT-5’s question: A sequence that looked fine but required an unstated extra rule to get to GPT-5’s answer. Without guessing that hidden step, multiple answers fit — a direct breach of the principle that IQ tests must be fully self-contained.
The Pattern
In all three cases, GPT-5 produced puzzles that looked correct but failed under close scrutiny.
- Ambiguity: More than one answer fits.
- Contradiction: No answer exists.
- Omission: An essential rule is missing.
These are exactly the flaws that a competent human puzzle-setter — even at an average IQ — would usually avoid. Yet GPT-5 missed them.
What This Suggests About GPT-5’s “IQ”
This isn’t about trivia or specialist knowledge — these were reasoning problems. GPT-5 handled simple logic fine, but above a certain complexity, the cracks showed.
The emerging picture is a reliable reasoning ceiling around an IQ-130 equivalent.
- At this level: GPT-5 can manage most everyday reasoning, pattern spotting, and problem-solving.
- Beyond this: The risk of hidden flaws, contradictions, or overconfident wrong answers rises sharply.
Why This Matters
If GPT-5 is the “best” version yet and still struggles beyond IQ-130-equivalent reasoning, that matters because:
- In research: Subtle flaws can mislead entire projects.
- In law, engineering, strategy: Confidently wrong answers can be costly.
- In public trust: People assume ChatGPT “must be right” because it sounds right.
The problem isn’t just that GPT-5 makes mistakes — humans do too. It’s that it doesn’t recognise when a problem is flawed. It will confidently produce an answer to a question that can’t be answered at all.
Bottom line: I didn’t set out to measure GPT-5’s IQ. But after three broken puzzles in a row, I’m left with a question: If it struggles above IQ 130, how much can we really trust it for deep, critical thinking?