[Glass Wings] Why OpenAI’s solution to AI hallucinations would kill ChatGPT tomorrow

<https://theconversation.com/why-openais-solution-to-ai-hallucinations-would-kill-chatgpt-tomorrow-265107>

"OpenAI’s latest research paper diagnoses exactly why ChatGPT and other large
language models can make things up – known in the world of artificial
intelligence as “hallucination”. It also reveals why the problem may be
unfixable, at least as far as consumers are concerned.

The paper provides the most rigorous mathematical explanation yet for why these
models confidently state falsehoods. It demonstrates that these aren’t just an
unfortunate side effect of the way that AIs are currently trained, but are
mathematically inevitable.

The issue can partly be explained by mistakes in the underlying data used to
train the AIs. But using mathematical analysis of how AI systems learn, the
researchers prove that even with perfect training data, the problem still
exists.

The way language models respond to queries – by predicting one word at a time
in a sentence, based on probabilities – naturally produces errors. The
researchers in fact show that the total error rate for generating sentences is
at least twice as high as the error rate the same AI would have on a simple
yes/no question, because mistakes can accumulate over multiple predictions.

In other words, hallucination rates are fundamentally bounded by how well AI
systems can distinguish valid from invalid responses. Since this classification
problem is inherently difficult for many areas of knowledge, hallucinations
become unavoidable.

It also turns out that the less a model sees a fact during training, the more
likely it is to hallucinate when asked about it. With birthdays of notable
figures, for instance, it was found that if 20% of such people’s birthdays only
appear once in training data, then base models should get at least 20% of
birthday queries wrong.

Sure enough, when researchers asked state-of-the-art models for the birthday of
Adam Kalai, one of the paper’s authors, DeepSeek-V3 confidently provided three
different incorrect dates across separate attempts: “03-07”, “15-06”, and
“01-01”. The correct date is in the autumn, so none of these were even close."
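
To make the two numerical claims in the excerpt concrete, here is a minimal
Python sketch. It is not from the article or the paper. The function names and
the toy list of mentions are invented for illustration, and the "singleton
rate" follows the article's wording (the share of people whose birthday appears
exactly once), which may differ from the paper's formal definition.

```python
from collections import Counter


def singleton_rate(mentions):
    """Share of distinct people whose birthday is mentioned exactly once.

    Assumption: each item in `mentions` is one training-data mention of
    that person's birthday. This mirrors the article's phrasing only.
    """
    counts = Counter(mentions)
    if not counts:
        raise ValueError("no mentions supplied")
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def generation_error_floor(yes_no_error_rate):
    """Lower bound on sentence-level error as quoted in the excerpt:
    at least twice the error rate on the matching yes/no question.
    Capped at 1.0 because an error rate cannot exceed 100%.
    """
    if not 0.0 <= yes_no_error_rate <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    return min(1.0, 2.0 * yes_no_error_rate)


# Hypothetical toy data: five people, one of whom is mentioned only once.
toy_mentions = [
    "ada", "ada", "ada",
    "alan", "alan",
    "grace", "grace",
    "edsger", "edsger",
    "kalai",
]

# By the excerpt's argument, this share is also a floor on how often a
# base model should get such birthday queries wrong.
birthday_error_floor = singleton_rate(toy_mentions)
```

With this toy data one person in five is a singleton, so the floor on birthday
errors works out to 20%, matching the figure in the excerpt. Likewise,
generation_error_floor(0.1) gives 0.2: a model that misjudges 10% of yes/no
validity questions would, on the quoted bound, get at least 20% of generated
answers wrong.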

Via David.

Cheers,
       *** Xanni ***
--
mailto:xanni@xanadu.net               Andrew Pam
http://xanadu.com.au/                 Chief Scientist, Xanadu
https://glasswings.com.au/            Partner, Glass Wings
https://sericyb.com.au/               Manager, Serious Cybernetics

Sun, 14 Sep 2025 12:23:27 +1000

Andrew Pam <xanni [at] glasswings.com.au>