[Glass Wings] AI systems are great at tests. But how do they perform in real life?

<https://theconversation.com/ai-systems-are-great-at-tests-but-how-do-they-perform-in-real-life-260176>

"Earlier this month, when OpenAI released its latest flagship artificial
intelligence (AI) system, GPT-5, the company said it was “much smarter across
the board” than earlier models. Backing up the claim were high scores on a
range of benchmark tests assessing domains such as software coding, mathematics
and healthcare.

Benchmark tests like these have become the standard way we assess AI systems –
but they don’t tell us much about the actual performance and effects of these
systems in the real world.

What would be a better way to measure AI models? A group of AI researchers and
metrologists – experts in the science of measurement – recently outlined a way
forward.

Metrology is important here because we need ways of not only ensuring the
reliability of the AI systems we may increasingly depend upon, but also some
measure of their broader economic, cultural, and societal impact."

Cheers,
       *** Xanni ***
--
mailto:xanni@xanadu.net               Andrew Pam
http://xanadu.com.au/                 Chief Scientist, Xanadu
https://glasswings.com.au/            Partner, Glass Wings
https://sericyb.com.au/               Manager, Serious Cybernetics

Tue, 26 Aug 2025 03:02:37 +1000

Andrew Pam <xanni [at] glasswings.com.au>