We’re already trusting AI with too much – I just hope AI hallucinations disappear before it’s too late

I was talking to an old friend about AI – as one often does whenever engaging in casual conversation with anyone these days – and he was describing how he’d been using AI to help him analyze insurance documents. Basically, he was feeding almost a dozen documents into the system to summarize, or a pair of lengthy policies to compare for changes. This was work that could take him hours, but in the hands of AI (perhaps ChatGPT or Gemini, though he didn’t specify), just minutes.
What fascinated me is that my friend has no illusions about generative AI’s accuracy. He fully expected one out of 10 facts to be inaccurate or perhaps hallucinated, and he made it clear that his very human hands are still part of the quality-control process. For now.
The next thing he said surprised me – not because it isn’t true, but because he acknowledged it. Eventually, AI won’t hallucinate; it won’t make a mistake. That’s the trajectory, and we should prepare for it.
The future is perfect
I agreed with him because this has long been my thinking. The speed of development essentially guarantees it.
While I grew up with Moore’s Law, which posits a doubling of the number of transistors on a microchip roughly every two years, AI’s Law is, roughly put, a doubling of intelligence every three to six months. That pace is why everyone is so convinced we’ll achieve Artificial General Intelligence (AGI, or human-like intelligence) sooner than originally thought.
I believe that, too, but I want to circle back to hallucinations, because even as consumers and non-techies like my friend embrace AI for everyday work, hallucinations remain a very real part of working with AI large language models (LLMs).
In a recent anecdotal test of multiple AI chatbots, I was chagrined to find that most of them could not accurately recount my work history, even though it is spelled out in exquisite detail on LinkedIn and Wikipedia.
These were minor errors and not of any real importance, because who cares about my background except me? Still, ChatGPT’s o3-mini model, which uses deeper reasoning and can therefore take longer to formulate an answer, said I worked at TechRepublic. That’s close to “TechRadar,” but no cigar.
DeepSeek, the Chinese AI chatbot wunderkind, had me working at Mashable years after I left. It also confused my PCMag history.
Google Gemini smartly kept the details scant, but it got all of them right. ChatGPT’s 4o model took a similar pared-down approach and achieved 100% accuracy.
Claude AI lost the thread of my timeline and still had me working at Mashable. It warns that its data is out of date, but I did not think it was eight years out of date.
I ran some polls on social media about the level of hallucination most people expect to see on today’s AI platforms. On Threads, 25% of respondents think AI hallucinates 25% of the time. On X, 40% think it’s 30% of the time.
However, I also received comments reminding me that accuracy depends on the quality of the prompt and the topic area. Information that doesn’t have much of an online footprint is sure to lead to hallucinations, one person warned me.
Still, research shows that models are not only getting larger, they’re getting smarter, too. A year ago, one study found ChatGPT hallucinating 40% of the time in some tests.
According to the Hughes Hallucination Evaluation Model (HHEM) leaderboard, some of the leading models’ hallucination rates are down to under 2%. With older models, like Meta’s Llama 3.2, you head back into double-digit hallucination rates.
Cleaning up the mess
What this shows us, though, is that these models are quickly heading in the direction my friend predicts, and that at some point in the not-too-distant future they will be large enough, and trained on fresh enough real-time data, to push the hallucination rate well below 1%.
My concern is that in the meantime, people without technical expertise or even an understanding of how to compose a useful prompt are relying on large language models for real work.
Hallucination-driven errors are likely creeping into all sectors of home life and industry and infecting our systems with misinformation. They may not be big errors, but they will accumulate. I don’t have a solution for this, but it’s worth thinking about and maybe even worrying about a little bit.
Perhaps future LLMs will also include error sweeping, where you send them out onto the web and through your files to cull all the AI-hallucination-generated mistakes.
After all, why should we have to clean up AI’s messes?