KPMG AI Hallucination: A Deep-Dive Product Teardown

KPMG published a report on AI excellence. Then UBS, the NHS, and Transport for London all said the same thing: that case study about us never happened. A report about AI had been written by an AI that made things up. Here is the teardown : and the one habit that stops it.

A press officer at UBS picked up a call from the Financial Times one morning in June. The reporter wanted to confirm something from a KPMG report: did UBS really run AI agents for investment advice, risk, and compliance on a platform it built with Microsoft? It sounded specific and real. But, it was completely made up. UBS had never built that platform. The same call went to the NHS in Greater Manchester, to Swiss Federal Railways, to Transport for London. Each one checked. Each one said the same thing: that never happened.

The report was titled “Total Experience: Redefining Excellence in the Age of Agentic AI.” It was meant to show the world how top institutions use AI. Instead it became a live demonstration of AI hallucination inside one of the most trusted names in professional services — and a warning for anyone who uses AI to produce work that carries their name.

💡 The core problem

This was not one bad document. It was what happens when a trusted institution lets AI write authoritative content faster than humans verify it. The hallucination was the symptom. The skipped verification step was the disease.

What the AI Hallucination Actually Did

This was not one stray error. The AI-detection group GPTZero reviewed the report and the Financial Times verified their findings. Of the report’s 45 citations, only five correctly pointed to their real source. The other 40 were broken: 28 paraphrased or added fake details to real sources, and 12 were too vague to verify at all. Roughly half of the report’s factual claims were false, unsupported, or pinned to the wrong source.

The made-up case studies read beautifully. That is the trap. One claimed Emirates launched an AI chatbot called “Sara” that could talk to passengers and change their flights. Sara was a real mobile assistant from 2023 — but it was not AI-powered and could not change any bookings. A real thing, reshaped into something it never was. The report even contradicted KPMG’s own data: it cited a figure of 55% of CEOs ranking AI as their top priority, while KPMG’s own 2025 CEO Outlook, published the same month, said 71%.

GPTZero gave it a name that will stick: “vibe citing.” It is the citation version of vibe coding — AI stitching together fragments of real sources and inventing references that look convincing until someone actually clicks them. KPMG pulled the report and opened a review. But it had already been cited by other publications first. The fabrication got a head start.

Why This AI Hallucination Happened

It is easy to blame the AI model. That is the wrong answer. The model did exactly what models do — and understanding that is the whole lesson.

A language model does not know what is true. It produces text that sounds plausible. Ask it for examples of companies using AI agents, and it will confidently generate examples that sound exactly like real ones — names, platforms, partnerships, all in fluent corporate language. When it invented the UBS-Microsoft platform, it was not malfunctioning. It was doing its job: producing convincing text. Deciding whether that text is true is a separate job — and that job belongs to a human. In this workflow, that human step did not happen. As GPTZero’s researcher put it bluntly: they suspect no human at KPMG double-checked the citations before publishing.

Here is the deeper trap. The fabricated case studies were well-written. When AI output is polished, confident, and specific, it triggers automatic trust — surely something this professional must be correct. But in a language model, fluency and accuracy are unrelated. The UBS claim was specific, authoritative, and entirely false. A reviewer skimming for quality would have approved it. Only a reviewer checking each claim against its source would have caught the fiction. KPMG’s process checked for the first and skipped the second.

And there is one more cause, the quiet one. The old way of writing a report was slow — you found a real example, called the company, confirmed it, cited it. That slowness was not waste. It was the verification, built into the speed. AI removes the slowness, and with it, the built-in checking. When you can produce 45 citations in minutes, you also get the temptation to publish them in minutes. The friction that used to protect accuracy is gone — and nothing was put in its place.

What You Actually Want From AI

If your team uses AI to write anything — reports, decks, articles, proposals — you probably want what KPMG wanted. The speed of AI. The polish of AI. Content produced in hours instead of weeks. That is a reasonable thing to want, and AI genuinely delivers it.

But you also want something KPMG forgot to protect: the certainty that your name only goes on things that are true. You want the speed without the fabrication. The output without the embarrassment. And the good news is you can have both — but only if you add back the one thing AI quietly removed.

⚠️ Why a Big Four name makes it worse

CFOs use Big Four reports to benchmark their own AI strategy. When the benchmark is hallucinated, every decision downstream inherits the error. GPTZero’s CEO put it well: false claims from highly trusted institutions “pollute the soil of information” — they spread before corrections catch up.

How to Stop This Happening to You

The KPMG failure is completely preventable. Three habits stop it — and none of them require new technology.

1. Make verification a gate, not a guideline. KPMG’s own rules described human oversight and source-checking. The failure was that it was a guideline, not a gate. A guideline gets skipped under deadline pressure. A gate cannot. Every factual claim and every citation in AI-assisted work must be confirmed against its real source by a named human before anything publishes. No exceptions.

2. Use AI for prose, humans for facts. AI is genuinely good at drafting, structuring, and polishing. It is dangerous for generating the facts themselves — the names, the case studies, the citations — because that is exactly where hallucination lives. Let AI write the sentences around facts a human has verified. Never let it invent the facts.

3. Run a detection pass before release. The irony is that GPTZero — an AI-detection tool — caught this after publication. The same tool could have run before. If you publish at scale, build an automated citation-check into your pipeline. Catching it internally costs an afternoon. Catching it publicly costs your credibility.

The VulpisLab Verdict

🔍 VulpisLab Verdict

Severity: Critical. A report on AI excellence became a live demo of AI’s worst failure, under a name people trust. And it is the third Big Four firm to do it in under a year — Deloitte refunded the Australian government in 2025, EY pulled a report in May 2026, now KPMG. The pattern is the story.

The one lesson: AI does not remove the need for verification — it multiplies it. The faster you can generate authoritative content, the more disciplined your checking has to be, not less. Before your team ships anything an AI helped write, ask the question KPMG’s process skipped: has a human confirmed that every fact in here is actually true? If the answer is no, it is not ready.

The checklist before publishing any AI-assisted content

Has a human verified every factual claim against its real source?
Has every citation been clicked and confirmed to exist?
Did AI invent any names, examples, or facts that nobody checked?
Is verification a required gate or just a guideline people can skip?

Sources

Primary:
GPTZero original investigation: gptzero.me/news/investigations-kpmg
The Register (5 of 45; “vibe citing”; 55% vs 71%): theregister.com
Engadget (28 paraphrased / 12 vague; Emirates “Sara” example): engadget.com
TechCrunch (KPMG pulls report; EY precedent): techcrunch.com

Supporting:
Digital Today (institution denials; Edward Tian “pollute the soil” quote): digitaltoday.co.kr
Let’s Data Science (CityAM: 40 fabricated titles; “no human double-checked” quote): letsdatascience.com
Startup Fortune (verification-process analysis): startupfortune.com
Fortune (Deloitte $290K refund precedent, Oct 2025): fortune.com

VulpisLab — AI product teardowns. No hype. No vendor copy. Just teardown and verdict. Read Issue #01: The Hallucination Tax · Issue #04: ChatGPT Memory.