OpenAI's GPT-5 results on the Superfacts claim-level hallucination benchmark are in.
OpenAI has put a strong focus on accuracy in GPT-5 with its new Safe Completions system, and the results show it.
Highlights:
- All GPT-5 models (GPT-5, GPT-5 mini, GPT-5 nano) deliver significant accuracy gains over GPT-4 and o3. GPT-4 models averaged a 23.86% claim-level hallucination rate and o3 models 16.86%; GPT-5 cuts this to 8.06% on average, lifting OpenAI from the bottom of our accuracy leaderboard while improving capability (see the sketch after these highlights for how a claim-level rate is computed).
- Individually: GPT-5 hallucinates on 6.27% of claims, GPT-5 mini on 8.45%, and GPT-5 nano on 9.46%; the three rates average to the 8.06% figure above.
- This places the GPT-5 models alongside Google and Anthropic at the top of the Superfacts Leaderboard: GPT-5 ranks second behind Gemini 2.5 Flash, GPT-5 mini fifth, and GPT-5 nano eighth out of the 18 models tracked.
- Like other OpenAI models, GPT-5 responds especially well to Superficial’s one-shot audits, reaching 100% enhanced factual accuracy.
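For readers new to claim-level scoring, here is a minimal sketch of how such a rate can be computed: a response is broken into individual factual claims, each claim receives a supported/unsupported verdict, and the hallucination rate is the unsupported fraction. The `Claim` type, the `supported` flag, and the `hallucination_rate` helper below are illustrative assumptions, not Superfacts' actual pipeline, which also involves claim extraction and verification steps not shown here.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    supported: bool  # verdict from a fact-checking step (assumed upstream)

def hallucination_rate(claims: list[Claim]) -> float:
    """Fraction of extracted claims judged unsupported."""
    if not claims:
        return 0.0
    unsupported = sum(1 for c in claims if not c.supported)
    return unsupported / len(claims)

# Hypothetical example: 3 unsupported claims out of 50 extracted -> 6.00%
claims = [Claim(f"claim {i}", supported=(i >= 3)) for i in range(50)]
print(f"{hallucination_rate(claims):.2%}")  # 6.00%
```

Aggregating this per-claim rate across many prompts, rather than scoring whole responses pass/fail, is what distinguishes a claim-level benchmark.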
About Superfacts
Superfacts is the first claim-level hallucination benchmark for top AI models. Visit: benchmarks.superficial.org
About the GPT-5 model family
GPT-5 is OpenAI’s flagship model for coding, reasoning, and agentic tasks across domains. Learn more: platform.openai.com/docs/models/gpt-5