Superficial

OpenAI o3 Pro Superfacts Benchmark results are in.

Highlights:

o3 Pro, like other OpenAI models, is particularly hallucination-prone, ranking 9th out of 12 models tracked and last among SOTA reasoning models.
At the claim level, o3 Pro hallucinates 18% of the time, compared to 8.5% for Gemini 2.5 Pro and 9.5% for Claude Opus 4.
o3 appears to respond particularly well to Superficial one-shot audits, reaching 99.02% enhanced accuracy, in line with other OpenAI models, which all reach ~100% factual accuracy after Superficial enhancement.
o3 Pro appears to be roughly similar to regular o3 in accuracy (82.08% vs 84.14%).
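
The accuracy and hallucination figures above appear to be two views of the same measurement. A minimal sketch of that arithmetic follows, assuming the claim-level hallucination rate is simply 100% minus claim-level accuracy. That assumption is consistent with o3 Pro's 82.08% accuracy and its roughly 18% hallucination rate, but it is not confirmed by the benchmark itself. The function name is illustrative only.

```python
# Assumption (not confirmed by the benchmark): a claim is either accurate or
# hallucinated, so hallucination rate = 100% - claim-level accuracy.
accuracy_pct = {"o3 Pro": 82.08, "o3": 84.14}  # figures quoted in the highlights


def hallucination_rate(accuracy: float) -> float:
    """Return the implied share of hallucinated claims, in percent."""
    return round(100.0 - accuracy, 2)


rates = {model: hallucination_rate(acc) for model, acc in accuracy_pct.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}% of claims hallucinated (implied)")
```

Under this assumption, o3 Pro's implied rate of 17.92% rounds to the 18% quoted above, and the gap between o3 Pro and regular o3 is about two percentage points.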

About Superfacts

Superfacts is the first claim-level hallucination benchmark for top AI models. Visit the benchmark at benchmarks.superficial.org.

About o3 Pro

o3 Pro is an advanced multi-modal model from OpenAI, designed for complex tasks requiring deep reasoning and reliable outputs, particularly in areas like coding, math, and science. Learn more at platform.openai.com/docs/models/o3-pro.

o3 Pro Benchmarks