Superficial

xAI Grok 4 Superfacts Benchmark results are in.

Highlights:

Grok 4, like its predecessor Grok 3, lands in the middle of the pack on the Superfacts benchmark, ranking 9th out of the 15 models tracked.
At the claim level, Grok 4 hallucinates 12.04% of the time, placing it above all OpenAI models but still trailing the state-of-the-art (SOTA) models from Google and Anthropic.
Grok 4 has achieved performance gains over Grok 3 with only a small increase in hallucination rate: 12.04% vs. 11.05%.
Like Grok 3, Grok 4 appears to respond particularly well to Superficial one-shot audits, reaching 100% enhanced accuracy.
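
To put the "small increase" in hallucination rate into perspective, the short sketch below compares the two rates quoted above, both as an absolute change in percentage points and as a relative change. The helper name `compare_rates` is hypothetical and used only for illustration; it is not part of Superfacts or any xAI tooling, and the only inputs are the two figures from the highlights.

```python
# Claim-level hallucination rates quoted in the highlights (in percent).
GROK_4_RATE = 12.04
GROK_3_RATE = 11.05


def compare_rates(new: float, old: float) -> tuple[float, float]:
    """Hypothetical helper: return (absolute change in percentage points,
    relative change in percent) between two rates given in percent."""
    absolute = new - old
    relative = absolute / old * 100
    return absolute, relative


absolute_pp, relative_pct = compare_rates(GROK_4_RATE, GROK_3_RATE)
print(f"Absolute change: {absolute_pp:.2f} percentage points")
print(f"Relative change: {relative_pct:.1f}%")
```

Under these figures, the gap works out to roughly 0.99 percentage points, or about a 9% relative increase over Grok 3's rate.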

About Superfacts

Superfacts is the first claim-level hallucination benchmark for top AI models. Visit the benchmark at benchmarks.superficial.org.

About Grok 4

Grok 4 is xAI's latest flagship model, offering strong performance in natural language, math and reasoning. Learn more at docs.x.ai/docs/models/grok-4-0709.

Grok 4 Benchmarks