xAI Grok 4 Superfacts Benchmark results are in.
Highlights:
- Grok 4, like its predecessor Grok 3, lands in the middle of the pack on the Superfacts benchmark, ranking 9th out of 15 models tracked.
- At the claim level, Grok 4 hallucinates 12.04% of the time. That ranks it above all OpenAI models but still behind the state-of-the-art (SOTA) models from Google and Anthropic.
- Grok 4 improves on Grok 3's performance with only a small increase in hallucination rate (12.04% vs. 11.05%).
- Like Grok 3, Grok 4 appears to respond particularly well to Superficial one-shot audits, achieving 100% enhanced accuracy.
About Superfacts
Superfacts is the first claim-level hallucination benchmark for top AI models. Visit the benchmark at benchmarks.superficial.org.
About Grok 4
Grok 4 is xAI's latest flagship model, offering strong performance in natural language, math and reasoning. Learn more at docs.x.ai/docs/models/grok-4-0709.