Benchmarks

Superficial-audited models achieve an average 99.5% one-shot factual accuracy on Google DeepMind's FACTS Benchmark.

View our Live Hallucination Benchmark


Model outputs are scored using Google DeepMind's FACTS Benchmark. When FACTS marks a model's response "inaccurate", we enhance that response once (one-shot) using Superficial's audit results, then re-score the enhanced response with FACTS to independently measure Superficial's accuracy gains.
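The scoring loop described above can be sketched roughly as follows. This is a minimal illustration, not Superficial's or FACTS's actual code: `is_accurate` and `enhance` are hypothetical stand-ins for the FACTS judgment and the audit-based enhancement, and the sketch assumes that responses already judged accurate count toward the enhanced score unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Result:
    baseline_accuracy: float  # share judged accurate before any enhancement
    enhanced_accuracy: float  # share judged accurate after one enhancement pass


def evaluate(
    responses: Iterable[str],
    is_accurate: Callable[[str], bool],  # hypothetical stand-in for the FACTS judgment
    enhance: Callable[[str], str],       # hypothetical stand-in for the audit-based rewrite
) -> Result:
    items = list(responses)
    if not items:
        raise ValueError("no responses to score")

    baseline_hits = 0
    enhanced_hits = 0
    for response in items:
        if is_accurate(response):
            # Already accurate: counted in both scores, never rewritten.
            baseline_hits += 1
            enhanced_hits += 1
            continue
        # One-shot: exactly one enhancement attempt, then re-score.
        revised = enhance(response)
        if is_accurate(revised):
            enhanced_hits += 1

    n = len(items)
    return Result(baseline_hits / n, enhanced_hits / n)


if __name__ == "__main__":
    # Toy stand-ins only: a response is "accurate" if it contains no "wrong",
    # and the enhancer fixes at most one "wrong" per response.
    toy_judge = lambda text: "wrong" not in text
    toy_enhance = lambda text: text.replace("wrong", "right", 1)
    sample = ["right a", "wrong b", "wrong wrong c", "right d"]
    print(evaluate(sample, toy_judge, toy_enhance))
```

Because the same judge scores both the original and the enhanced response, the difference between the two accuracies isolates the effect of the single enhancement pass.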