Benchmarks
Superficial-audited models achieve an average 99.5% one-shot factual accuracy on Google DeepMind's FACTS Benchmark.
Model outputs are scored with Google DeepMind's FACTS Benchmark. When FACTS marks a model's response "inaccurate", we enhance it once (one-shot) using Superficial's audit results, then re-score it with FACTS to independently measure Superficial's accuracy gains.
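The score, enhance, re-score loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `score` stands in for a FACTS judgment and `enhance` for a single audit-driven revision. Both are hypothetical callables supplied by the caller, not real FACTS or Superficial APIs, and the names `Outcome`, `evaluate`, and `accuracy` are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    baseline_accurate: bool  # verdict on the model's original response
    final_accurate: bool     # verdict after at most one enhancement pass


def evaluate(
    responses: List[str],
    score: Callable[[str], bool],   # hypothetical: True if judged accurate
    enhance: Callable[[str], str],  # hypothetical: one audit-driven rewrite
) -> List[Outcome]:
    """Score each response; enhance and re-score only the inaccurate ones."""
    outcomes = []
    for response in responses:
        baseline = score(response)
        final = baseline
        if not baseline:
            # One-shot: a single enhancement attempt, then a fresh score.
            final = score(enhance(response))
        outcomes.append(Outcome(baseline, final))
    return outcomes


def accuracy(outcomes: List[Outcome], enhanced: bool) -> float:
    """Fraction of responses judged accurate, before or after enhancement."""
    if not outcomes:
        raise ValueError("no outcomes to average")
    hits = sum(
        o.final_accurate if enhanced else o.baseline_accurate
        for o in outcomes
    )
    return hits / len(outcomes)
```

Comparing `accuracy(outcomes, enhanced=False)` with `accuracy(outcomes, enhanced=True)` gives the gain attributable to the enhancement step, since responses that were already accurate pass through unchanged.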