AI Benchmarks Under Scrutiny: Hacks, IQ Scores, and OpenAI’s
A new automated tool called BenchJack found 219 reward-hacking exploits across 10 major AI agent benchmarks,…