AI benchmarks | Digital Mind News

Healthcare

SandboxAQ integrated its physics-grounded drug discovery models into Anthropic's Claude this week, while schizophrenia drug tazbentetol…

2026-05-20

OpenAI

A new AI IQ scoring site mapping 50+ models onto a human intelligence scale drew both…

2026-05-19

AI Agents

SandboxAQ has integrated physics-based molecular models into Anthropic's Claude, lowering the barrier to AI-assisted drug discovery.…

2026-05-19

Enterprise

A new automated tool called BenchJack found 219 reward-hacking exploits across 10 major AI agent benchmarks,…

2026-05-19

AI

In 2026, Recursive Language Models are topping long-context benchmarks with a shared-context architecture, a contested AI…

2026-05-18

Enterprise

A startup project called AI IQ has mapped 50+ frontier language models onto a human IQ…

2026-05-18

OpenAI

A new site assigning IQ scores to 50+ AI models drew praise and sharp criticism this…

2026-05-18

OpenAI

A new automated auditing tool called BenchJack found 219 reward-hacking exploits across 10 popular AI agent…

2026-05-17

AI

A new AI IQ platform ranking 50+ language models on human intelligence scales has sparked debate,…

2026-05-16

AI

A new AI IQ website ranking language models on human intelligence scales has sparked intense debate,…

2026-05-15

Enterprise

AI evaluation costs have reached $40,000 per comprehensive benchmark run, creating a new bottleneck that limits…

2026-05-12

SGI

Recent AGI research milestones include efficient 8B-parameter reasoning models matching larger systems, evidence that different models…

2026-05-11