AI Benchmarks Under Scrutiny: Hacks, IQ Scores, and OpenAI’s
A new automated tool called BenchJack found 219 reward-hacking exploits across 10 major AI agent benchmarks, achieving near-perfect scores without completing any…
Meta began cutting roughly 8,000 employees the week of May 20, 2026, while General Motors separately laid off 600 IT workers to…
Poppy, a new AI assistant app from former Humane engineer Sai Kambampati, launched this week combining calendar, email, messages, and location data…
Microsoft CEO Satya Nadella testified in the Musk v. Altman trial during the week of May 11, 2026, revealing that the company…
Anduril raised $5 billion at a $61 billion valuation on May 13, 2026, while OpenAI launched a new enterprise deployment unit and…
Thinking Machines Lab previewed real-time multimodal interaction models built by Mira Murati's team, while Perceptron launched a video reasoning model priced 80–90%…
Recursive Language Models are outperforming conventional agentic architectures on long-context benchmarks by passing context between reasoning steps by reference rather than replicating…
Google entered the week before I/O 2026 with three major announcements: the Googlebook laptop built around Gemini Intelligence, a second Gemini Startup…
In 2026, Recursive Language Models are topping long-context benchmarks with a shared-context architecture, a contested AI IQ scoring site is forcing debate…
Active exploitation of a critical NGINX heap overflow (CVE-2026-42945, CVSS 9.2) began days after F5 patched it, while Microsoft's Patch Tuesday included…
Researchers at UIUC, Stanford, and Nous Research published methods in May 2026 that cut AI training and inference costs by 2.4x to…
Five concurrent developments this week mark a turning point for enterprise AI agents: Anthropic reversed its ban on third-party agent access via…