AI Benchmarks Under Fire: Hacking, IQ Scores, and Parameter
A new automated auditing tool called BenchJack found 219 reward-hacking exploits across 10 popular AI agent benchmarks. Separately, OpenAI closed its Parameter…
HiddenLayer researchers disclosed a tokenizer vulnerability in Hugging Face that lets attackers hijack locally run open-source models by modifying a single .json file,…
From Raindrop AI's open-source agent debugger Workshop to Andrej Karpathy's declaration that vibe coding is giving way to spec-driven agentic engineering,…
Researchers at UIUC and Stanford have developed RecursiveMAS, a multi-agent framework that routes inter-agent communication through embedding space instead of text, achieving…
The EU plans to regulate addictive design features on TikTok and Instagram before the end of 2026, while a new book warns…
Thinking Machines Lab previewed real-time interaction models for continuous audio and video input, while Perceptron released a video reasoning model priced 80–90%…
Perceptron Inc. launched its Mk1 video analysis model at $0.15/$1.50 per million tokens, 80–90% below comparable pricing from Anthropic, OpenAI, and Google,…
Tech companies including Meta and Block are cutting thousands of jobs while explicitly citing AI efficiency gains, even as National Economic Council…
Five significant vulnerabilities disclosed this week — including a zero-click Outlook RCE, an actively exploited Linux privilege escalation chain, and a 16-year-old…
Isomorphic Labs raised $2.1 billion led by Thrive Capital on May 13, 2026 — the second-largest biotech fundraise ever — to advance…
Tesla disclosed that remote human operators caused two of its Austin robotaxi crashes between July 2025 and March 2026, according to newly…
Anduril raised $5 billion at a $61 billion valuation, NVIDIA crossed $40 billion in equity investments, and OpenAI launched a new enterprise…