AI Drug Discovery: SandboxAQ, Tazbentetol, and New

SandboxAQ on Tuesday announced an integration with Anthropic’s Claude that puts its physics-grounded drug discovery models behind a conversational interface. In the same week, schizophrenia drug tazbentetol posted a 6.3-point placebo-adjusted PANSS reduction in a Phase 2 add-on trial. Together, the two data points illustrate how quickly AI-assisted medicine is moving from research abstraction to clinical signal.

SandboxAQ Brings LQMs Into Claude

Founded roughly five years ago as an Alphabet spinout, SandboxAQ has raised more than $950 million from investors and counts Eric Schmidt, Google’s former CEO, as its chairman. According to TechCrunch, the company has now integrated its large quantitative models (LQMs) directly into Claude, Anthropic’s conversational AI assistant.

LQMs differ from conventional language models in a fundamental way: they are built on the rules of physics rather than patterns in text. The models can run quantum chemistry calculations and simulate molecular dynamics and microkinetics — the study of how chemical reactions unfold at the molecular level. That means researchers can query how a candidate molecule is likely to behave before any lab work begins.

As SandboxAQ frames it, the integration solves an interface problem: most AI drug discovery tools still require technically sophisticated users to operate. By routing LQM capabilities through Claude’s conversational layer, SandboxAQ is targeting researchers who understand the science but lack specialized computing infrastructure. The company described LQMs as engineered for a “quantitative economy” it values at more than $50 trillion, spanning biopharma, financial services, energy, and advanced materials.

Competitors including Chai Discovery and Isomorphic Labs have concentrated on model architecture. SandboxAQ’s bet is that accessibility — not raw model performance — is the remaining bottleneck.

Tazbentetol Posts Durable Phase 2 Signal in Schizophrenia

Separately, Spinogenix reported Phase 2 add-on trial results for tazbentetol at the Schizophrenia International Research Society 2026 Annual Congress. According to Spinogenix’s press release, the drug produced a placebo-adjusted reduction of 6.3 points on the Positive and Negative Syndrome Scale (PANSS) — a standardized measure of schizophrenia symptom severity.

The PANSS result is notable in context. Add-on trials, where the investigational drug is layered on top of existing antipsychotic treatment, typically show smaller effect sizes than monotherapy studies because patients are already partially medicated. A 6.3-point placebo-adjusted reduction in that setting is considered a meaningful clinical signal by psychiatry researchers, though the full dataset has not yet been published in peer-reviewed form.
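For readers unfamiliar with the term, a placebo-adjusted figure is conventionally the change seen in the treatment arm minus the change seen in the placebo arm. The short Python sketch below illustrates only that arithmetic. The per-arm values are hypothetical, chosen so the difference matches the reported 6.3 points; they are not Spinogenix's arm-level results, and trials typically estimate this difference with a statistical model rather than a raw subtraction.

```python
def placebo_adjusted_change(treatment_change: float, placebo_change: float) -> float:
    """Return the treatment-arm change minus the placebo-arm change.

    Negative values mean a larger symptom reduction than placebo,
    since lower PANSS totals indicate milder symptoms.
    """
    return treatment_change - placebo_change


# HYPOTHETICAL mean changes from baseline in PANSS total score.
# Illustrative values only, not figures from the Spinogenix trial.
hypothetical_treatment_change = -14.3
hypothetical_placebo_change = -8.0

difference = placebo_adjusted_change(
    hypothetical_treatment_change, hypothetical_placebo_change
)
print(f"Placebo-adjusted change: {difference:.1f} points")
```

The point of the subtraction is that the placebo arm also improves in most psychiatric trials, so only the gap between the two arms is credited to the drug.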

Equally significant is the durability finding: patients who discontinued tazbentetol after six weeks of use retained efficacy for many days afterward. Spinogenix attributed this to the drug’s proposed mechanism — modulation of fascin-1/F-actin dynamics to promote synaptic regeneration, specifically the formation of dendritic spines with glutamatergic synapses. Tazbentetol is classified as a first-in-class investigational synaptic regenerative therapy. Beyond schizophrenia, Spinogenix is also studying the compound in Alzheimer’s disease, ALS, glaucoma, and diabetic retinopathy.

PolitNuggets Benchmarks Agentic Fact Discovery

On the evaluation side, researchers published a new benchmark called PolitNuggets on arXiv (arXiv:2605.14002), targeting a capability gap that has received limited formal attention: the ability of AI agents to discover and synthesize “long-tail” political facts from dispersed, multilingual sources.

The benchmark covers 400 global political elites and more than 10,000 political facts, constructed by building political biographies that require agents to locate obscure, non-prominent information rather than retrieve well-indexed answers. The researchers introduced FactNet, an evidence-conditional evaluation protocol that scores three dimensions separately:

Discovery — whether the agent found the relevant fact at all
Fine-grained accuracy — whether the retrieved fact is precisely correct
Efficiency — how many steps or tokens the agent consumed to get there
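To make the three dimensions concrete, here is a minimal Python sketch of how they could be scored separately for a single agent run. It is not the FactNet implementation: the `AgentRun` structure, the exact-match rule for accuracy, and the correct-facts-per-step efficiency measure are assumptions made for illustration, and the example facts are invented.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent's output for a single biography (hypothetical structure)."""
    found: dict[str, str]  # fact id -> value the agent reported
    steps: int             # tool calls or reasoning steps consumed


def score_run(gold: dict[str, str], run: AgentRun) -> dict[str, float]:
    """Score discovery, fine-grained accuracy, and efficiency separately."""
    discovered = [fid for fid in gold if fid in run.found]
    correct = [fid for fid in discovered if run.found[fid] == gold[fid]]

    # Discovery: share of gold facts the agent located at all.
    discovery = len(discovered) / len(gold) if gold else 0.0
    # Accuracy: among located facts, share whose value matches exactly.
    accuracy = len(correct) / len(discovered) if discovered else 0.0
    # Efficiency: correct facts per step (an assumed, simplistic measure).
    efficiency = len(correct) / run.steps if run.steps else 0.0
    return {"discovery": discovery, "accuracy": accuracy, "efficiency": efficiency}


# Invented example data, for illustration only.
gold_facts = {
    "birth_year": "1961",
    "first_office": "city councillor",
    "party_1998": "Example Party",
    "degree": "law",
}
run = AgentRun(
    found={"birth_year": "1961", "first_office": "mayor", "degree": "law"},
    steps=10,
)
print(score_run(gold_facts, run))
```

In this toy run the agent locates three of four facts but gets one of them wrong, so discovery and accuracy diverge. Keeping the scores separate is what lets a benchmark distinguish an agent that searches well from one that extracts details precisely.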

Results across tested models showed that current systems frequently struggle with fine-grained detail retrieval and vary substantially in efficiency. The authors linked agent performance to three underlying model capabilities: short-context extraction quality, multilingual robustness, and reliable tool use. The benchmark is multilingual by design, reflecting the reality that political information about non-English-speaking elites is often only available in local-language sources.

PolitNuggets fills a specific gap. Most existing agentic benchmarks test retrieval of prominent facts with clear, indexed answers. Long-tail discovery — finding the fifth job a politician held in 1998, for instance — is a harder and more realistic task for intelligence research, journalism, and policy analysis applications.

Cerebras IPO Validates AI Hardware Investment Thesis

The week’s financial news reinforced the commercial stakes of AI infrastructure research. The Cerebras Systems IPO delivered substantial returns for early investors, with Benchmark — which owns 9.5% of the company — among the largest beneficiaries. According to TechCrunch, Benchmark general partner Eric Vishria co-led Cerebras’ $25 million Series A in 2016 and has served on its board since.

Vishria told TechCrunch he nearly skipped the initial meeting — Benchmark had not backed a hardware company in a decade. His skepticism reversed by the third slide, when Cerebras co-founder and CEO Andrew Feldman argued that GPUs were never designed for AI workloads and simply happened to outperform CPUs by a factor of roughly 100. Cerebras built its business around wafer-scale chips designed specifically for AI training, a thesis that predated Google’s 2017 Transformer paper — the research that eventually underpinned ChatGPT and the current generation of large language models.

The IPO outcome is a data point for the broader AI hardware research ecosystem: purpose-built silicon, not repurposed graphics processors, is increasingly where institutional capital is flowing.

What This Means

Three threads run through this week’s AI research and application news, and they point in the same direction.

First, the interface layer is becoming a competitive moat. SandboxAQ’s Claude integration is not primarily a model story — the LQMs existed before this announcement. It is a distribution story. The company is betting that making powerful scientific AI accessible to non-computing specialists will matter more than marginal model improvements. That framing, if correct, has implications well beyond drug discovery.

Second, AI-adjacent drug development is producing early but real clinical signals. Tazbentetol’s mechanism — synaptic regeneration rather than receptor blockade — is a departure from how most psychiatric drugs work, and the durability of effect after discontinuation is unusual enough to warrant close attention when full peer-reviewed data becomes available. The compound is not an AI model, but its development pipeline has been shaped by AI-assisted target identification and molecular simulation of the kind SandboxAQ and its peers provide.

Third, evaluation infrastructure is catching up to capability claims. PolitNuggets is a narrow benchmark, but its design philosophy — measuring long-tail, multilingual, multi-step discovery rather than headline retrieval — reflects a maturing understanding of what agentic AI actually needs to do in real deployments. Better benchmarks make it harder to overstate model performance, which benefits the field’s credibility over time.

Taken together, the week’s developments suggest that AI in medicine and science is moving past proof-of-concept and into the harder, messier work of deployment, evaluation, and clinical validation.

FAQ

What are SandboxAQ’s large quantitative models (LQMs)?

LQMs are proprietary AI models built on physical laws and scientific equations rather than text patterns. They can run quantum chemistry calculations and simulate molecular dynamics, allowing researchers to predict how drug candidates will behave before lab testing begins.

What does a 6.3-point PANSS reduction mean for tazbentetol?

The PANSS (Positive and Negative Syndrome Scale) is a standard clinical measure of schizophrenia symptom severity. A placebo-adjusted reduction of 6.3 points in an add-on trial, where patients are already on existing medication, is considered a meaningful clinical signal, though the full dataset has not yet been published in peer-reviewed form.

What is the PolitNuggets benchmark designed to test?

PolitNuggets evaluates whether AI agents can discover obscure, “long-tail” political facts from dispersed, multilingual sources — a harder task than retrieving well-indexed answers. It covers 400 global political elites and more than 10,000 facts, scored on discovery, fine-grained accuracy, and efficiency.

Sources

The biggest AI breakthrough in medicine & drug discovery – Reddit Singularity
(Breakthrough) Tazbentetol significantly improved symptoms in patients with schizophrenia in a Phase 2 add-on clinical trial, with efficacy sustained for many days after drug discontinuation. – Reddit Singularity
PolitNuggets: Benchmarking Agentic Discovery of Long-Tail Political Facts – arXiv AI
Cerebras IPO makes billions for Benchmark but VC Eric Vishria almost didn’t take the meeting – TechCrunch
SandboxAQ brings its drug discovery models to Claude — no PhD in computing required – TechCrunch