AI-driven drug discovery took several concrete steps forward this month, with SandboxAQ integrating physics-based molecular models into Anthropic’s Claude, a Phase 2 clinical trial producing statistically notable results for a synaptic regeneration drug, and a new benchmark exposing gaps in how AI agents handle complex, multi-source research tasks.
SandboxAQ Brings Physics-Grounded Models to Claude
SandboxAQ, an Alphabet spinout founded roughly five years ago with more than $950 million raised from investors, has partnered with Anthropic to embed its large quantitative models (LQMs) directly into Claude. According to TechCrunch, the integration puts drug discovery and materials science tools behind a conversational interface — no specialized computing infrastructure required.
The distinction between LQMs and conventional large language models matters here. SandboxAQ’s models are described as “physics-grounded” — built on the rules of the physical world rather than statistical patterns in text. They can run quantum chemistry calculations and simulate molecular dynamics and microkinetics, the study of how chemical reactions unfold at the molecular level. That means researchers can get predictions about how candidate molecules will behave before any lab work begins.
The company’s stated market target is what it calls the “quantitative economy” — a sector it values at more than $50 trillion, spanning biopharma, financial services, energy, and advanced materials. SandboxAQ chairman Eric Schmidt, Google’s former CEO, is among the company’s most prominent backers.
The practical argument is straightforward: the bottleneck in AI-assisted drug discovery isn’t the quality of the models but who can access them. Most existing tools still require researchers with substantial computational expertise. Wrapping LQMs inside Claude’s conversational interface lowers that barrier without changing the underlying scientific capability.
Tazbentetol Phase 2 Trial Shows Sustained Effect in Schizophrenia
Separately, Spinogenix reported early Phase 2 results for tazbentetol, a first-in-class investigational synaptic regeneration therapy, at the Schizophrenia International Research Society 2026 Annual Congress. According to Spinogenix’s press release, the drug produced a placebo-adjusted 6.3-point reduction in the Positive and Negative Syndrome Scale (PANSS) score in an add-on clinical trial.
The PANSS is the standard instrument for measuring schizophrenia symptom severity. A 6.3-point placebo-adjusted reduction in an add-on setting, where patients are already taking an antipsychotic, is considered clinically meaningful because the existing medication has already delivered part of the achievable improvement, leaving less room for an add-on drug to show further benefit.
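To make "placebo-adjusted" concrete, here is a minimal sketch of the arithmetic: the adjusted figure is the mean change in the treatment arm minus the mean change in the placebo arm. The per-arm values below are hypothetical, chosen only so the subtraction lands on 6.3 points; they are not taken from the Spinogenix release, and the function name is an illustration rather than any standard API.

```python
def placebo_adjusted_change(treatment_change: float, placebo_change: float) -> float:
    """Mean change from baseline in the treatment arm minus that in the placebo arm.

    Negative values mean the score fell (symptoms improved).
    """
    # Round to one decimal place to avoid floating-point noise in the display value.
    return round(treatment_change - placebo_change, 1)


# HYPOTHETICAL arm-level changes in PANSS total score (not trial data).
treatment_arm_change = -14.3   # add-on drug plus background antipsychotic
placebo_arm_change = -8.0      # placebo plus background antipsychotic

adjusted = placebo_adjusted_change(treatment_arm_change, placebo_arm_change)
print(f"Placebo-adjusted change: {adjusted} points")
```

The point of the subtraction is that both arms usually improve during a trial, so only the difference between them is attributable to the drug.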
What makes the results particularly notable is the durability of the effect. Patients who stopped taking tazbentetol after six weeks of treatment continued to show benefit for many days after discontinuation. That pattern is consistent with the drug’s proposed mechanism: tazbentetol is believed to modulate fascin-1/F-actin dynamics, promoting synaptic regeneration rather than simply managing symptoms while the drug is present.
The drug promotes formation of dendritic spines with glutamatergic synapses. Spinogenix is also investigating tazbentetol for Alzheimer’s disease, amyotrophic lateral sclerosis, glaucoma, and diabetic retinopathy — conditions that share neurodegeneration as a common thread. These trials are ongoing and no Phase 3 data exists yet.
PolitNuggets Benchmark Tests AI Agents on Long-Tail Facts
On the evaluation side, researchers published a new benchmark on arXiv this month targeting a specific and underexamined capability: how well AI agents discover and synthesize obscure, dispersed factual information. The paper, arXiv:2605.14002, introduces PolitNuggets — a multilingual benchmark built around constructing political biographies for 400 global elites, covering more than 10,000 political facts.
The benchmark is designed to stress-test large reasoning models (LRMs) embedded in agentic frameworks — systems that don’t just answer questions from a fixed context window, but actively search, retrieve, and synthesize information from multiple sources. The authors call this capability “agentic information synthesis” and argue it remains significantly under-evaluated relative to how often it’s used in real-world deployments.
To standardize scoring, the researchers built FactNet, an evidence-conditional evaluation protocol that measures three distinct dimensions:
- Discovery — whether the agent finds the relevant fact at all
- Fine-grained accuracy — whether the details retrieved are precisely correct
- Efficiency — how much compute and retrieval effort the agent expends
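The three dimensions above can be pictured with a small scoring sketch. This is not the paper's FactNet implementation: the data layout, the case-insensitive exact-match rule, and the "correct facts per tool call" efficiency measure are all assumptions made for illustration, as are the names `AgentRun` and `score_run`.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent attempt at a biography (hypothetical structure)."""
    found: dict[str, str]   # fact key -> value the agent reported
    tool_calls: int         # number of search/retrieval calls the agent made


def score_run(gold: dict[str, str], run: AgentRun) -> dict[str, float]:
    """Score a run on discovery, fine-grained accuracy, and efficiency."""
    # Discovery: which gold facts did the agent surface at all?
    discovered = [key for key in gold if key in run.found]
    # Fine-grained accuracy: of those, which values match exactly
    # (ignoring case and surrounding whitespace)?
    correct = [
        key for key in discovered
        if run.found[key].strip().lower() == gold[key].strip().lower()
    ]
    return {
        "discovery": len(discovered) / len(gold) if gold else 0.0,
        "accuracy": len(correct) / len(discovered) if discovered else 0.0,
        # Assumed efficiency measure: correct facts per tool call.
        "efficiency": len(correct) / run.tool_calls if run.tool_calls else 0.0,
    }


# Hypothetical gold facts and agent output for one biography.
gold_facts = {
    "birth_year": "1961",
    "party": "Example Party",
    "first_office": "Mayor",
    "education": "Law",
}
run = AgentRun(
    found={"birth_year": "1961", "party": "Other Party", "first_office": "mayor"},
    tool_calls=6,
)
scores = score_run(gold_facts, run)
print(scores)
```

Separating discovery from accuracy is what lets a protocol of this kind show an agent that finds the right record but reports a wrong detail, as with the `party` field here.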
The findings are instructive. Current systems frequently struggle with fine-grained details even when they locate the right general information. Performance varies substantially across models, particularly in multilingual settings. The authors tie agent performance back to underlying model capabilities — specifically flagging short-context extraction, multilingual robustness, and reliable tool use as the three variables most predictive of benchmark success.
PolitNuggets is a narrow benchmark by design, but the capability it probes — finding and accurately synthesizing long-tail facts from dispersed sources — is exactly what’s required in domains like drug discovery, legal research, and competitive intelligence.
Cerebras IPO and the Research Infrastructure Bet
The broader context for all of this research activity is infrastructure. The Cerebras Systems IPO on Thursday generated billions for investors including Benchmark, which holds 9.5% of the company, according to TechCrunch. Benchmark general partner Eric Vishria has been a Cerebras board member since the company’s founding in 2016, having co-led its $25 million Series A.
Cerebras builds large-format AI chips designed specifically for deep learning workloads — a bet that GPUs, originally built for graphics rendering, are a suboptimal substrate for AI training. That thesis, which Vishria said crystallized for him on the third slide of Cerebras’ original pitch deck, is now worth billions.
The connection to drug discovery research is direct: the compute-intensive models that SandboxAQ, Isomorphic Labs, and others are deploying for molecular simulation are exactly the kinds of workloads that specialized AI chips are built to run faster and more efficiently. As clinical AI moves from proof-of-concept to production, the hardware layer becomes a meaningful constraint.
What This Means
Three developments in the same month point to the same structural shift: AI in drug discovery is moving from research tools used by specialists to infrastructure used by organizations.
SandboxAQ’s Claude integration is the clearest signal. Physics-grounded molecular simulation is genuinely hard — but the company’s decision to surface it through a conversational interface suggests the models are mature enough that the remaining barrier is access, not capability. That’s a different problem than the one most AI drug discovery startups have been solving.
The tazbentetol results are a reminder that AI-adjacent drug development is still drug development. A 6.3-point PANSS reduction in a Phase 2 add-on trial is promising, but Phase 3 data and regulatory approval remain years away. The sustained post-discontinuation effect is scientifically interesting and, if replicated, would distinguish tazbentetol from symptom-management drugs — but replication is the operative word.
PolitNuggets, meanwhile, is a useful corrective to benchmarking inflation. The finding that current agentic systems struggle with fine-grained accuracy — even when they find the right general information — maps directly onto the risks of deploying these systems in high-stakes research contexts. A drug discovery agent that retrieves the right molecule but gets the binding affinity wrong is worse than no agent at all.
Taken together, the month’s research news reflects an industry that is genuinely making progress on hard problems while accumulating a clearer picture of where the remaining gaps are.
FAQ
What are SandboxAQ’s large quantitative models?
Large quantitative models (LQMs) are AI models built on physical laws and scientific equations rather than patterns in text. SandboxAQ’s LQMs can run quantum chemistry calculations and simulate molecular dynamics, allowing researchers to predict how candidate drug molecules will behave before lab testing begins.
What does a 6.3-point PANSS reduction mean in clinical terms?
The Positive and Negative Syndrome Scale (PANSS) measures schizophrenia symptom severity across 30 items. A placebo-adjusted reduction of 6.3 points in an add-on trial, where patients are already taking a primary antipsychotic, is considered clinically meaningful because the existing medication has already delivered part of the achievable improvement, leaving less room for an add-on drug to show further benefit.
What is the PolitNuggets benchmark testing?
PolitNuggets evaluates how accurately AI agents can discover and synthesize obscure factual information from dispersed, multilingual sources — a capability called agentic information synthesis. The benchmark covers more than 10,000 political facts across 400 global figures, scoring agents on discovery, fine-grained accuracy, and retrieval efficiency.
Sources
- The biggest AI breakthrough in medicine & drug discovery – Reddit Singularity
- (Breakthrough) Tazbentetol significantly improved symptoms in patients with schizophrenia in a Phase 2 add-on clinical trial, with efficacy sustained for many days after drug discontinuation. – Reddit Singularity
- PolitNuggets: Benchmarking Agentic Discovery of Long-Tail Political Facts – arXiv AI
- Cerebras IPO makes billions for Benchmark but VC Eric Vishria almost didn’t take the meeting – TechCrunch
- SandboxAQ brings its drug discovery models to Claude — no PhD in computing required – TechCrunch






