Sakana AI on Tuesday released research on its “RL Conductor,” a 7-billion-parameter language model trained via reinforcement learning to automatically orchestrate multiple frontier LLMs, including GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. According to the paper published on arXiv, the system achieves state-of-the-art results on reasoning and coding benchmarks while reducing API costs and call volume compared with manually designed multi-agent frameworks.
The Conductor model dynamically analyzes inputs, distributes tasks among worker LLMs, and coordinates responses without human-designed pipelines. Sakana AI has integrated this technology into Fugu, its commercial multi-agent orchestration service currently in development.
Breaking Beyond Hard-Coded AI Pipelines
Current multi-agent frameworks like LangChain rely on manually designed workflows that break when query patterns shift in production environments. Yujin Tang, co-author of the research, told VentureBeat that “an inherent bottleneck arises when targeting domains with large user bases with very heterogeneous demands.”
The RL Conductor addresses this limitation by learning optimal orchestration patterns through reinforcement learning rather than relying on predetermined rules. The system can adapt to new query types and optimize resource allocation across different LLMs based on their individual strengths.
Key technical advantages include (a minimal routing sketch follows this list):
- Dynamic task routing based on input analysis
- Automatic load balancing across multiple AI models
- Cost optimization through strategic API call reduction
- Real-time adaptation to changing query distributions
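To make the routing idea concrete, here is a minimal sketch of a learned routing policy over a worker pool. Every name in it (WORKERS, featurize, route) is hypothetical, and the linear scorer is a stand-in: the actual RL Conductor is a 7B language model trained end-to-end with reinforcement learning, not a hand-built classifier like this.

```python
# Illustrative sketch of a learned conductor routing policy.
# All names here are hypothetical; Sakana's RL Conductor is a 7B LLM
# trained with reinforcement learning, not a linear scorer like this.
import math
import random

WORKERS = ["gpt-5", "claude-sonnet-4", "gemini-2.5-pro"]  # worker pool

def featurize(query: str) -> list[float]:
    """Toy query features; a real conductor conditions on the full text."""
    return [
        1.0 if "def " in query or "import " in query else 0.0,  # code-like?
        1.0 if any(c in query for c in "=+<>") else 0.0,        # math-like?
        min(len(query) / 1000.0, 1.0),                          # length
    ]

# One weight row per worker. RL training would adjust these so that
# routing maximizes task reward minus API cost; here they are random.
random.seed(0)
weights = [[random.gauss(0.0, 0.1) for _ in range(3)] for _ in WORKERS]

def route(query: str) -> str:
    """Sample a worker from a softmax over learned scores."""
    feats = featurize(query)
    scores = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    exps = [math.exp(s) for s in scores]
    probs = [e / sum(exps) for e in exps]
    # Sampling gives exploration during training; use argmax at inference.
    return random.choices(WORKERS, weights=probs, k=1)[0]

print(route("def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)"))
```

In a reinforcement learning setup, the reward would combine answer quality with API cost, nudging the policy toward cheaper workers whenever they suffice.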
Performance Benchmarks and Cost Efficiency
The RL Conductor outperformed individual frontier models and expensive human-designed pipelines across multiple evaluation metrics. The system achieved superior results on complex reasoning tasks while maintaining significantly lower operational costs.
Sakana AI’s approach leverages the complementary strengths of different LLMs. GPT-5 excels at certain reasoning patterns, Claude Sonnet 4 performs better on others, and Gemini 2.5 Pro has distinct advantages in specific domains. The Conductor learns to route queries to the most appropriate model or combination of models.
The 7B-parameter size keeps the orchestration overhead minimal while maintaining sophisticated decision-making capabilities. This compact architecture enables real-time coordination without the computational burden of larger orchestration models.
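As a back-of-the-envelope illustration of why learned routing lowers operational cost (the prices and traffic split below are invented for the example, not figures from Sakana's paper):

```python
# Hypothetical per-query API costs (USD); illustrative numbers only.
cost = {"frontier": 0.030, "mid-tier": 0.006}

# Suppose the conductor learns that 70% of traffic is handled
# adequately by the mid-tier model.
blended = 0.7 * cost["mid-tier"] + 0.3 * cost["frontier"]
baseline = cost["frontier"]  # naive pipeline: frontier model for everything

print(f"blended: ${blended:.4f}/query")          # $0.0132
print(f"savings: {1 - blended / baseline:.0%}")  # 56%
```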
Multi-Modal Breakthroughs in Biological AI
Separately, IBM Research introduced MAMMAL, a multi-modal model combining protein, molecular, and gene data that achieves state-of-the-art performance on 9 out of 11 biological benchmarks. According to research published in Nature, MAMMAL outperforms AlphaFold 3 on several antibody-antigen binding tasks.
MAMMAL’s benchmark victories include:
- Drug-target interaction prediction
- Ligand binding affinity prediction
- Antibody-antigen binding (significant win vs AlphaFold 3)
- Gene expression prediction
- Multi-modal biological reasoning
- Molecular property prediction
- Functional prediction
- Cell-level response modeling
- Cross-domain generalization
While AlphaFold 3 and MAMMAL serve different primary purposes, MAMMAL’s multi-modal approach enables more comprehensive biological reasoning by integrating diverse data types rather than focusing solely on protein structure prediction.
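Conceptually, the gain from integration can be sketched as a late-fusion pattern: encode each modality separately, concatenate, and score. This is a generic illustration with stubbed encoders, not MAMMAL's published architecture.

```python
# Generic late-fusion sketch for multi-modal biological prediction.
# Encoder internals are stubbed out; this is not MAMMAL's actual design.

def encode_protein(seq: str) -> list[float]:
    # Stub: real systems use a learned protein language model.
    return [seq.count(a) / max(len(seq), 1) for a in "ACDEFG"]

def encode_molecule(smiles: str) -> list[float]:
    # Stub: real systems embed SMILES with a chemistry model.
    return [smiles.count(c) / max(len(smiles), 1) for c in "CNO=()"]

def encode_gene_expression(levels: list[float]) -> list[float]:
    # Stub: normalize a toy expression vector.
    m = max(levels) or 1.0
    return [x / m for x in levels]

def predict_binding(protein: str, ligand: str, expr: list[float]) -> float:
    """Fuse all three modalities into one feature vector, then score."""
    fused = encode_protein(protein) + encode_molecule(ligand) \
            + encode_gene_expression(expr)
    # Stub scorer: a trained prediction head would replace this average.
    return sum(fused) / len(fused)

print(predict_binding("MKTAYIAKQR", "CC(=O)OC1=CC=CC=C1", [2.0, 0.5, 1.5]))
```

The point of the sketch is the interface: a binding prediction can condition on protein sequence, ligand structure, and cellular context at once, which a structure-only model cannot.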
Controversial Claims from Subquadratic Startup
Miami-based startup Subquadratic emerged from stealth Tuesday claiming its SubQ 1M-Preview model achieves 1,000x efficiency gains through fully subquadratic architecture. According to VentureBeat, the company raised $29 million in seed funding at a $500 million valuation from investors including Tinder co-founder Justin Mateen.
The company claims its architecture reduces attention compute by almost 1,000 times compared to frontier models at 12 million tokens of context. If validated, this would represent a fundamental breakthrough in how AI systems scale with context length.
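To put the number in perspective, a quick calculation shows what scaling exponent would deliver roughly a 1,000x reduction at 12 million tokens, assuming standard attention compute grows with n² and ignoring constant factors:

```python
import math

n = 12_000_000  # context length in tokens

# Standard attention cost grows ~ n^2. A subquadratic design with
# cost ~ n^alpha gives a reduction factor of n^(2 - alpha).
for alpha in (1.0, 1.5, 1.58, 1.9):
    reduction = n ** (2 - alpha)
    print(f"alpha={alpha}: ~{reduction:,.0f}x less attention compute")

# Solving n^(2 - alpha) = 1000 for alpha:
alpha_for_1000x = 2 - math.log(1000) / math.log(n)
print(f"~1,000x reduction at alpha = {alpha_for_1000x:.2f}")
```

A fully linear design (alpha = 1) would imply a reduction in the millions at this context length, so "almost 1,000 times" is consistent with either a milder exponent or large constant overheads; only independent benchmarks can say which.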
However, the AI research community has responded with significant skepticism. Critics questioned why the company restricts access through early-access programs if the model truly costs less than 5% as much to operate as comparable systems.
What This Means
These developments signal three distinct trends in AI research. Sakana’s RL Conductor represents the evolution from rigid, manually designed AI workflows toward adaptive, self-optimizing systems. This approach could reduce the engineering overhead required to maintain production AI applications as user needs evolve.
IBM’s MAMMAL demonstrates the power of multi-modal approaches in specialized domains like drug discovery. By combining multiple biological data types, these systems can achieve superior performance compared to single-modality models, even established ones like AlphaFold 3.
Subquadratic’s claims, while unverified, highlight the ongoing race to solve fundamental scaling constraints in large language models. The extreme skepticism from researchers underscores the importance of independent validation for extraordinary efficiency claims.
FAQ
How does Sakana’s RL Conductor differ from existing multi-agent frameworks?
Unlike frameworks like LangChain that use hard-coded rules, RL Conductor learns optimal orchestration patterns through reinforcement learning. This enables automatic adaptation to new query types without manual reprogramming.
Why is MAMMAL’s performance against AlphaFold 3 significant?
AlphaFold 3 is considered the gold standard for protein structure prediction. MAMMAL’s superior performance on antibody-antigen binding tasks demonstrates that multi-modal approaches can exceed specialized single-purpose models in specific domains.
Should Subquadratic’s efficiency claims be taken seriously?
The claims require independent verification. While the mathematical concept of subquadratic scaling is sound, the reported 1,000x efficiency improvement is extraordinary and has drawn skepticism from AI researchers who question the limited access and benchmark selection.