Sakana AI on Tuesday released research demonstrating how a 7-billion parameter model can automatically coordinate multiple frontier LLMs, outperforming individual models like GPT-5 and Claude Sonnet 4 on reasoning and coding benchmarks. The “RL Conductor” system uses reinforcement learning to dynamically distribute tasks among worker models, achieving state-of-the-art results while reducing API costs by orders of magnitude.
According to the research paper published on arXiv, the system automatically analyzes inputs and coordinates agents without requiring manual pipeline design. The approach addresses a core limitation of existing multi-agent frameworks, which break when query distributions shift in production environments.
Breaking the Manual Pipeline Bottleneck
Current agentic AI systems rely heavily on hardcoded workflows like LangChain pipelines, which fail when deployed across diverse user bases with heterogeneous demands. Yujin Tang, co-author of the research, told VentureBeat that “an inherent bottleneck arises when targeting domains with large user bases with very heterogeneous demands.”
The RL Conductor eliminates this constraint by learning optimal coordination patterns through reinforcement learning rather than manual design. Tang noted that “real-world generalization in such heterogeneous applications inherently necessitates going beyond human-hardcoded designs.”
The system addresses another fundamental limitation: no single model excels across all task types. Rather than forcing users to choose between different frontier models, RL Conductor automatically selects and coordinates the most appropriate models for each subtask.
Technical Architecture and Performance
The RL Conductor operates as a small 7B parameter model that serves as the orchestration layer above larger worker models including GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. The system analyzes incoming queries, breaks them into subtasks, and dynamically assigns work to the most suitable models.
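The analyze-decompose-dispatch loop described above can be sketched in a few lines. This is a minimal illustrative stand-in, not Sakana's implementation: the `Worker`, `route`, and `conduct` names are hypothetical, the decomposition is a stub, and the suitability scores play the role that the learned 7B conductor's policy would play in the real system.

```python
# Hypothetical sketch of a conductor's dispatch loop. All names and the
# stub decomposition are illustrative assumptions, not the paper's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Worker:
    name: str
    strengths: dict[str, float]  # task-type -> suitability score (stub policy)
    call: Callable[[str], str]   # stand-in for a real model API call

def route(subtask_type: str, workers: list[Worker]) -> Worker:
    """Pick the worker the stub policy scores highest for this subtask type.
    In the real system, the learned conductor makes this choice."""
    return max(workers, key=lambda w: w.strengths.get(subtask_type, 0.0))

def conduct(query: str, workers: list[Worker]) -> list[str]:
    """Decompose a query into typed subtasks, dispatch each to the most
    suitable worker, and collect results. Decomposition here is trivial."""
    subtasks = [("reasoning", query), ("coding", query)]  # stub decomposition
    results = []
    for task_type, payload in subtasks:
        worker = route(task_type, workers)
        results.append(f"{worker.name}: {worker.call(payload)}")
    return results

workers = [
    Worker("model-a", {"reasoning": 0.9, "coding": 0.4}, lambda q: "analysis"),
    Worker("model-b", {"reasoning": 0.3, "coding": 0.8}, lambda q: "patch"),
]
print(conduct("fix the failing test", workers))
```

The key design point the paper argues is that the routing decision is learned via reinforcement learning rather than hardcoded, so it adapts as query distributions shift.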
Performance results show the coordinated system outperforming individual frontier models on difficult reasoning and coding benchmarks. The approach achieves these gains while making fewer API calls than traditional multi-agent pipelines, resulting in significant cost reductions.
Sakana AI has commercialized the technology through Fugu, a multi-agent orchestration service that serves as the practical implementation of the RL Conductor research.
Industry Context and Competing Approaches
While Sakana focused on multi-model orchestration, other recent research has targeted different efficiency challenges. Miami-based startup Subquadratic emerged from stealth this week claiming a 1,000x efficiency gain through subquadratic architecture, though the AI research community has demanded independent verification of the extraordinary claims.
The company raised $29 million in seed funding at a $500 million valuation, according to The New Stack, with investors including Tinder co-founder Justin Mateen and former SoftBank Vision Fund partner Javier Villamizar.
Meanwhile, IBM Research introduced MAMMAL, a multi-modal model combining protein, molecule, and gene data that achieves state-of-the-art results on 9 of 11 biological benchmarks. The model outperforms AlphaFold 3 on certain tasks, including antibody-antigen binding prediction, according to research published in Nature.
Commercial Implications and Deployment
The RL Conductor approach represents a shift from static pipeline design to dynamic model coordination. Rather than organizations building separate integrations with multiple AI providers, a single orchestration layer can automatically optimize model selection and task distribution.
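To make the contrast concrete, a single orchestration layer can front several providers behind one selection call instead of per-provider integrations. The sketch below is an assumption-laden illustration: the provider names, cost figures, and the fixed quality-minus-cost formula are invented here; the actual conductor learns this tradeoff from the query itself.

```python
# Illustrative single orchestration layer over multiple providers.
# Provider names, costs, and the scoring rule are hypothetical.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_call: float       # hypothetical relative cost
    quality: dict[str, float]  # task-type -> benchmark-style score

def select(task_type: str, providers: list[Provider],
           budget_weight: float = 0.1) -> Provider:
    """Trade quality against cost with a fixed linear formula; the learned
    conductor would make this choice dynamically per query instead."""
    return max(providers,
               key=lambda p: p.quality.get(task_type, 0.0)
                             - budget_weight * p.cost_per_call)

providers = [
    Provider("frontier-large", 10.0, {"coding": 0.95, "chat": 0.90}),
    Provider("frontier-small", 1.0, {"coding": 0.80, "chat": 0.85}),
]
# A higher budget weight steers routine queries to the cheaper model,
# while hard coding tasks still justify the expensive one.
print(select("chat", providers, budget_weight=0.1).name)
print(select("coding", providers, budget_weight=0.01).name)
```

Routing cheap queries away from the most expensive model is one way a coordination layer can make fewer costly API calls than a static pipeline that sends everything to a single frontier model.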
This automation could reduce the engineering overhead of multi-model deployments while improving performance consistency across diverse query types. The cost efficiency gains become particularly significant for organizations processing high volumes of varied requests.
Sakana’s Fugu service provides immediate commercial access to the technology, positioning the company to capture value from organizations seeking to deploy multiple frontier models without manual coordination overhead.
What This Means
The RL Conductor research demonstrates that smaller models can effectively coordinate larger ones, potentially reshaping how organizations deploy AI systems. Rather than choosing between different frontier models, the orchestration approach allows organizations to leverage the strengths of multiple models simultaneously.
The automatic coordination capability addresses a practical deployment challenge that has limited multi-model adoption. As frontier models continue to specialize in different capabilities, orchestration systems like RL Conductor may become essential infrastructure for production AI applications.
The commercial success of Sakana’s Fugu service will provide early validation of market demand for automated model coordination versus traditional single-model or manually coordinated approaches.
FAQ
How does RL Conductor compare to existing multi-agent frameworks?
RL Conductor uses reinforcement learning to automatically coordinate models, while frameworks like LangChain require manual pipeline design. The automated approach adapts to changing query distributions without breaking, addressing a key limitation of hardcoded systems.
What models can RL Conductor orchestrate?
The system coordinates multiple frontier models including GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. The 7B orchestration model analyzes inputs and dynamically assigns subtasks to the most appropriate worker models.
Is the technology commercially available?
Yes, Sakana AI has commercialized the research through Fugu, a multi-agent orchestration service. Organizations can access the automated coordination capabilities without building their own orchestration infrastructure.