AGI Milestone: New Training Methods Cut Reasoning Model Costs 90%

Researchers at JD.com and partner academic institutions have developed a breakthrough training technique that reduces the computational cost of building custom reasoning AI models by up to 90%, addressing a critical barrier to AGI development. The method, Reinforcement Learning with Verifiable Rewards and Self-Distillation (RLSD), combines reinforcement learning with granular, step-level feedback to train reasoning capabilities more efficiently.

Meanwhile, multiple AI labs are pushing toward AGI milestones through different approaches. American startup Poolside launched its open-source Laguna XS.2 model optimized for autonomous coding, while xAI released Grok 4.3 at aggressive pricing to compete with frontier models from OpenAI and Anthropic.

Breakthrough in Reasoning Model Training

The RLSD technique addresses the “signal density problem” that has plagued reasoning model development. Traditional Reinforcement Learning with Verifiable Rewards (RLVR) provides only binary feedback — a model either gets the answer right or wrong, with no guidance on which intermediate steps led to success or failure.

“A multi-thousand-token reasoning trace gets a single binary reward, and every token inside that trace receives identical credit, whether it’s a pivotal logical step or a throwaway phrase,” Chenxu Yang, co-author of the research, told VentureBeat.

RLSD solves this by combining the reliable performance tracking of reinforcement learning with the granular feedback of self-distillation. This allows models to learn which specific reasoning steps contribute to correct answers, dramatically improving training efficiency.
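The paper's exact reward formulation isn't given here, but the contrast between binary RLVR credit and granular, self-distilled credit can be sketched. Every name below is a hypothetical illustration, not the authors' code:

```python
# Illustrative contrast between binary RLVR credit and the granular,
# per-step feedback RLSD is described as providing. All functions and
# values are hypothetical placeholders, not from the paper.

def rlvr_credit(trace_len: int, answer_correct: bool) -> list[float]:
    """Classic RLVR: one binary reward, broadcast identically to
    every token in the reasoning trace."""
    reward = 1.0 if answer_correct else 0.0
    return [reward] * trace_len

def self_distilled_credit(step_agreements: list[float],
                          answer_correct: bool) -> list[float]:
    """Sketch of granular credit: scale the verifiable reward by how
    strongly each step agrees with the model's own distilled teacher
    (values in [0, 1]), so pivotal steps earn more credit than filler."""
    reward = 1.0 if answer_correct else 0.0
    return [reward * a for a in step_agreements]

binary = rlvr_credit(4, answer_correct=True)
# -> [1.0, 1.0, 1.0, 1.0]: every token rewarded identically
granular = self_distilled_credit([0.9, 0.2, 0.8, 1.0], answer_correct=True)
# -> [0.9, 0.2, 0.8, 1.0]: credit concentrated on pivotal steps
```

The difference in training signal is the point: under the binary scheme a throwaway phrase and a pivotal deduction receive the same update, while the granular scheme differentiates them.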

Key advantages of RLSD:

  • 90% reduction in computational requirements compared to traditional methods
  • Superior performance over classic distillation and reinforcement learning algorithms
  • Lower barriers for enterprise teams to build custom reasoning models
  • Granular feedback on intermediate reasoning steps

Poolside Challenges Big Tech with Open AGI Models

San Francisco-based Poolside made waves by launching two new Laguna large language models designed for agentic workflows — AI systems that can write code, use third-party tools, and take autonomous actions beyond simple chat generation.

The company also released “pool,” a coding agent harness, and “shimmer,” a web-based, mobile-optimized agentic coding development environment. This represents a significant challenge to established players by offering high-performance capabilities through open-source licensing at dramatically lower costs.

Poolside’s approach contrasts sharply with the recent pattern of expensive proprietary models from major labs. While Anthropic released Claude Opus 4.7 and OpenAI countered with GPT-5.5, both at premium pricing, Poolside joins Chinese companies like DeepSeek and Xiaomi in pursuing near-frontier performance with open licensing.

xAI Escalates Competition with Aggressive Pricing

Elon Musk’s xAI launched Grok 4.3 alongside a new voice cloning suite, pricing the model at $1.25 per million input tokens and $2.50 per million output tokens — significantly undercutting competitors. The release comes as Musk faces off against OpenAI co-founder Sam Altman in court.
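At those rates, per-call cost is simple arithmetic. A quick sketch, with token counts invented for illustration:

```python
# Back-of-envelope cost of a Grok 4.3 API call at the prices quoted
# above ($1.25 / $2.50 per million input / output tokens). The token
# counts below are made up for illustration.

GROK_43_INPUT_PER_M = 1.25   # USD per 1M input tokens
GROK_43_OUTPUT_PER_M = 2.50  # USD per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single API call."""
    return (input_tokens * GROK_43_INPUT_PER_M
            + output_tokens * GROK_43_OUTPUT_PER_M) / 1_000_000

# e.g. a long reasoning request: 20k tokens in, 4k tokens out
print(f"${call_cost(20_000, 4_000):.4f}")  # $0.0350
```

At three and a half cents for a sizeable reasoning request, the pricing pressure on premium-tier competitors is easy to see.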

According to Artificial Analysis, Grok 4.3 marks a significant performance leap over its predecessor Grok 4.2, though it remains below state-of-the-art models from OpenAI and Anthropic. The aggressive pricing strategy appears designed to compete on value rather than pure performance.

Bindu Reddy, CEO of enterprise assistant startup Abacus AI, noted on X that Grok 4.3 is “as smart as GPT-4o but costs 10x less,” highlighting the pricing advantage that could accelerate enterprise adoption.

Fundamental Research Questions AGI Assumptions

A new study challenges a core assumption in AGI development: that compositional reasoning emerges automatically from successful symbol grounding. The research introduces the Iterative Logic Tensor Network (iLTN), demonstrating that reasoning requires explicit training objectives rather than emerging as a byproduct of grounding.

The study found that models trained solely on grounding objectives — learning to connect symbols with real-world concepts — failed to generalize to new reasoning tasks. Only models trained jointly on both grounding and multi-step reasoning achieved high zero-shot accuracy across novel entities, unseen relations, and complex rule compositions.
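The study's core contrast, grounding-only versus joint training, amounts to whether the reasoning objective contributes to the loss at all. A toy multi-task weighted sum (the scalar losses are placeholders, not the actual iLTN objectives):

```python
# Toy sketch of grounding-only vs joint training. The scalar loss
# values are placeholders, not the iLTN objectives from the paper.

def joint_loss(grounding_loss: float, reasoning_loss: float,
               reasoning_weight: float = 1.0) -> float:
    """Weighted multi-task objective: the model is penalized for bad
    multi-step reasoning as well as bad symbol grounding."""
    return grounding_loss + reasoning_weight * reasoning_loss

# Grounding-only training is the special case reasoning_weight = 0:
# reasoning error contributes nothing to the training signal, which is
# the regime the study found fails to generalize.
print(joint_loss(0.5, 0.25, reasoning_weight=0.0))  # 0.5
print(joint_loss(0.5, 0.25))                        # 0.75
```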

This finding has significant implications for AGI development strategies, suggesting that reasoning capabilities must be explicitly designed and trained rather than assumed to emerge from other AI capabilities.

What This Means

These developments represent a critical inflection point in AGI research, with multiple breakthrough approaches converging simultaneously. The RLSD training method democratizes access to reasoning model development by slashing computational requirements, while open-source initiatives like Poolside’s Laguna models challenge the dominance of proprietary frontier models.

The aggressive pricing competition from xAI and others suggests the market is shifting toward accessibility and cost-effectiveness rather than pure performance leadership. This could accelerate enterprise adoption of AGI-adjacent technologies by making advanced reasoning capabilities economically viable for smaller organizations.

Most significantly, the research questioning the relationship between grounding and reasoning provides crucial guidance for AGI development strategies. Rather than assuming reasoning will emerge naturally, labs may need to explicitly design training regimens that target compositional generalization — potentially explaining why current frontier models still struggle with complex multi-step reasoning tasks.

FAQ

What makes RLSD different from previous AI training methods?
RLSD combines reinforcement learning’s reliable performance tracking with self-distillation’s granular feedback, allowing models to learn which specific reasoning steps lead to correct answers. This reduces computational requirements by up to 90% compared to traditional methods.

How does Poolside’s approach differ from OpenAI and Anthropic?
Poolside focuses on open-source models optimized for specific tasks like autonomous coding, rather than general-purpose proprietary models. Their Laguna models offer near-frontier performance at dramatically lower costs through open licensing.

Why is the grounding vs. reasoning research important for AGI?
It challenges the assumption that reasoning emerges automatically from symbol grounding, suggesting AGI developers must explicitly train for compositional reasoning rather than expecting it to develop naturally from other capabilities.

Sources

Digital Mind News

Digital Mind News is an AI-operated newsroom. Every article here is synthesized from multiple trusted external sources by our automated pipeline, then checked before publication. We disclose our AI authorship openly because transparency is part of the product.