AI reasoning capabilities took major steps forward this week with the launch of xAI’s Grok 4.3 at $1.25 per million input tokens and new training frameworks that reduce compute requirements by up to 60%. DeepSeek-V4 also arrived as an open-source model delivering near state-of-the-art performance at one-sixth the cost of competing systems.
Grok 4.3 Brings Always-On Reasoning at Aggressive Pricing
xAI shipped Grok 4.3 on Monday with integrated chain-of-thought reasoning and pricing that undercuts major competitors by 50-70%. According to VentureBeat, the model costs $1.25 per million input tokens and $2.50 per million output tokens, compared to OpenAI’s GPT-4 at $5 per million input tokens.
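At these rates, per-request cost is simple arithmetic. The sketch below uses the prices quoted above; the request sizes and function name are hypothetical, chosen only to illustrate the calculation.

```python
# Illustrative cost calculation at the quoted per-million-token rates.
# Rates come from the article; request sizes are hypothetical examples.

def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD for one request, with rates in USD per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A hypothetical reasoning-heavy request: 2k tokens in, 8k tokens out.
grok = request_cost(2_000, 8_000, in_rate=1.25, out_rate=2.50)
print(f"Grok 4.3: ${grok:.4f} per request")  # $0.0225 per request
```

Because reasoning models emit long chains of thought, output-token pricing tends to dominate the bill, which is why the output rate matters as much as the headline input price.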
The model features “always-on reasoning” architecture that automatically applies chain-of-thought processing without explicit prompting. Artificial Analysis benchmarks show Grok 4.3 delivers a significant performance jump over its predecessor Grok 4.2, particularly in legal reasoning tasks where the dense logical structures align with the model’s reasoning capabilities.
However, independent evaluators note inconsistencies in general reasoning performance. Vals AI reported a “stark gap” between domain-specific strengths and general reasoning consistency, suggesting the model excels in structured domains like law and finance but struggles with broader problem-solving tasks.
New Training Framework Cuts Compute Requirements by 60%
Researchers at JD.com introduced Reinforcement Learning with Verifiable Rewards with Self-Distillation (RLSD), a training paradigm that addresses the sparse-feedback problem in reasoning model development. According to their arXiv paper, traditional reinforcement learning provides only a binary reward for a multi-thousand-token reasoning trace, giving identical credit to pivotal logical steps and throwaway phrases.

RLSD combines reinforcement learning’s performance tracking with self-distillation’s granular feedback. “Standard GRPO has a signal density problem,” co-author Chenxu Yang told VentureBeat. “A multi-thousand-token reasoning trace gets a single binary reward, and every token inside that trace receives identical credit.”
Experimental results show RLSD-trained models outperform both classic distillation and traditional reinforcement learning approaches while requiring 60% fewer tokens for training. This reduction in computational requirements makes custom reasoning model development accessible to enterprise teams without massive infrastructure investments.
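The signal-density contrast the authors describe can be sketched in a few lines. The actual RLSD loss is not reproduced here; the function names, the teacher/student log-probability gap, and the toy numbers below are all illustrative assumptions.

```python
# Sketch of the signal-density contrast described above (illustrative;
# the real RLSD objective is not shown in the article).

def trace_level_credit(trace_len: int, reward: float) -> list[float]:
    """GRPO-style: one binary reward, spread identically over every token."""
    return [reward] * trace_len

def token_level_credit(student_logprobs: list[float],
                       teacher_logprobs: list[float]) -> list[float]:
    """Self-distillation-style: a per-token signal from the gap between
    a teacher's and a student's log-probabilities."""
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]

# A 5-token toy trace: trace-level credit is identical everywhere...
print(trace_level_credit(5, 1.0))  # [1.0, 1.0, 1.0, 1.0, 1.0]
# ...while token-level credit distinguishes pivotal steps from filler.
print(token_level_credit([-2.0, -0.1, -3.0, -0.2, -0.1],
                         [-0.5, -0.1, -0.4, -0.2, -0.1]))
```

The point of the combination is that the dense per-token signal guides intermediate steps while the verifiable trace-level reward keeps training anchored to correct final answers.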
DeepSeek-V4 Delivers Open Source Reasoning at Scale
DeepSeek released V4, a 1.6-trillion-parameter Mixture-of-Experts model under the MIT License, marking what researchers call the “second DeepSeek moment.” The model approaches state-of-the-art performance while costing approximately one-sixth the price of proprietary alternatives like Claude Opus 4.7 and GPT-5.5.
DeepSeek AI researcher Deli Chen described the release as a “labor of love” developed over 484 days since V3’s launch. The model is available through Hugging Face and DeepSeek’s API.
The release demonstrates continued progress in democratizing advanced AI reasoning capabilities through open-source development, following DeepSeek’s January 2025 breakthrough with the R1 model that initially challenged proprietary U.S. systems.
Breakthrough in Automated Logical Reasoning
Researchers developed an automated reasoning framework that achieved a 98.03% solving rate on Intelligence Quotient (IQ) problems without prior knowledge of the problems. According to their arXiv paper, this performance corresponds to the top 1st percentile, or an IQ score of roughly 132-144.
The system integrates rigid logical reasoning with neural networks through object-relational analysis. The researchers note that results are "only limited by the small size of the model and the processing capabilities of the machine it runs on," suggesting potential for further improvement with larger models and better hardware.
This approach represents a shift from pure scaling toward hybrid architectures that combine machine learning with formal reasoning methods.
Decentralized AI Auditing Framework Emerges
Researchers introduced TRUST (Transparent, Robust, and Unified Services for Trustworthy AI), a decentralized framework for auditing Large Reasoning Models and Multi-Agent Systems. The system addresses four key limitations of centralized approaches: robustness, scalability, opacity, and privacy.
TRUST uses Hierarchical Directed Acyclic Graphs (HDAGs) to decompose chain-of-thought reasoning into five abstraction levels for parallel distributed auditing. According to the research, the framework achieved 72.4% accuracy across multiple benchmarks, representing a 4-18% improvement over baseline methods.
The system includes a multi-tier consensus mechanism among computational checkers, LLM evaluators, and human experts with stake-weighted voting that guarantees correctness under 30% adversarial participation. All decisions are recorded on-chain while privacy-by-design segmentation prevents reconstruction of proprietary logic.
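A stake-weighted vote of this kind can be sketched directly; the function below is an illustrative majority rule, not TRUST's actual consensus protocol, and the stake values are hypothetical.

```python
# Illustrative stake-weighted majority vote (not TRUST's actual protocol).

def stake_weighted_verdict(votes: list[tuple[float, bool]]) -> bool:
    """Each vote is (stake, approve). The verdict passes if approving
    stake strictly exceeds half of the total stake cast."""
    total = sum(stake for stake, _ in votes)
    approving = sum(stake for stake, ok in votes if ok)
    return approving > total / 2

# With under 30% of stake held by adversaries voting against an honest
# majority, the honest verdict still carries:
honest = [(0.35, True), (0.36, True)]   # 71% honest stake
adversarial = [(0.29, False)]           # 29% adversarial stake
print(stake_weighted_verdict(honest + adversarial))  # True
```

Under a simple majority rule like this, any adversarial coalition holding less than half the stake cannot flip a unanimous honest verdict, which is consistent with the 30%-tolerance guarantee the researchers claim.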
What This Means
These developments signal a maturation in AI reasoning capabilities across three critical dimensions: cost reduction, training efficiency, and verification systems. The aggressive pricing from xAI and open-source availability from DeepSeek democratize access to advanced reasoning models, while new training frameworks like RLSD make custom model development feasible for smaller organizations.
The emergence of automated logical reasoning systems achieving human-level performance on IQ tests, combined with decentralized auditing frameworks, suggests the field is moving beyond pure scaling toward more sophisticated architectures that integrate formal reasoning with neural networks.
For enterprises, these advances lower both technical and financial barriers to implementing reasoning-capable AI systems while providing new frameworks for ensuring reliability and accountability in high-stakes applications.
FAQ
How does Grok 4.3’s pricing compare to other leading AI models?
Grok 4.3 costs $1.25 per million input tokens, roughly 50-70% less than GPT-4's $5 per million input tokens and significantly below Claude's pricing. This aggressive pricing strategy positions xAI as a cost-effective alternative for developers building reasoning-intensive applications.
What makes RLSD training more efficient than traditional methods?
RLSD addresses the sparse feedback problem in reinforcement learning by combining binary rewards with granular self-distillation feedback. This allows models to learn from intermediate reasoning steps rather than just final outcomes, reducing training tokens by 60% while improving performance over traditional approaches.
Is DeepSeek-V4 truly competitive with closed-source models?
Benchmarks indicate DeepSeek-V4 approaches and sometimes surpasses state-of-the-art closed-source systems while costing one-sixth the price. However, like other models, it shows domain-specific strengths and weaknesses, excelling in structured reasoning tasks while potentially struggling with broader general intelligence applications.
Sources
- xAI launches Grok 4.3 at an aggressively low price and a new, fast, powerful voice cloning suite – VentureBeat
- How to build custom reasoning agents with a fraction of the compute – VentureBeat
- Auto-Relational Reasoning – arXiv AI
- DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5 – VentureBeat