Four AI startups this week unveiled new models targeting specific gaps in the current AI landscape, offering alternatives and complements to established players like OpenAI and Anthropic. Perceptron Inc. released its video analysis model Mk1 at 80-90% lower cost than competitors, Thinking Machines previewed “interaction models” for real-time conversation, Zyphra open-sourced ZAYA1-8B, a reasoning model trained entirely on AMD hardware, and Sakana AI introduced RL Conductor, a model trained to orchestrate multiple frontier LLMs.
The releases signal growing competition in specialized AI capabilities beyond general-purpose chatbots. Each startup targets a distinct use case: enterprise video analysis, natural human-AI interaction, efficient reasoning at scale, and automated multi-model orchestration.
Perceptron Mk1 Cuts Video Analysis Costs by Up to 90%
Perceptron Inc., a two-year-old startup led by former Meta FAIR researcher Armen Aghajanyan, launched its flagship video analysis model Mk1 with pricing at $0.15 per million input tokens and $1.50 per million output tokens. According to VentureBeat, this represents 80-90% cost savings compared to Anthropic’s Claude Sonnet 4.5, OpenAI’s GPT-5, and Google’s Gemini 3.1 Pro for similar capabilities.
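The savings claim comes down to simple per-million-token billing arithmetic. The sketch below uses Mk1’s published rates; the workload sizes and the competitor’s rates are illustrative assumptions chosen to match the claimed 80-90% range, not figures from the announcement.

```python
# Token-billing arithmetic for Perceptron Mk1's published rates.
MK1_INPUT_PER_M = 0.15    # USD per million input tokens (from the announcement)
MK1_OUTPUT_PER_M = 1.50   # USD per million output tokens (from the announcement)

def job_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost in USD for one job under per-million-token pricing."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical monthly video-analysis workload (assumed sizes).
inputs, outputs = 500_000_000, 20_000_000

mk1 = job_cost(inputs, outputs, MK1_INPUT_PER_M, MK1_OUTPUT_PER_M)
# Assumed competitor rates roughly 10x higher, for illustration only.
rival = job_cost(inputs, outputs, 1.50, 15.00)

print(f"Mk1: ${mk1:.2f}  rival: ${rival:.2f}  savings: {1 - mk1/rival:.0%}")
# → Mk1: $105.00  rival: $1050.00  savings: 90%
```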
The model targets enterprise video analysis applications including security monitoring, marketing content optimization, and behavioral analysis in controlled studies. Perceptron spent 16 months developing what it calls a “multi-modal recipe” designed to understand physical world dynamics, cause-and-effect relationships, and object interactions.
The company offers a public demo for testing the model’s capabilities. Performance benchmarks focus on spatial and video understanding tasks, though specific accuracy metrics weren’t disclosed in the announcement.
Thinking Machines Previews Real-Time AI Interaction
Thinking Machines, the startup founded by former OpenAI CTO Mira Murati and co-founder John Schulman, announced a research preview of “interaction models” designed to move beyond traditional turn-based AI conversations. The models process and respond to human inputs in near real-time across voice and video modalities.
According to the company’s blog post, these systems treat interactivity as “a first-class citizen of model architecture rather than an external software harness.” This architectural approach reportedly reduces latency compared to conventional models that process inputs sequentially.
The interaction models remain in limited research preview, with broader availability planned for the coming months. Thinking Machines has not disclosed pricing or specific performance benchmarks for the technology.
Technical Architecture Innovation
Unlike current AI systems that wait for complete user input before generating responses, interaction models can process and respond to ongoing human communication simultaneously. This capability could enable more natural conversation flows in applications requiring real-time human-AI collaboration.
The startup positions this technology as essential for AI systems handling jobs requiring natural interaction, moving beyond the current “turn-based” paradigm dominant across text, audio, and video AI applications.
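The turn-based versus streaming distinction can be illustrated with a toy loop. This is not Thinking Machines’ API, which is unpublished; `echo_model` and the chunked interface below are stand-ins invented for illustration.

```python
# Toy illustration of turn-based vs. streaming interaction loops.

def turn_based(model, utterance):
    # Waits for the complete input before producing any response.
    return model(utterance)

def streaming(model, chunks):
    # Processes input incrementally and may emit partial responses
    # while the user is still "speaking".
    heard = []
    for chunk in chunks:
        heard.append(chunk)
        partial = model(" ".join(heard))
        if partial is not None:   # the model may choose to interject early
            yield partial

# Stand-in "model" that interjects once it has heard at least two words.
def echo_model(text):
    return f"heard: {text}" if len(text.split()) >= 2 else None

print(list(streaming(echo_model, ["book", "a", "table"])))
# → ['heard: book a', 'heard: book a table']
```

The streaming loop produces responses before the final chunk arrives, which is the behavior the “first-class interactivity” framing points at.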
Zyphra Open-Sources ZAYA1-8B Reasoning Model
Palo Alto startup Zyphra released ZAYA1-8B, an 8-billion-parameter mixture-of-experts reasoning model with only 760 million active parameters per token. The model is available on Hugging Face under the Apache 2.0 license and achieves competitive performance against GPT-5-High and DeepSeek-V3.2 on third-party benchmarks.
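The “active parameters” figure reflects how mixture-of-experts routing works: a gating network selects only a few experts per token, so most weights sit idle on any given forward pass. The sketch below is a generic illustration of that idea, not Zyphra’s architecture; the expert counts are assumed numbers.

```python
# Minimal mixture-of-experts routing sketch (illustrative only).
import random

NUM_EXPERTS = 32   # assumed total expert count
TOP_K = 3          # assumed experts activated per token

def route(token_id):
    # A real router is a learned gating network scoring each expert;
    # this deterministic stand-in just picks TOP_K expert indices.
    rng = random.Random(token_id)
    return sorted(rng.sample(range(NUM_EXPERTS), TOP_K))

print(f"token 0 routes to experts {route(0)}")
print(f"{TOP_K / NUM_EXPERTS:.1%} of expert parameters active per token")

# ZAYA1-8B's published ratio: 760M active out of 8B total.
print(f"ZAYA1-8B: {760 / 8000:.1%} of parameters active")  # 9.5%
```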
The model’s distinguishing feature is its training infrastructure: ZAYA1-8B was trained entirely on AMD Instinct MI300 GPUs, demonstrating the viability of non-NVIDIA hardware for AI model development. This is a significant validation of AMD’s AI accelerator platform, which has struggled to gain adoption against NVIDIA’s dominant position.
Users can test ZAYA1-8B through Zyphra Cloud or download it directly for enterprise customization. The company emphasizes “intelligence density” through what it describes as “full-stack innovation” spanning architecture and training methodologies.
Sakana’s RL Conductor Orchestrates Multiple AI Models
Sakana AI introduced RL Conductor, a 7-billion parameter model trained via reinforcement learning to automatically coordinate multiple large language models including GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. The system dynamically analyzes inputs and distributes tasks among worker models to optimize performance and cost.
Yujin Tang, co-author of the research, told VentureBeat that manual frameworks like LangChain “fall short because they are inherently rigid and constrained.” RL Conductor addresses this by learning optimal coordination strategies rather than relying on hardcoded workflows.
The orchestration model achieves state-of-the-art results on reasoning and coding benchmarks while reducing API calls and costs compared to individual frontier models or manual multi-agent pipelines. Sakana has commercialized this technology through its Fugu multi-agent orchestration service.
Solving Multi-Agent Coordination Challenges
Traditional agentic frameworks break down when query distributions shift in production environments with heterogeneous user demands. RL Conductor learns to adapt coordination strategies automatically, eliminating the need for manual pipeline redesign when use cases evolve.
The approach recognizes that no single model excels across all tasks, instead optimizing the selection and coordination of specialized models for specific query types. This meta-learning approach could reshape how enterprises deploy AI systems at scale.
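The core orchestration idea can be sketched as a policy that inspects a query and dispatches it to a preferred worker. This is a hedged illustration, not Sakana’s implementation: RL Conductor learns its policy via reinforcement learning, whereas the stand-in below uses keyword rules, and the worker names are invented.

```python
# Illustrative conductor/worker dispatch (not Sakana's trained system).

WORKERS = {
    "code": lambda q: f"[coding model handles: {q}]",
    "math": lambda q: f"[reasoning model handles: {q}]",
    "chat": lambda q: f"[generalist model handles: {q}]",
}

def conductor_policy(query):
    # Stand-in for a learned policy; RL Conductor scores workers from
    # the query itself rather than relying on hardcoded rules like these.
    q = query.lower()
    if "def " in q or "bug" in q:
        return "code"
    if any(tok in q for tok in ("integral", "prove", "sum")):
        return "math"
    return "chat"

def dispatch(query):
    worker = conductor_policy(query)
    return worker, WORKERS[worker](query)

print(dispatch("prove the sum converges"))
```

The RL-trained version replaces `conductor_policy` with a model whose routing decisions are optimized end-to-end for answer quality and API cost, which is what lets it adapt when the query distribution shifts.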
What This Means
These releases highlight three key trends reshaping the AI model landscape. First, specialized models targeting specific use cases are emerging as viable alternatives to general-purpose systems, with Perceptron’s video focus and Thinking Machines’ interaction emphasis leading this shift.
Second, cost optimization is becoming a primary differentiator, with Perceptron’s 90% cost reduction and Zyphra’s efficient 8B parameter design challenging the assumption that larger models necessarily deliver better value. Third, infrastructure diversification is accelerating, with Zyphra’s AMD-trained model proving that NVIDIA alternatives can produce competitive results.
The orchestration approach pioneered by Sakana suggests the future may involve coordinated model ecosystems rather than monolithic systems. This could reduce enterprise dependence on single AI providers while optimizing performance across diverse workloads.
FAQ
How do these new models compare in pricing to established AI providers?
Perceptron Mk1 costs $0.15 per million input tokens and $1.50 per million output tokens, which the company says represents 80-90% savings versus Claude, GPT-5, and Gemini for video analysis tasks. Zyphra’s ZAYA1-8B is free to download under the Apache 2.0 license, while Thinking Machines hasn’t disclosed pricing for its interaction models.
What makes these models technically different from existing AI systems?
Perceptron focuses specifically on video understanding with physics-aware training, Thinking Machines processes real-time interaction rather than turn-based responses, Zyphra achieves competitive performance with only 760M active parameters, and Sakana’s RL Conductor learns to coordinate multiple models automatically.
Are these models available for enterprise use now?
Zyphra’s ZAYA1-8B is immediately available for download and customization. Perceptron Mk1 is accessible via API and a public demo. Thinking Machines’ interaction models remain in limited research preview, with broader availability planned for the coming months. Sakana’s technology is commercialized through its Fugu service.
Sources
- Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google – VentureBeat
- Thinking Machines shows off preview of near-realtime AI voice and video conversation with new ‘interaction models’ – VentureBeat
- How Sakana trained a 7B model to orchestrate GPT-5, Claude Sonnet 4 and Gemini 2.5 Pro – VentureBeat