Two AI startups unveiled video and multimodal reasoning models this week, targeting capabilities that major labs have left largely underdeveloped. Perceptron Inc. launched its Mk1 video analysis model at $0.15 per million input tokens / $1.50 per million output tokens — pricing the company says is 80–90% below comparable offerings from Anthropic, OpenAI, and Google. Separately, Thinking Machines, the startup founded by former OpenAI CTO Mira Murati, released a research preview of what it calls “interaction models,” designed for near-real-time voice and video conversation.
Perceptron Mk1: Video Reasoning at a Fraction of Rival Costs
Perceptron Inc., a two-year-old startup, announced Mk1 on May 14, 2026 — its flagship proprietary video analysis and reasoning model built specifically to understand the physical world. The model is available via API and can be tested on a public demo site.
According to VentureBeat’s coverage, the company spent 16 months developing what it describes as a “multi-modal recipe” from the ground up. CEO Armen Aghajanyan, formerly of Meta FAIR and Microsoft, led the effort.
The pricing structure puts Mk1 well below its named competitors:
- Perceptron Mk1: $0.15 / $1.50 per million tokens (input/output)
- Anthropic Claude Sonnet 4.5: roughly 5–10× Mk1's per-token price, per Perceptron's claim of an 80–90% discount
- OpenAI GPT-5: roughly 5–10× Mk1's per-token price, per Perceptron's claim
- Google Gemini 3.1 Pro: roughly 5–10× Mk1's per-token price, per Perceptron's claim
Those cost comparisons come from Perceptron directly and have not been independently verified by DMN. Enterprises evaluating the model should benchmark against their specific use cases before accepting the percentage gap at face value.
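At those list rates, per-request cost is straightforward arithmetic. A minimal sketch — the token counts below are illustrative placeholders, not figures from Perceptron:

```python
# Mk1 list pricing, per Perceptron's announcement.
INPUT_PRICE = 0.15 / 1_000_000   # USD per input token
OUTPUT_PRICE = 1.50 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one API call at Mk1 list rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Illustrative workload: a clip tokenized to ~200k input tokens
# with a ~2k-token analysis response.
cost = request_cost(200_000, 2_000)
print(f"${cost:.4f}")  # $0.0330
```

At those numbers, a million such requests would run about $33,000 — the kind of volume math enterprises should redo with their own token counts before comparing vendors.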
What Mk1 Is Built to Do
The model targets video analysis tasks that have remained niche or expensive with existing tools. According to VentureBeat, Perceptron positions Mk1 for several enterprise applications:
- Security monitoring: acting as a watchdog over physical sites and facilities via live video feeds
- Content production: automatically identifying and clipping high-value segments from marketing videos for social media repurposing
- Quality control: flagging inconsistencies, errors, and continuity problems in video content before distribution
- Behavioral analysis: detecting body language and physical actions in controlled research settings or candidate assessment scenarios
The model’s architecture is designed around what Perceptron calls “grounded understanding” — the ability to reason about cause-and-effect relationships, object dynamics, and physical constraints visible in video. Performance is backed by results on spatial and video benchmarks, though specific benchmark scores were not fully detailed in the available source material.
The company’s framing is that video AI has historically required either expensive proprietary models or significant engineering effort to deploy. Mk1 is positioned as an API-accessible alternative that smaller teams can integrate without those barriers.
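Perceptron has not published its API schema in the materials reviewed here, so the endpoint URL, headers, and payload fields below are hypothetical placeholders — the sketch only illustrates what "API-accessible" integration typically looks like for a small team:

```python
import json
import urllib.request

# CAUTION: the endpoint URL and payload fields below are hypothetical
# placeholders, not Perceptron's documented API.
def build_analysis_request(video_url: str, prompt: str, api_key: str):
    """Assemble a hypothetical video-analysis API call."""
    payload = json.dumps({"video_url": video_url, "prompt": prompt}).encode()
    return urllib.request.Request(
        "https://api.perceptron.example/v1/analyze",  # placeholder, not real
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_analysis_request(
    "https://example.com/clip.mp4",
    "List continuity errors in this clip.",
    "sk-demo",
)
```

Consult Perceptron's actual API documentation for real endpoint names, authentication, and video-ingestion formats; the point is that the integration surface is a single HTTP call, not a model-hosting pipeline.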
Thinking Machines Previews Interaction Models
On the same day, Thinking Machines — the startup co-founded by former OpenAI CTO Mira Murati and former OpenAI researcher and co-founder John Schulman — announced a research preview of what it calls “interaction models.”
The core idea, according to the company’s announcement blog post, is to treat interactivity as a first-class design constraint in model architecture rather than a feature bolted on through software. Most current AI models operate on a turn-based structure: a user submits an input, the model processes it, then returns an output. Thinking Machines claims its interaction models can respond fluidly while simultaneously processing the next incoming input — whether that input is text, audio, or video.
The company reported gains on third-party benchmarks and reduced latency compared to conventional architectures, though it did not publish specific numbers in the materials reviewed by DMN.
Critically, the models are not yet publicly available. Thinking Machines said in its announcement that a limited research preview will open “in the coming months” to collect feedback, with a wider release to follow. Enterprises and developers cannot access the system today.
The Broader Context: Video AI as an Emerging Battleground
Both announcements land at a moment when the major AI labs — OpenAI, Google DeepMind, Anthropic, and Meta — have focused the bulk of their public model releases on text and image capabilities. Video understanding, particularly over live or long-form feeds, has remained a harder problem with fewer polished commercial solutions.
Perceptron’s pricing strategy is a direct challenge to that gap. By undercutting the major proprietary APIs by a claimed 80–90%, the company is betting that cost has been a primary barrier to enterprise adoption of video AI — not just capability.
Thinking Machines is approaching the same general territory from a different angle. Rather than competing on price for a specific modality, the company is arguing that the interaction model itself needs to change. If AI is expected to handle jobs that involve natural, real-time human interaction — customer service, coaching, live analysis — then the turn-based architecture that defines today’s tools is a structural limitation, not just a latency issue.
Neither company is a household name yet. Perceptron is two years old; Thinking Machines launched in 2025. But both have credible technical leadership and are addressing real gaps that larger players have been slow to fill.
What This Means
Perceptron’s Mk1 is the more immediately actionable of the two releases. It is live, API-accessible, and priced to attract enterprises that have found the major labs’ video AI offerings too expensive. The 80–90% cost reduction claim is striking, and if benchmarks hold up under independent testing, Mk1 could shift procurement conversations for companies building video-based monitoring, content, or analysis pipelines.
Thinking Machines’ interaction model preview is more speculative at this stage — there is no public access, and the company’s benchmark claims lack the granularity needed for direct comparison. But the architectural argument is worth watching. If near-real-time multimodal interaction can be achieved natively at the model level rather than through latency-reduction engineering, it would change what AI-powered interfaces can realistically do in live, high-stakes environments.
Taken together, these two releases suggest that the next competitive pressure on OpenAI, Google, and Anthropic may not come from a single rival model — it may come from a cluster of focused startups each attacking a specific capability or cost point where the incumbents have left room.
FAQ
What is Perceptron Mk1 and how does its pricing compare to GPT-5 and Claude?
Perceptron Mk1 is a video analysis and reasoning model released via API at $0.15 per million input tokens and $1.50 per million output tokens. According to Perceptron, this is 80–90% cheaper than comparable tiers from OpenAI’s GPT-5, Anthropic’s Claude Sonnet 4.5, and Google’s Gemini 3.1 Pro, though those comparisons come from the company itself and have not been independently verified.
What are Thinking Machines’ interaction models and when will they be available?
Interaction models are a class of native multimodal systems from Thinking Machines that treat real-time interactivity as a core architectural feature rather than a software add-on, enabling near-simultaneous processing of voice and video inputs. The company announced a research preview on May 14, 2026, but said public access would open “in the coming months” — no firm date has been set.
Who founded Thinking Machines and what is the company’s background?
Thinking Machines was founded in 2025 by Mira Murati, former CTO of OpenAI, and John Schulman, a former OpenAI researcher and co-founder. The company is well-funded and focused on multimodal AI systems and human-AI collaboration, according to VentureBeat.
Sources
- Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google – VentureBeat
- Thinking Machines shows off preview of near-realtime AI voice and video conversation with new ‘interaction models’ – VentureBeat