Two AI startups announced new models this week. Perceptron Inc. launched Mk1, a proprietary video analysis model priced at $0.15 per million input tokens and $1.50 per million output tokens, roughly 80–90% below comparable offerings from Anthropic, OpenAI, and Google. Separately, Thinking Machines, the startup founded by former OpenAI CTO Mira Murati, released a research preview of what it calls “interaction models,” a new class of multimodal systems designed for near-real-time voice and video conversation.
Perceptron Mk1: Video Reasoning at a Fraction of Rival Costs
Perceptron Inc., a two-year-old startup, announced Mk1, its first publicly available video analysis reasoning model, on May 14, 2026. Co-founder and CEO Armen Aghajanyan worked at Meta FAIR and Microsoft before spending 16 months building what Perceptron describes as a “multi-modal recipe” designed from the ground up for physical-world understanding.
The model’s API pricing undercuts three major rivals directly:
- Anthropic Claude Sonnet 4.5
- OpenAI GPT-5
- Google Gemini 3.1 Pro
According to VentureBeat’s coverage of the launch, Perceptron claims Mk1’s rates are 80–90% below those of all three. A discount of that size implies rival rates roughly five to ten times Mk1’s.
The target use cases are broad. According to VentureBeat, Mk1 is positioned for enterprise security monitoring, automated highlight clipping from marketing footage, video quality assurance, and behavioral analysis in research or hiring contexts. Perceptron has published a public demo for prospective customers to evaluate the model directly.
Perceptron’s benchmark claims for Mk1 center on spatial and grounded video understanding, areas where the company says existing general-purpose models fall short. Its stated design goal was to give the model fluency in cause-and-effect reasoning, object dynamics, and basic physics, capabilities that text-dominant architectures have historically struggled to generalize.
Thinking Machines Previews “Interaction Models” for Real-Time Multimodal AI
Thinking Machines, founded last year by Mira Murati and John Schulman, an OpenAI co-founder and former researcher there, released a research preview of interaction models on the same day, framing them as a structural departure from the input-wait-output cycle that defines current AI interfaces.
In its announcement blog post, the company described interaction models as “native multimodal systems that treat interactivity as a first-class citizen of model architecture,” meaning the real-time response capability is built into the model itself rather than bolted on through software wrappers. The company reported benchmark gains and lower latency than existing approaches, but the available sources do not give specific figures.
The models are not yet publicly available. Thinking Machines said it will open the research preview to a limited group in the coming months to gather feedback before a wider release.
What “Interaction Models” Actually Mean
The core claim is architectural: rather than processing a complete human input, generating a full response, and then awaiting the next turn, interaction models are designed to respond while simultaneously processing incoming inputs. This would allow AI systems to handle overlapping speech, interruptions, and mid-sentence corrections — behaviors common in natural human conversation but poorly handled by current systems.
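The behavior described above, responding while still taking in new input and giving way when interrupted, can be illustrated with a toy simulation. The Python sketch below is not Thinking Machines’ method: it reproduces the overlap with ordinary application-level concurrency, which is the kind of wrapper the company says its models avoid. All function names, timings, and utterances are hypothetical.

```python
import asyncio


async def speak(words, interrupted, spoken):
    """Produce the reply one word at a time, stopping once the user cuts in."""
    for word in words:
        if interrupted.is_set():
            break
        spoken.append(word)
        await asyncio.sleep(0.02)  # stand-in for the time taken to voice one word


async def listen(inbox, interrupted, heard):
    """Keep consuming user input while the reply is still being produced."""
    while True:
        utterance = await inbox.get()
        if utterance is None:  # sentinel: the user's input stream has ended
            break
        heard.append(utterance)
        interrupted.set()  # signal the speaker to yield the floor


async def simulated_user(inbox, delay, utterance):
    """Hypothetical user who interrupts after `delay` seconds."""
    await asyncio.sleep(delay)
    await inbox.put(utterance)
    await inbox.put(None)


async def overlapping_turn(words, delay, utterance):
    """Run speaking and listening concurrently and report what happened."""
    interrupted = asyncio.Event()
    inbox = asyncio.Queue()
    spoken, heard = [], []
    await asyncio.gather(
        speak(words, interrupted, spoken),
        listen(inbox, interrupted, heard),
        simulated_user(inbox, delay, utterance),
    )
    return spoken, heard


if __name__ == "__main__":
    reply = "the package was left at the side door at about nine".split()
    spoken, heard = asyncio.run(overlapping_turn(reply, 0.05, "wait, which door?"))
    print("spoken before interruption:", " ".join(spoken))
    print("heard while speaking:", heard)
```

In a strictly turn-based loop, the interruption would only be read after the full reply had been produced. Here the reply is cut short partway through, which is the conversational behavior the announcement describes.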
Thinking Machines is positioning this as necessary infrastructure for AI to take on roles that require genuine conversational fluency, not just fast text generation.
Pricing Context: Where Mk1 Sits in the Market
The cost differential Perceptron is advertising deserves scrutiny. At $0.15 / $1.50 per million tokens (input/output), Mk1 is priced below most general-purpose multimodal models currently on the market. For comparison, VentureBeat noted the gap against Claude Sonnet 4.5, GPT-5, and Gemini 3.1 Pro — all of which carry substantially higher per-token rates for multimodal inference.
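As a rough illustration of the arithmetic behind these rates, the Python sketch below computes the cost of a single request at Mk1’s published prices and the rate a rival would have to charge for Mk1 to be a given percentage cheaper. The function names and the token counts in the example are assumptions made for illustration, not figures from Perceptron or VentureBeat.

```python
# Published Mk1 API rates quoted in the article (USD per million tokens).
MK1_INPUT_PER_M = 0.15
MK1_OUTPUT_PER_M = 1.50


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = MK1_INPUT_PER_M,
                 output_rate: float = MK1_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def implied_rival_rate(rate: float, discount: float) -> float:
    """Return the rate a rival charges if `rate` is `discount` below it.

    `discount` is a fraction, e.g. 0.80 for "80% cheaper".
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return rate / (1 - discount)


if __name__ == "__main__":
    # Hypothetical workload: these token counts are illustrative assumptions.
    cost = request_cost(input_tokens=2_000_000, output_tokens=100_000)
    print(f"Mk1 cost: ${cost:.2f}")
    for discount in (0.80, 0.90):
        rival = implied_rival_rate(MK1_INPUT_PER_M, discount)
        print(f"{discount:.0%} cheaper implies a rival input rate of ${rival:.2f}/M")
```

Under these assumptions, a request with 2 million input tokens and 100,000 output tokens costs $0.45. An 80% discount corresponds to a rival rate five times Mk1’s, and a 90% discount to one ten times Mk1’s.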
Video understanding is compute-intensive. Models processing live video feeds must handle high frame rates, temporal context across sequences, and spatial reasoning simultaneously — workloads that typically push inference costs up relative to text-only tasks. Perceptron’s ability to undercut rivals by 80–90% on price, if the benchmark performance holds under real-world conditions, would represent a meaningful cost advantage for enterprises running continuous video monitoring at scale.
The company has not published detailed information about its model architecture or training data, so independent verification of its benchmark claims remains limited at this stage.
The Broader Push Toward Multimodal and Real-Time AI
Both launches reflect a direction the industry has been moving toward for several years: AI systems that go beyond text and handle video, audio, and real-time interaction natively. OpenAI introduced real-time audio capabilities in late 2024. Google has expanded Gemini’s video understanding across its product line. Meta has pushed multimodal capabilities into its open-source Llama models.
What distinguishes this week’s announcements is the focus on architecture and cost rather than raw capability claims. Perceptron is competing on price for a specific vertical — video analysis — rather than positioning Mk1 as a general-purpose model. Thinking Machines is competing on interaction design, arguing that the current turn-based paradigm is a structural limitation, not just a latency problem.
Neither company has the distribution of OpenAI, Google, or Anthropic, but both are targeting gaps those companies have left open: affordable video inference and genuinely fluid real-time conversation.
What This Means
Perceptron’s pricing, if it holds up under enterprise workloads, puts pressure on larger providers to justify their video inference costs. The 80–90% discount claim is aggressive, and the model is new enough that independent stress-testing is limited — but the public demo lowers the barrier for enterprises to evaluate it directly. If Mk1 performs at benchmark levels on real customer data, it could accelerate adoption of video AI in mid-market companies that have found existing API costs prohibitive.
Thinking Machines’ preview is harder to evaluate because the models aren’t available yet. The architectural argument — that real-time interactivity needs to be native, not layered on — is technically coherent, but the company will need to demonstrate it at scale. Murati’s background at OpenAI gives the project credibility, and the limited preview strategy suggests the team is being deliberate about not over-promising before the technology is ready.
Taken together, the two releases suggest the next wave of AI model competition will be fought on specificity: models built for particular modalities, interaction types, or cost profiles, rather than general-purpose systems trying to do everything.
FAQ
What is Perceptron Mk1 and how much does it cost?
Mk1 is a video analysis reasoning model from Perceptron Inc., released in May 2026. It is priced at $0.15 per million input tokens and $1.50 per million output tokens via API — approximately 80–90% below comparable multimodal models from Anthropic, OpenAI, and Google, according to VentureBeat.
What are Thinking Machines’ interaction models?
Interaction models are a class of multimodal AI systems announced by Thinking Machines in a research preview. Unlike standard AI interfaces that process one input at a time, these models are designed to respond in near-real-time while simultaneously processing the next human input — treating interactivity as a core architectural feature rather than an add-on. They are not yet publicly available.
Who founded Thinking Machines and what is the company’s background?
Thinking Machines was founded by Mira Murati, former chief technology officer at OpenAI, and John Schulman, an OpenAI co-founder and former researcher there. The company, which launched in 2025, focuses on multimodal AI and human-AI collaboration, and has received significant funding since its founding.
Sources
- Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google – VentureBeat
- Thinking Machines shows off preview of near-realtime AI voice and video conversation with new ‘interaction models’ – VentureBeat