Perceptron Mk1 and Thinking Machines Debut New Video AI

Photo by Pavel Danilyuk on Pexels

Synthesized from 5 sources

Two AI startups unveiled video and multimodal models this week, targeting capabilities that major labs have left largely underdeveloped. Perceptron Inc. released its Mk1 video reasoning model at $0.15 per million input tokens and $1.50 per million output tokens — pricing the company says is 80–90% below comparable offerings from Anthropic, OpenAI, and Google. Separately, Thinking Machines, the startup founded by former OpenAI CTO Mira Murati, announced a research preview of what it calls “interaction models,” a new class of multimodal systems designed for near-real-time voice and video conversation.

Perceptron Mk1: Video Reasoning at a Fraction of Rival Costs

Perceptron Inc., a two-year-old startup, announced Mk1 on Wednesday as its flagship proprietary video reasoning model. The company’s co-founder and CEO, Armen Aghajanyan — formerly of Meta FAIR and Microsoft — led a 16-month development effort to build what Perceptron describes as a “multi-modal recipe” designed from the ground up for physical-world understanding.

The pricing contrast with established rivals is stark. According to VentureBeat, Mk1’s API rates undercut Anthropic’s Claude Sonnet 4.5, OpenAI’s GPT-5, and Google’s Gemini 3.1 Pro by 80–90% on a per-token basis. Perceptron’s announcement does not cite the exact per-token rates of those competing models, but the differential is positioned as a deliberate go-to-market wedge for enterprise customers who need video analysis at scale.

The model is available now via API, and Perceptron has opened a public demo site for prospective users to test it directly.

What Mk1 Is Built to Do

Mk1 targets use cases that require understanding of cause-and-effect, object dynamics, and physical-world context — capabilities that text-first models handle poorly. According to VentureBeat’s coverage, practical applications include:

  • Security monitoring — automated watchdog analysis of live facility feeds
  • Marketing content — identifying and clipping high-engagement moments from long-form video
  • Quality control — flagging inconsistencies, errors, or gaffes before publication
  • Behavioral analysis — reading body language and actions in research or hiring contexts

Perceptron says Mk1’s performance is backed by industry-standard benchmarks focused on grounded spatial understanding, though the company had not released a full public benchmark comparison as of this writing.

A Crowded but Immature Market

Video understanding remains one of the less mature segments of the multimodal AI market. While GPT-4o, Gemini, and Claude all accept video inputs to varying degrees, none has made video reasoning a primary product focus. Perceptron’s bet is that dedicated architecture — built specifically for video — will outperform general-purpose models on this modality while costing significantly less to run.

Aghajanyan’s background at Meta FAIR, where multimodal research has been a long-standing priority, likely informed the architectural choices behind Mk1. The startup has not disclosed funding figures or investor names publicly.

Thinking Machines Previews “Interaction Models” for Real-Time Dialogue

Thinking Machines, announced last year by Mira Murati and OpenAI co-founder John Schulman, released a research preview of what it calls “interaction models” — a new class of native multimodal systems designed to treat interactivity as a first-class architectural property rather than a feature bolted onto existing model infrastructure.

According to the company’s announcement blog post, current AI interaction follows a strict turn-based pattern: the user submits an input, the model processes it, and then responds. Thinking Machines argues this structure is a fundamental limitation for applications requiring natural, fluid conversation — including voice agents, video call assistants, and real-time collaboration tools.

Architecture Over Software Harness

The core claim in Thinking Machines’ preview is that existing “real-time” AI voice and video products achieve their interactivity through external software layers — essentially duct-taped onto models that were not designed for it. Interaction models, by contrast, embed interactivity into the model architecture itself, allowing the system to respond while simultaneously processing the next incoming input.
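
Thinking Machines has not published an interface for interaction models, so the contrast can only be illustrated conceptually. The sketch below, in plain asyncio with entirely hypothetical names, shows the control-flow difference the company describes: a turn-based handler buffers the whole input stream before answering, while a duplex-style handler keeps emitting partial replies as new frames arrive.

```python
import asyncio


async def incoming_frames(queue: asyncio.Queue) -> None:
    """Simulate a continuous stream of user audio/video frames."""
    for i in range(5):
        await queue.put(f"frame-{i}")
        await asyncio.sleep(0.05)
    await queue.put(None)  # end-of-stream sentinel


async def turn_based(queue: asyncio.Queue) -> None:
    """Strict turn-taking: buffer the whole input, then answer once."""
    frames = []
    while (frame := await queue.get()) is not None:
        frames.append(frame)  # the system stays silent while input arrives
    print(f"turn-based: reply after all {len(frames)} frames")


async def duplex(queue: asyncio.Queue) -> None:
    """Interaction-model style: keep replying while input keeps arriving."""
    while (frame := await queue.get()) is not None:
        print(f"duplex: partial reply while still ingesting {frame}")
    print("duplex: final reply")


async def main() -> None:
    for handler in (turn_based, duplex):
        queue: asyncio.Queue = asyncio.Queue()
        await asyncio.gather(incoming_frames(queue), handler(queue))


asyncio.run(main())
```

The point is the control flow rather than the model: in the duplex loop there is no moment where the system has to stop listening in order to speak.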

The company reported benchmark gains and reduced latency compared to harness-based approaches, though it has not published a detailed benchmark breakdown in its preview announcement. Specific latency figures and benchmark names were not disclosed in available sources.

Availability Timeline

The models are not yet publicly available. In its blog post, Thinking Machines said it will open a limited research preview in the coming months to collect feedback, with a wider release to follow. No pricing has been announced. The preview is aimed at researchers and select partners rather than general enterprise or consumer deployment.

The announcement positions Thinking Machines as working on a longer development arc — building toward a fundamentally different interaction paradigm rather than shipping an incremental product update.

Pricing and Competitive Positioning

The two announcements arrive at a moment when enterprise AI buyers are increasingly cost-sensitive. Inference pricing across the major labs has dropped substantially over the past 18 months as competition intensified, but video-specific models have remained expensive relative to text.

Perceptron’s $0.15 / $1.50 per million token structure is designed to make video analysis economically viable at production scale — a threshold that many enterprises have found difficult to clear with current pricing from the major providers. If the benchmark performance holds up under independent evaluation, the price gap alone could drive significant enterprise interest.
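
Perceptron has not said how many tokens a given length of video consumes, so any cost projection rests on assumptions. The back-of-the-envelope sketch below plugs purely hypothetical per-clip token counts and clip volumes into Mk1’s published rates, just to show the shape of the calculation.

```python
# Back-of-the-envelope estimate at Mk1's published rates. The per-clip token
# counts and clip volume below are assumptions for illustration; Perceptron
# has not published how much video maps to how many tokens.
INPUT_RATE = 0.15 / 1_000_000   # dollars per input token
OUTPUT_RATE = 1.50 / 1_000_000  # dollars per output token

clips_per_day = 10_000            # hypothetical monitoring workload
input_tokens_per_clip = 50_000    # assumed
output_tokens_per_clip = 500      # assumed

cost_per_clip = (input_tokens_per_clip * INPUT_RATE
                 + output_tokens_per_clip * OUTPUT_RATE)
daily_cost = clips_per_day * cost_per_clip
print(f"~${daily_cost:,.2f} per day")  # ~$82.50 at these assumptions
```

Under those assumptions the workload comes to roughly $82.50 a day; if the 80–90% claim holds, the same job on a comparable model would land somewhere around $410–825 a day.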

Thinking Machines is not yet competing on price because its models are not yet available. Its strategy appears to be differentiation through architecture first, with commercialization to follow once the research preview validates the approach.

What This Means

Both announcements reflect the same underlying dynamic: the frontier of AI capability is no longer exclusively a big-lab story. Perceptron and Thinking Machines are both founded by people with deep experience at the major labs — Meta FAIR, Microsoft, OpenAI — who have left to pursue specific bets that larger organizations have been slower to make.

For Perceptron, the bet is that video reasoning is a large enough market to sustain a dedicated model company, and that price is the primary barrier to enterprise adoption. The 80–90% cost reduction claim is the kind of specific, falsifiable number that will either hold up or collapse under enterprise evaluation — there is no ambiguity about what Perceptron is promising.

For Thinking Machines, the bet is architectural: that the turn-based interaction model is a structural limitation, not just a UX problem, and that solving it at the model level will produce meaningfully better products than the software-layer workarounds currently in use. Murati’s credibility from her time as OpenAI CTO gives the company a degree of technical trust that most two-year-old startups would not have, but the research preview stage means real-world validation is still months away.

Together, the two releases suggest that video and real-time multimodal interaction are the next major areas of model-level competition — and that startups, not just the major labs, intend to define what those capabilities look like.

FAQ

What is Perceptron Mk1?

Mk1 is a proprietary video analysis reasoning model from Perceptron Inc., released via API at $0.15 per million input tokens and $1.50 per million output tokens. It is designed for use cases including security monitoring, marketing content analysis, and behavioral research, and is available now through Perceptron’s API and public demo site.

How does Thinking Machines’ interaction model differ from existing AI voice products?

Thinking Machines claims its interaction models embed real-time interactivity into the model architecture itself, rather than using external software layers to simulate responsiveness. The company says this allows the model to respond while processing the next incoming input simultaneously, reducing latency and producing more natural conversation. The models are not yet publicly available.

When will Thinking Machines’ interaction models be available to the public?

Thinking Machines said in its announcement blog post that it will open a limited research preview to collect feedback in the coming months, followed by a wider release on an unspecified timeline. No pricing has been announced, and the current preview is not open to general users or enterprises.

Sources

Digital Mind News

Digital Mind News is an AI-operated newsroom. Every article here is synthesized from multiple trusted external sources by our automated pipeline, then checked before publication. We disclose our AI authorship openly because transparency is part of the product.