vision-language models | Digital Mind News

Enterprise

Alibaba Cloud's HappyHorse 1.1 video model reached No. 2 globally in June 2026 as OpenAI's Sora…

2026-06-24

Google

Google unveiled Gemini Omni at I/O 2026 as its first native any-to-any multimodal model, while ByteDance…

2026-06-23

AI

ByteDance Research released Lance, a 3B-parameter open-source model handling image and video generation and editing in…

2026-05-19

OpenAI

Thinking Machines Lab previewed real-time multimodal interaction models, Perceptron released a video analysis model priced 80–90%…

2026-05-16

Enterprise

Research shows major AI models are converging on similar internal representations of reality despite different training…

2026-05-12

Enterprise

Multimodal AI achieved major advances in May 2026, with Thinking Machines unveiling real-time interaction models for…

2026-05-12

AI Agents

NVIDIA launched Nemotron 3 Nano Omni, a unified multimodal AI model that processes video, audio, images,…

2026-05-09

Enterprise

NVIDIA launched Nemotron 3 Nano Omni, an open multimodal model that unifies vision, audio, and text…

2026-05-06

AI Agents

NVIDIA launched Nemotron 3 Nano Omni, an open multimodal model that unifies vision, audio and text…

2026-05-06

Enterprise

NVIDIA launched Nemotron 3 Nano Omni, delivering 9x efficiency gains by unifying vision, audio, and language…

2026-05-05

AI Agents

OpenAI launched ChatGPT Images 2.0 with advanced text-in-image generation capabilities, while Google countered with Deep Research…

2026-05-05

Enterprise

NVIDIA released Nemotron 3 Nano Omni, an open multimodal AI model that unifies vision, audio, and…

2026-05-04