ChatGPT Images 2.0 Launches with Text Generation as ComfyUI Hits $500M

OpenAI on Tuesday launched ChatGPT Images 2.0, a new image generation model that can produce multiple images from a single prompt and render accurate text in non-English languages, including Chinese and Hindi. The release comes as ComfyUI, a node-based workflow tool for AI image generation, has raised $30 million at a $500 million valuation, signaling strong investor confidence in specialized AI creative tools.

According to Wired, ChatGPT Images 2.0 draws on ChatGPT’s reasoning capabilities to search the internet for recent information before generating visual content from a single prompt. The model has a knowledge cutoff of December 2025 and supports aspect ratios ranging from 3:1 (wide) to 1:3 (tall).
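To make the reported aspect-ratio range concrete, here is a minimal sketch that checks whether a requested width and height fall between 3:1 and 1:3. The function names are hypothetical and do not come from any OpenAI API; only the two bounds are taken from the reporting above.

```python
from fractions import Fraction

# Bounds taken from the article: 3:1 (wide) down to 1:3 (tall).
MAX_RATIO = Fraction(3, 1)
MIN_RATIO = Fraction(1, 3)


def is_supported_ratio(width: int, height: int) -> bool:
    """Return True if width:height lies within the reported 3:1 to 1:3 range."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return MIN_RATIO <= Fraction(width, height) <= MAX_RATIO


def clamp_ratio(width: int, height: int) -> Fraction:
    """Clamp an arbitrary ratio to the nearest bound of the reported range."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return max(MIN_RATIO, min(MAX_RATIO, Fraction(width, height)))
```

Under these assumptions a 16:9 request passes, while a 4:1 banner would be clamped to 3:1.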

Text Generation Breakthrough Addresses Historical Weakness

AI image generators have historically struggled with text rendering, a weakness often attributed to their diffusion-based architecture. TechCrunch reported that two years ago, asking DALL-E to create a Mexican restaurant menu would produce nonsensical items like “enchuita,” “churiros,” and “margartas.” ChatGPT Images 2.0 can now generate restaurant menus that could be used commercially without customers detecting AI involvement.

“The diffusion models are reconstructing a given input,” Asmelash Teka Hadgu, founder and CEO of Lesan AI, told TechCrunch in 2024. “We can assume writings on an image are a very, very tiny part, so the image generator learns the patterns that cover more of these pixels.”

Researchers have explored autoregressive models as alternatives to diffusion approaches for image generation. These models work more like large language models, predicting an image’s content sequentially rather than reconstructing it from noise. OpenAI declined to specify which architectural approach powers ChatGPT Images 2.0.

ComfyUI Raises $30M for Granular Creative Control

ComfyUI announced a $30 million funding round led by Craft Ventures, with participation from Pace Capital, Chemistry, and TruArrow. The startup, which began as an open-source project in 2023, provides creators with node-based workflows for controlling image, video, and audio outputs from diffusion models.

“If you think about your typical prompt-based solution, like Midjourney or ChatGPT, you ask for something, it [gets only] 60% – 80% there,” Yoland Yan, ComfyUI’s co-founder and CEO, told TechCrunch. “But to change that remaining 20%, you have to try this slot machine.”

Yan compared traditional prompting to casino gambling because small prompt modifications can completely overwrite previously satisfactory elements. ComfyUI’s interface allows creators to link specific generation components, providing granular control over final outputs without the unpredictability of prompt-based iteration.
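The contrast Yan draws can be illustrated with a toy node graph. The sketch below is not ComfyUI's API: the `Node` and `Graph` classes and the four stand-in functions are hypothetical and return strings rather than images. It shows only the general idea that each step is an explicit node with its own parameters, so changing one node re-runs that node and whatever depends on it while earlier results are reused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """One step in the workflow (hypothetical, not a ComfyUI class)."""
    name: str
    func: Callable[..., str]
    inputs: List[str] = field(default_factory=list)
    params: Dict[str, object] = field(default_factory=dict)


class Graph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.cache: Dict[str, Tuple[tuple, str]] = {}
        self.executed: List[str] = []  # names of nodes that actually ran

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def run(self, name: str) -> str:
        node = self.nodes[name]
        upstream = tuple(self.run(dep) for dep in node.inputs)
        key = (upstream, tuple(sorted(node.params.items())))
        cached = self.cache.get(name)
        if cached is not None and cached[0] == key:
            return cached[1]  # inputs and params unchanged: reuse the result
        result = node.func(*upstream, **node.params)
        self.cache[name] = (key, result)
        self.executed.append(name)
        return result


# Stand-in functions; a real pipeline would produce tensors or images.
def encode_prompt(text: str) -> str:
    return f"cond({text})"

def make_latent(seed: int) -> str:
    return f"latent(seed={seed})"

def sample(cond: str, latent: str, steps: int) -> str:
    return f"image[{cond},{latent},steps={steps}]"

def upscale(image: str, factor: int) -> str:
    return f"{image}x{factor}"


graph = Graph()
graph.add(Node("prompt", encode_prompt, params={"text": "a menu"}))
graph.add(Node("latent", make_latent, params={"seed": 42}))
graph.add(Node("sample", sample, inputs=["prompt", "latent"], params={"steps": 20}))
graph.add(Node("upscale", upscale, inputs=["sample"], params={"factor": 2}))

first = graph.run("upscale")
first_run_order = list(graph.executed)

# Change only the final step; the sampled image is reused from the cache.
graph.executed.clear()
graph.nodes["upscale"].params["factor"] = 4
second = graph.run("upscale")
second_run_order = list(graph.executed)
```

In this toy setup the first run executes all four nodes, while the second executes only the upscale step. A fixed seed on the latent node is what keeps the upstream result reproducible in the sketch.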

Enterprise Adoption Accelerates

The startup previously raised $19 million in Series A financing from Chemistry Ventures, Cursor Capital, and Vercel founder Guillermo Rauch in late 2024. ComfyUI emerged when early diffusion models like Midjourney and DALL-E frequently produced obvious errors such as additional fingers on hands.

While current models have largely resolved basic anatomical mistakes, demand for precise creative control has expanded. Professional creators increasingly require deterministic workflows for commercial projects where prompt-based generation remains too unpredictable for production use.

Bias Mitigation Through Target-Based Prompting

Researchers have developed new approaches to address demographic representation issues in AI image generation. An arXiv paper proposes a lightweight framework that mitigates representational bias through prompt-level intervention without model retraining.

The research found that prompts like “doctor” or “CEO” frequently yield lighter-skinned outputs, while lower-status roles like “janitor” show more diversity, reinforcing societal stereotypes. The proposed solution allows users to select fairness specifications ranging from uniform distributions to complex definitions informed by large language models with source citations and confidence estimates.

Across 36 prompts spanning 30 occupations and 6 non-occupational contexts, the method successfully shifted skin-tone outcomes toward declared targets. This approach makes fairness interventions transparent, controllable, and usable at inference time without requiring specialized technical knowledge.
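As a rough illustration of what a prompt-level intervention with a declared target could look like, the sketch below splits a batch of images across descriptor categories in proportion to a user-chosen distribution, then appends each descriptor to the base prompt. This is an assumption-laden toy, not the paper's method: the category labels, the function names, and the use of largest-remainder rounding are all hypothetical choices made for the example.

```python
import math
from typing import Dict, List


def allocate(target: Dict[str, float], n: int) -> Dict[str, int]:
    """Split n images across categories in proportion to the target weights,
    using largest-remainder rounding so the counts always sum to n."""
    total = sum(target.values())
    if n < 0 or total <= 0:
        raise ValueError("n must be non-negative and weights must sum to > 0")
    quotas = {k: n * v / total for k, v in target.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(counts.values())
    # Hand the remaining images to the categories with the largest remainders.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def build_prompts(base: str, target: Dict[str, float], n: int) -> List[str]:
    """Return n prompts whose descriptors follow the declared target."""
    prompts: List[str] = []
    for descriptor, count in allocate(target, n).items():
        prompts.extend([f"{base}, {descriptor}"] * count)
    return prompts


# Hypothetical uniform target over three illustrative categories.
uniform = {"light skin tone": 1, "medium skin tone": 1, "dark skin tone": 1}
prompts = build_prompts("a photo of a doctor", uniform, 6)
```

The model itself is untouched here; only the prompts change, which is why an approach of this kind can run at inference time. Whether the generated images actually match the target still has to be measured on the outputs.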

Enterprise AI Deployment Reaches 1,302 Use Cases

Google Cloud documented 1,302 real-world generative AI use cases from leading organizations, up from the 101 cases published two years ago, a roughly 13-fold increase. The expansion reflects widespread enterprise adoption of agentic AI systems built with tools like Gemini Enterprise, Gemini CLI, and Security Command Center.

The collection covers production AI deployments from nearly every organization attending Google’s Next ’26 conference in Las Vegas. To analyze the dataset, Google used Gemini Enterprise with the latest Gemini Pro models to identify notable trends and insights in the expanded collection.

What This Means

The simultaneous advancement of consumer-facing image generation and specialized creative tools indicates market maturation beyond early experimentation. ChatGPT Images 2.0’s text generation capabilities address a fundamental limitation that previously distinguished AI-generated content from human-created work.

ComfyUI’s $500 million valuation demonstrates investor confidence in tools that provide professional creators with production-grade control over AI outputs. Because prompt-based generation remains probabilistic, deterministic workflows become essential for commercial applications requiring consistent results.

The bias mitigation research highlights growing awareness of ethical considerations in AI deployment. Inference-time solutions that don’t require model retraining make fairness interventions accessible to broader user bases, potentially accelerating responsible AI adoption.

FAQ

What makes ChatGPT Images 2.0 different from previous AI image generators?
ChatGPT Images 2.0 can generate accurate text within images and produce multiple related images from a single prompt. It integrates with ChatGPT’s reasoning capabilities to access current internet information and supports flexible aspect ratios from 3:1 (wide) to 1:3 (tall).

Why did ComfyUI raise funding at such a high valuation?
ComfyUI addresses the unpredictability of prompt-based image generation by providing node-based workflows that give creators granular control over every generation step. Professional users need deterministic results for commercial projects, making ComfyUI’s approach valuable for production workflows.

How can users address bias in AI image generation?
New research proposes prompt-level interventions that allow users to specify demographic representation targets without retraining models. These methods work at inference time and provide transparency about fairness definitions, making bias mitigation accessible to general users.

Sources

ComfyUI hits $500M valuation as creators seek more control over AI-generated media – TechCrunch
ChatGPT’s new Images 2.0 model is surprisingly good at generating text – TechCrunch