OpenAI Launches ChatGPT Images 2.0 — First Image Model That 'Thinks' Before It Draws

OpenAI released ChatGPT Images 2.0 on April 21, 2026, rolling out its next-generation image model, gpt-image-2, across ChatGPT, Codex, and the API. The launch marks the first time a production image generator integrates an explicit reasoning phase — the model "thinks" about composition, layout, and intent before rendering a single pixel.
The announcement on X drew more than 33,000 posts within 24 hours, with developers praising the model's ability to render small text and dense UI elements. OpenAI will retire both DALL-E 2 and DALL-E 3 on May 12, closing the chapter on the first generation of its diffusion-based image tools.
Key Highlights
- New gpt-image-2 model integrates a "thinking" mode with three tiers: low, medium, and high, trading latency for layout accuracy
- 2K resolution output (2,000 pixels on the long edge), nearly double the 1,024 px ceiling of gpt-image-1
- Generates sequences of up to 10 coherent images in a single request with character and object continuity
- Multilingual text rendering for Japanese, Korean, Chinese, Hindi, and Bengali
- DALL-E 2 and DALL-E 3 retirement scheduled for May 12, 2026
Thinking Before Drawing
In a shift that mirrors the reasoning revolution in text models, gpt-image-2 introduces a planning layer inside the image generation pipeline. Instead of mapping a prompt directly to a diffusion output, the model first reasons about what it needs to draw — sketching constraints, choosing a composition, and, when enabled, running web searches mid-generation to verify facts or references.
"Images are a language, not decoration," OpenAI wrote in its launch post. The thinking flag is available in three effort levels, giving developers control over how much reasoning latency they want to trade for layout precision.
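As a sketch of how the effort control might surface in a request (the `thinking` parameter name and its values are assumptions based on this description, not a documented API), a small request builder could look like:

```python
# Hypothetical request builder for gpt-image-2's thinking tiers.
# The "thinking" field name and tier strings are assumptions inferred
# from the launch description; the real API may expose this differently.

VALID_EFFORTS = ("low", "medium", "high")

def build_image_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble keyword arguments for an images.generate-style call."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return {
        "model": "gpt-image-2",
        "prompt": prompt,
        "thinking": effort,  # trades reasoning latency for layout accuracy
    }
```

Under this sketch, bumping `effort` from "low" to "high" buys layout precision at the cost of latency and extra reasoning-token charges.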
Technical Specs and Pricing
The API exposes a familiar token-based pricing model:
- Input text tokens: $5 per million
- Output text tokens: $10 per million
- Input image tokens: $8 per million
- Output image tokens: $30 per million
- A standard 1024×1024 high-quality render lands at roughly $0.21 per image
Thinking mode incurs additional reasoning-token charges. Supported aspect ratios span 1:1, 3:2, 2:3, 16:9, 9:16, and the ultra-wide extremes 3:1 and 1:3.
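Working backward from the listed rates, the ~$0.21 figure implies roughly 7,000 output image tokens for a 1024×1024 high-quality render; that token count is an inference from the numbers above, not a published figure. A minimal cost helper over the published per-million rates:

```python
# Per-million-token rates (USD) from the launch pricing.
RATES = {
    "input_text": 5.00,
    "output_text": 10.00,
    "input_image": 8.00,
    "output_image": 30.00,
}

def cost_usd(token_counts: dict) -> float:
    """Total request cost given token counts per billing category."""
    return sum(RATES[kind] * n / 1_000_000 for kind, n in token_counts.items())

# A 1024x1024 high-quality render at ~7,000 output image tokens
# (inferred from the ~$0.21 figure) plus a short prompt:
estimate = cost_usd({"input_text": 100, "output_image": 7_000})
```

Note that reasoning tokens from thinking mode would add a further line item on top of this estimate.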
Availability Tiers
Free ChatGPT users get the base gpt-image-2 model. ChatGPT Plus, Pro, and Business subscribers unlock thinking mode, longer reasoning runs, and web search inside generation. The model is simultaneously available through the Codex environment and the public API, a distribution pattern OpenAI increasingly uses to ship product and developer access in parallel.
Impact on the Creative Stack
Early reactions from developers highlight two capabilities that break open new workflows: multilingual typography and multi-image consistency. A Thai developer reported the model produced readable Thai slides on the first try — a well-known failure mode for prior diffusion systems. Others noted the ability to generate a character and keep it visually consistent across a sequence of up to 10 images, a feature that reduces the need for ControlNet-style workarounds in marketing, e-commerce, and comic production.
Sam Altman's team is also positioning the release as competitive pressure on Midjourney, Stability, and Google's Imagen line, all of which have leaned on diffusion without an explicit reasoning loop.
Background
OpenAI's image tooling began with DALL-E in 2021 and evolved through DALL-E 2, DALL-E 3, and gpt-image-1. Each generation added fidelity, but none exposed a reasoning step. The move toward integrated thinking mirrors what OpenAI did with o1 and GPT-5 on the text side: treat inference-time compute as a lever for quality, not just a cost.
Analysts writing on Startup Fortune framed the launch as "raising the ceiling on generative complexity and forcing rivals to respond." The New Stack's Darryl K. Taft called it the moment "OpenAI now thinks before it draws."
What's Next
With DALL-E 2 and DALL-E 3 winding down in three weeks, teams using the older endpoints will need to migrate before May 12. OpenAI has signaled that edit-style endpoints with image and mask inputs are expected to follow the same reasoning pattern in a future update. Expect video and audio modalities to inherit the same "thinking" primitive in the coming months, completing the shift toward a unified reasoning-first architecture across all generative surfaces.
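For teams on the retiring endpoints, the minimal migration, assuming the Images API keeps its current request shape, is swapping the model identifier. A hypothetical helper (the assumption that core fields carry over unchanged should be checked against the real migration docs):

```python
# Models being retired on May 12, 2026.
RETIRED_MODELS = {"dall-e-2", "dall-e-3"}

def migrate_request(params: dict) -> dict:
    """Return a copy of an images.generate request dict pointed at gpt-image-2.

    Assumes the new model accepts the same core fields (prompt, size, n);
    model-specific options should be reviewed against the actual docs.
    """
    migrated = dict(params)  # copy so the caller's dict is untouched
    if migrated.get("model") in RETIRED_MODELS:
        migrated["model"] = "gpt-image-2"
    return migrated
```

Keeping the rewrite in one place makes it easy to audit every call site before the cutover date.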