OpenAI shipped ChatGPT Images 2.0 with the gpt-image-2 model, and it's already at the top of the LMArena text-to-image leaderboard. The model scored an Elo of 1512, beating Google's Gemini 3.1 and OpenAI's own previous high-fidelity model. Users on Hacker News say it blows everything else out of the water.
The technical story here is the architecture shift. OpenAI abandoned the diffusion approach behind DALL-E 3 in favor of an autoregressive transformer. Instead of generating an image by iteratively reversing noise (how diffusion works), gpt-image-2 treats visual generation as a sequence prediction task: it emits image tokens one at a time, the same way GPT emits text tokens. That tighter coupling to the language stack is what gives the model better semantic alignment between prompts and output, and it's why text rendering suddenly works.
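The difference can be made concrete with a toy sketch. Nothing below is OpenAI's actual model: the codebook, the transition weights, and the function names are all invented for illustration. It just shows the shape of autoregressive generation, where "image tokens" are sampled left to right conditioned on what came before, versus diffusion's whole-image denoising loop.

```python
import random

# Toy illustration of autoregressive image generation: an image is a
# sequence of discrete tokens from a codebook, predicted one at a time,
# the same way GPT predicts text tokens. A diffusion model would instead
# start from pure noise and refine the entire image over many steps.

CODEBOOK = ["sky", "cloud", "grass", "tree"]  # stand-in for visual codewords

# Hypothetical "learned" next-token weights keyed by the previous token.
# A real model conditions on the text prompt and the full history via a
# transformer; a lookup table keeps the sketch self-contained.
TRANSITIONS = {
    None:    [4, 2, 3, 1],
    "sky":   [3, 4, 1, 1],
    "cloud": [4, 3, 1, 1],
    "grass": [1, 1, 4, 3],
    "tree":  [1, 1, 3, 4],
}

def generate_image_tokens(n_tokens, seed=0):
    """Sample a token sequence left to right, one token at a time."""
    rng = random.Random(seed)
    tokens = []
    prev = None
    for _ in range(n_tokens):
        tok = rng.choices(CODEBOOK, weights=TRANSITIONS[prev])[0]
        tokens.append(tok)
        prev = tok
    return tokens

tokens = generate_image_tokens(8)
print(tokens)  # eight "patch" tokens a decoder would map back to pixels
```

Because every token is predicted in context, the prompt can steer each step of the image, which is the intuition behind the improved prompt alignment.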
You can now build infographics or posters with readable text right in the image. The editing tools handle background swaps and text changes. Outputs reach 4K resolution with flexible aspect ratios.
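For a sense of what calling this looks like, here is a hedged sketch modeled on the existing OpenAI Images API shape (`client.images.generate`). The model name "gpt-image-2", the 4K size string, and the `build_image_request` helper are assumptions for illustration, not confirmed API values; only the request-assembly part runs here.

```python
# Hypothetical request builder modeled on OpenAI's Images API shape.
# "gpt-image-2" and the "4096x4096" size are assumptions based on the
# announcement, not documented parameter values.

def build_image_request(prompt, size="4096x4096", model="gpt-image-2"):
    """Assemble keyword arguments for an images.generate-style call."""
    return {"model": model, "prompt": prompt, "size": size}

req = build_image_request(
    "A conference poster titled 'Tokens All The Way Down', large readable text"
)
print(req["size"])

# With the official SDK, the dict would be splatted into the call:
#   client.images.generate(**req)
```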
For agent builders, the consolidation is the real story. One transformer model handles both text and image tokens with shared processing logic. You don't need separate APIs with different architectures anymore.
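One way to picture that consolidation, as a sketch of a plausible design rather than OpenAI's actual code: text tokens and image tokens live in one shared ID space, so a single embedding table and a single transformer stack can process a mixed sequence. The vocabulary sizes and helper names below are invented for illustration.

```python
# Toy shared-vocabulary sketch (an assumed design, not OpenAI's code):
# image codebook IDs are offset past the text IDs, so one model sees a
# single flat sequence of integers -- no second model, no second API.

TEXT_VOCAB_SIZE = 50_000   # hypothetical text token IDs: 0..49_999
IMAGE_VOCAB_SIZE = 8_192   # hypothetical image codebook size

def image_token_to_id(code):
    """Map an image codebook index into the shared ID space, after text."""
    assert 0 <= code < IMAGE_VOCAB_SIZE
    return TEXT_VOCAB_SIZE + code

def build_sequence(prompt_ids, image_codes):
    """Prompt tokens first, then the image tokens they condition."""
    return list(prompt_ids) + [image_token_to_id(c) for c in image_codes]

seq = build_sequence([101, 7, 2043], [0, 511, 8191])
print(seq)  # [101, 7, 2043, 50000, 50511, 58191]
```

An agent can then hand one model an interleaved prompt-plus-image history instead of juggling a text endpoint and a separate image endpoint with different semantics.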