Incredible: ChatGPT's New 4o vs Gemini 2.0 Flash Experimental—Who Draws Best?
models
Source: Nate’s Newsletter
Date: 2025-03-26
URL: https://natesnewsletter.substack.com/p/incredible-chatgpts-new-4o-vs-gemini
Summary
A hands-on comparison of ChatGPT 4o’s autoregressive image generation against Gemini 2.0 Flash Experimental across tasks including text-in-image rendering, object manipulation, fine detail work, and stylized imagery. The author begins skeptical and ends impressed, noting that 4o’s approach produces crisp text and strong prompt comprehension, capabilities that open up use cases previously impractical with diffusion-based generation.
Implications
- Feeds the frontier model capability thread: autoregressive image generation represents a meaningful architecture shift, not just an incremental quality bump, and is worth tracking as it spreads to other providers.
- Relevant to multimodal tooling evaluation: if text-in-image fidelity is now reliable, workflows that previously required dedicated design tools for annotated visuals, diagrams, or mockups become candidates for LLM-native generation.
- The competitive framing (4o vs Flash) signals that image generation is now a first-tier differentiation axis between the major frontier providers, not a secondary feature.