State-of-the-art video and image generation with Veo 2 and Imagen 3
Source: DeepMind Date: 2024-12-16 URL: https://deepmind.google/blog/state-of-the-art-video-and-image-generation-with-veo-2-and-imagen-3/
Summary
Google launched Veo 2 (up to 4K resolution, clips extending to minutes in length, improved physics simulation and human motion, better understanding of cinematic language, and fewer hallucinated details than its predecessor) and Imagen 3 (2K images with brighter colors, wider style diversity, and better prompt adherence). Whisk, a new creative tool, combines Imagen 3 with Gemini for image-based prompting, using reference images rather than text. All three were made available across 100+ countries via VideoFX and ImageFX.
Implications
Veo 2’s cinematic language understanding is the claim that differentiates it from competitors. If Veo 2 genuinely interprets cinematography vocabulary such as “rack focus,” “Dutch angle,” and “steadicam,” it is targeting professional creative workflows, not consumer novelty. That is a different market from Sora (cinematic in its aesthetic) or Runway (cinematic through iteration). The hallucination reduction claim matters here: visual errors break professional workflows in ways they don’t break consumer ones.
Whisk’s image-as-prompt interface is the interaction paradigm to watch. Text-to-image assumes users can describe what they want. Most users can’t — they know what they like when they see it. Image-based prompting solves the vocabulary problem by making the reference image the spec. If Whisk’s quality holds at scale, it normalizes image-in/image-out as the default creative workflow, which changes what “prompting skill” means.
4K, minutes-long video from a consumer tool in December 2024 sets the price-quality baseline for video generation: anything below Veo 2’s quality at a higher price becomes uncompetitive. The 100+ country availability via free tools (VideoFX) suggests Google is aggressively capturing usage data and creative inventory to train future generations, the same strategy as Search.
Watch:
- How Veo 2 cinematic quality holds across diverse genres and lighting conditions in independent creator tests — the “improved physics” claim especially needs adversarial testing
- Whisk’s evolution from creative novelty to professional workflow tool — watch for API availability and resolution increases
- Competitor response: whether Sora’s rollout timing, Runway’s Gen-4 positioning, and Kling’s international expansion accelerate after this launch