Veo 3 and Imagen 4, and a new tool for filmmaking called Flow

(blog.google)

Comments

oliwary 21 May 2025
This demo video of Veo 3 on reddit, featuring a variety of characters talking in different scenarios and accents, is one of the most incredible AI demos I have ever seen: https://www.reddit.com/r/ChatGPT/comments/1krmsns/wtf_ai_vid...

Created by Ari Kuschnir

vunderba 20 May 2025
After doing some testing, Imagen 4 doesn't score any higher than Imagen 3 on my comparison chart: roughly 60% prompt adherence accuracy.

https://genai-showdown.specr.net
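
If "prompt adherence accuracy" here is (presumably) a pass/fail judgment per test prompt, the headline number is just the fraction that passed. A minimal Python sketch, with placeholder prompts and verdicts rather than the chart's actual data:

    # Hypothetical tally behind a score like "~60% prompt adherence".
    # The prompts and pass/fail verdicts below are placeholders.
    results = {
        "a red cube balanced on top of a blue sphere": True,
        "a cat juggling three limes, photorealistic": False,
        "an octopus playing chess against itself": True,
        # ...one verdict per benchmark prompt
    }
    accuracy = sum(results.values()) / len(results)
    print(f"prompt adherence: {accuracy:.0%}")  # e.g. 67% for this toy set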

jjcm 20 May 2025
It finally feels like the professional tools have greatly outpaced the open source versions. While Wan and Hunyuan are solid free options, the latest from Google and Runway have started to feel like a league above. Interestingly, it feels like the biggest differentiator is editing tools: the ability to prompt motion, direction, cuts, or weave in audio, rather than just the pure ability to one-shot.

These larger companies are clearly going after the agency/Hollywood use cases. It'll be fascinating to see when they become the default rather than a niche option; that time seems to be drawing closer faster than anticipated. The results here are great, but they're still one or two generations off.

julianpye 20 May 2025
An indie film with poor production values, even bad acting, can grip you, make you laugh and make you cry. Consistency of quality is key, even if the quality is poor. The directing is the red thread throughout the scenes; anything with varying quality levels interrupts your flow and breaks the experience.

The problem with AI video content at this stage is that the clips are very good 'in themselves', just as LLM results are, but putting them together so that you engage beyond an individual clip will not be possible for a long time. It will work where the red thread is in the audio (e.g. a title sequence) and you put some clips together to support the thread. But Hollywood has nothing to fear at this stage.

In addition, remember that visual artists are control freaks of the purest kind. Film is still used because of the grain, not despite it. 24p prevails.
Workaccount2 20 May 2025
I'm sure by this point, and if not, pretty soon, everyone will have seen a clip of AI generated video and not thought twice about it.

It's something that is only obvious when it is obvious. And the more obvious examples you see, the more non-obvious examples slip by.

Daub 21 May 2025
As an artist and designer (with admittedly limited AI experience), where I feel AI to be lacking is in its poverty of support for formal descriptors. Content descriptors such as 'dog wearing a hat' are a mostly solved problem. Support for simple formal descriptors such as basic color terms and background/foreground are ok, but things like 'global contrast' (as opposed to foreground background contrast), 'negative shape', 'overlap', 'saturation contrast' etc etc... all these leave the AI models I have played with scratching their heads.

I like how Veo supports camera moves, though I wonder if it clearly recognizes the difference between 'in-camera motion' and 'camera motion' and also things like 'global motion' (e.g. the motion of rain, snow etc).

Obligatory link to Every Frame a Painting, where he talks about motion in Kurosawa: https://www.youtube.com/watch?v=doaQC-S8de8

The abiding issue is that artists (animators, filmmakers etc) have not done an effective job at formalising these attributes or even naming them consistently. Every Frame a Painting does a good job but even he has a tendency to hand wave these attributes.

carlosdp 20 May 2025
Wow, this is incredible work! Blown away by how well the audio and video match up, and the dialogue sounds better than, or on par with, dedicated voice models.
anilgulecha 21 May 2025
I made a prediction/bet a month ago: 6 months until a full 90-minute movie made by someone sitting at their computer. [0]

The pace is so crazy that it was an overestimation! It'll probably get done in 2. Wild times.

0: https://www.linkedin.com/feed/update/urn:li:activity:7317975...

kapildev 20 May 2025
Google has partnered with Darren Aronofsky's AI-driven studio, Primordial Soup. I still don't understand why SAG-AFTRA's strike to ban AI from Hollywood studios didn't affect this new studio. Does anyone know?
nrjames 20 May 2025
This is technically impressive and I commend the team that brought it to life.

It makes me sad, though. I wish we were pushing AI more to automate non-creative work and not burying the creatives among us in a pile of AI generated content.

wingspar 21 May 2025
So what's the copyright situation going to be for an AI-generated movie?

My last recollection is that a recent case said AI-generated work didn't get copyright?

TheAceOfHearts 21 May 2025
I tried Whisk to generate images which I then animated, thinking it would be using the newest model. But then I noticed that Veo 3 and Imagen 4 are only usable through Flow, and only if you're on the most expensive plan. AI Studio also only shows Imagen 3 and Veo 2 as media generation options.

My main issue when trying out Veo 2 was that it felt very static. A couple elements or details were animated, but it felt unnatural that most elements remained static. The Veo 3 demos lack any examples where various elements are animated into doing different things in the same shot, which suggests that it's not possible. Some of the example videos that I've seen are neat, but a tech demo isn't a product.

It would be really cool if Google contracted a bunch of artists / directors to spend like a week trying to make a couple videos or short movies to really showcase the product's functionality. I imagine that they don't do that because it would make the seams and limitations of their models a bit too apparent.

Finally, I have to complain that Flow claims not to be available in Puerto Rico ("Flow is not available in your country yet."), despite it being a US territory and us being US citizens.

jonplackett 20 May 2025
Has anyone actually tried Veo 3, and is it as good as this looks?

The demo videos for Sora look amazing, but using it is substantially more frustrating and hit-and-miss.

arduinomancer 21 May 2025
I can definitely see this being used for lower end advertising

I’ve noticed ads with AI voices already, but having it lip synced with someone talking in a video really sells it more

cynicalpeace 21 May 2025
Basic principles:

1. People like to be entertained.

2. NeuralViz demonstrates that AI videos (with a lot of human massaging) can be entertaining.

To me the fundamental question is: "Will AI make videos that are entertaining without human massaging?"

This is similar to the idea of "will AI make apps that are useful without human massaging"

Or "will AI create ideas that are influential without human massaging"

By "no human massaging", I mean completely autonomous. The only prompt being "Create".

I am unaware of any idea, app or video to date that has been influential, useful or entertaining without human massaging.

That doesn't mean it can't happen. It's fundamentally a technical question.

Right now AI is trained on human-collected data. So, technically, it's hard for me to imagine it can diverge significantly from what's already been done.

I'm willing to be proven wrong.

The Christian in me tells me that humans are able to diverge significantly from what's already been done because each of us is imbued with a divine spirit that AI does not have.

But maybe AI could have some other property that allows it to diverge from its training data.

gloosx 20 May 2025
>>models create, empowering artists to bring their creative vision

Interesting logic the new era brings: something else creates, and you only "bring your vision to life". What that means is left for the reader to question: is your "vision" here just your text prompt?

We're at a crossroads where the tools are powerful enough to make the process optional.

That raises uncomfortable questions: if you don't have to create anymore, will people still value the journey? Will vision alone be enough? What's the creative purpose in life: to create, or to bring a creative vision to life? Isn't the act of creation being subtly redefined?

ssijak 21 May 2025
Older people on social networks are cooked. I mean, in general, we are entering an age where running scams and spreading false news can easily be done with $10 of credits.
Imnimo 20 May 2025
>Imagen 4 is available today in the Gemini app, Whisk, Vertex AI and across Slides, Vids, Docs and more in Workspace.

I'm always hesitant with rollouts like this. If I go to one of these, there's no indication which Imagen version I'm getting results from. If I get an output that's underwhelming, how do I know whether it's the new model or if the rollout hasn't reached me yet?

elzbardico 20 May 2025
Got a bit of an uncanny-valley feeling with the owl and the old man videos. And the origami video gave me a sort of sinister feeling; it seemed vaguely threatening, aggressive.
sech8420 21 May 2025
First test is... very confusing - https://x.com/Seancheno/status/1925049073230372980
afroboy 21 May 2025
Can we talk about the elephant in the room: porn, and I mean the weird and dangerous kind? That moment in the history of AI is going to happen, and when it does, shit will hit the fan.
baxtr 21 May 2025
A whale coming out of the street in Manhattan, a woman with a jellyfish belly walking in the woods.

Why is it that all these AI concept videos are completely crazy?

numpad0 21 May 2025
I came across some online threads sharing LoRA models the other day, and it seemed that a lot of generative AI users share models that are effectively just highly specialized fixed-function filters for existing (generated) images?
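
A minimal sketch of that "LoRA as a specialized filter" workflow, using the Hugging Face diffusers library (the base checkpoint is a real public model; the LoRA path is a placeholder, not any specific shared model):

    # Sketch: applying a community LoRA as a narrow style "filter" on top of
    # a base diffusion model. The LoRA path is a placeholder.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Loading the LoRA nudges the base model toward one specific style or
    # subject, effectively a fixed-function filter over what it already makes.
    pipe.load_lora_weights("path/to/pixel-art-style-lora")  # placeholder path
    pipe.fuse_lora(lora_scale=0.8)

    image = pipe("a lighthouse at dusk, pixel art style").images[0]
    image.save("lighthouse_pixel.png")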

The obvious aim of these foundational image/video generation developments is for them to become the primary source of value, at a cost and quality unmatched by preexisting human experts, while allowing, but not necessitating, further downstream modification by now heavily commoditized and devalued ex-professional editors, to allow for their slow deprecation.

But the opposite seems to be happening: the better data are still human-generated, the generators are increasingly human-curated, and they are used ever closer to the tail end of the pipeline instead of the head. That isn't so threatening or interesting to me, but I do wonder whether it's a safe, let alone expected, outcome for those pushing these developments.

Aren't you welding a nozzle onto an open can of worms?

jader201 20 May 2025
I'm surprised no one has yet mentioned the use of the name "Flow", which is also the title of the 2025 Oscar-winning animated movie, built using Blender. [1]

This naming seems very confusing, as I originally thought there must be some connection. But I don't think there is.

[1] https://news.ycombinator.com/item?id=43237273

cryptoegorophy 20 May 2025
For anyone with access: can you ask it to make a pickup truck drive through mud? I've tested various AIs and they all suck at physics, with tires spinning the wrong way; it's just embarrassing. Demos look amazing, but when it comes to actual use, none has worked for me. I guess it is all to increase "investor value".
lelandbatey 20 May 2025
I think Google's got something going wrong with their usage limits: they're warning that I'm about to hit my video limit after I gave two prompts. I have a Google AI Pro subscription (it came free for 1 year with a phone), and I logged into Flow and provided exactly 2 prompts. Flow generated 2 videos per prompt, for a total of 4 videos, each ~8 seconds long. I then went to the gemini.google.com interface, selected the "Veo 2" model, and am now being told "You can generate 2 more videos today".

Since Google seems super cagey about what their exact limits actually are, even for paying customers, it's hard to know if that's an error or not. If it's not an error, if it's intentional, I don't understand how that's at all worth $20 a month. I'm literally trying to use your product Google, why won't you let me?

aaroninsf 21 May 2025
Funny but also illustrative issue:

in the owl/badger video, the owl should fly silently.

This is an interesting, non-trivial problem of generalization, world-knowledge, etc., but also:

There's something somewhat sad about that slipping through; it makes me think no one involved in the production of this video, its selection, its passing review, etc., seemed to realize that one of the characteristic things about owls is that you don't hear their wings.

We have owls on our hill right now and see them almost every day, and regularly see them fly. It's magic, especially in an urban environment.

tianshuo 21 May 2025
Feel free to test Imagen 4 on this benchmark: https://github.com/tianshuo/Impossible-AIGC-Benchmark

Ideogram and GPT-4o pass only a few of them, not all.

itissid 20 May 2025
Who is doing all the work of making physical agents that could serve as a UBI generator? Something that can not just create videos, but go get groceries (hell, grow my food), help a construction worker lay down tiling, help a nurse fetch supplies.

https://www.figure.ai/ does not exist yet, at least not for the masses. Why are Meta and Google just building the next coder and not the next robot?

It's because those problems are at the bottom of the economic ladder. But they have the money for it, and it would create so much abundance: it would crash the cost of living and free up human labor to imagine and do things more creatively than whatever Veo 4 can ever do.

ericskiff 20 May 2025
Has anyone gotten access to Imagen 4 for image editing, inpainting/outpainting, or using reference images yet? That's core to my workflow, and their docs just lead to a Google form. I've submitted it, but it feels like a bit of a black hole.
ravenical 21 May 2025
Why do people making AI image tools keep showing "pixel art" made with them when the tools are so obviously bad at making it? It's such a basic unforced error.
curvaturearth 20 May 2025
The first video is problematic: the owl faces forward, then seamlessly turns around; something is very off there.

The guy in the third video looks like a dressed-up Ewan McGregor; anyone else see that?

I guess we can welcome even more quality 5-second clips for Shorts and Instagram.

Animats 20 May 2025
The ad for Flow would be much better if they laid off the swirly and wavy effects, and focused on realism.

Soon, you should be able to put in a screenplay and a cast, and get a movie out. Then, "Google Sequels" - generates a sequel for any movie.

skc 21 May 2025
I'm excited about this.

Think of all of your favorite novels that are deemed "impossible" to adapt to the screen.

Or think of all the brilliant ideas for films that are destined to die in the minds of people who will never, ever have the luck or connections required to make it to Hollywood.

When this stuff truly matures and gets commoditized I think we are going to see an explosion of some of the most mind blowing art.

brm 20 May 2025
I think it's a good thing to have more people creating things. I also think it's a good thing to have to do some work and some thinking and planning to produce a work.
IncreasePosts 20 May 2025
I don't care about AI animals but the old salt offended me.
onlyreal_1 21 May 2025
Tbh, I wasn't that impressed. Maybe it's because social media has been heavily marketing all these things in bulk, and moreover, at this point it just feels like one company copying what the other released; even the names don't feel original.
sergiotapia 20 May 2025
How do you use Imagen 4 in Gemini? I don't see it in the model picker; I just see 2.5 Flash and 2.5 Pro (Upgrade).
pelagicAustral 20 May 2025
Have they revealed anything similar to Claude Code yet? I sure hope they are saving that for I/O next month... these video/photo reveals are too gimmicky for my liking, though I'm probably biased because I don't really have a use for them.
airstrike 20 May 2025
On a technical level, this is a great achievement.

On a more societal level, I'm not sure continuously diminishing costs for producing AI slop is a net benefit to humanity.

I think this whole thing parallels some of the social media pros and cons. We gained the chance to reconnect with long-lost friends (from whom we probably drifted apart for real reasons, consciously or not) at the cost of letting the general level of discourse tank to its current state, thanks to engagement-maximizing algorithms.

sebau 20 May 2025
The future is not bright. While we are endlessly talking about details, the reality is that AI is taking over so many jobs.

Not in 10 years but now.

People who just see this as terrible are wrong. AI's improvement curve is exponential.

People's adaptability is at best linear.

This makes me really sad. For creativity. For people.

skybrian 20 May 2025
What’s the easiest way to try out Imagen 4?

Edit: https://labs.google/fx/tools/whisk

celespider 21 May 2025
I have some basic knowledge about diffusion/DiT, and I'm very curious about how this can be done. Does anyone know of good resources in this field? Thanks!
nprateem 21 May 2025
Stability is conspicuously absent from the Imagen benchmarks. I assume that means it's significantly better.
pier25 20 May 2025
What do they use to train these models? YouTube videos?
nico 20 May 2025
Wow, the audio integration really makes a huge difference, especially given it does both sound effects and voices.

Can’t wait to see what people start making with these

flakiness 20 May 2025
How does this compare with Sora (Pro)?
methuselah_in 21 May 2025
Well, all this is great from a technology point of view. But what about the millions of jobs in the film industry, in animation, motion design, etc.? Why does it feel like a few humans are making sure others stop eating and living a good life?
ugh123 20 May 2025
When can I change the camera view and have everything stay consistent?
StefanBatory 20 May 2025
Thanks to them, we will be able to enter a new era of politics, where nothing is true and everything is vibe-based.

Thank you, researchers, for making our world worse. Thank you for helping to kill democracy.

bowsamic 20 May 2025
I'm surprised at how bad these are
kumarm 20 May 2025
All my Veo 3 videos are missing sound. No idea why. Seems like a common problem.
rvz 20 May 2025
Well, all the AI labs wanted to "Feel the AGI" and wanted the smoke from Google...

They all got smoked by Google with what they just announced.

clarkcharlie03 21 May 2025
Google's been coooooking
impalallama 21 May 2025
Well this is terrifying
htrp 20 May 2025
is it still a waitlist?
999900000999 20 May 2025
Ehh, really, for $20? Break dancers with no music, people just popping in and out?

Google what is this?

How would anyone use this for a commercial application?

matthewaveryusa 20 May 2025
"The Bloomberg terminal for creatives"
_ncuy 21 May 2025
Google hit the jackpot with its acquisition of YouTube, and it's now paying dividends. YouTube is the largest single source of data and traffic on the Internet, and it's still growing fast. I think this data will prove incredibly important to robotics as well. It's a shame they sold Boston Dynamics, in one of their dumbest moves ever, because of bad PR.
phh 20 May 2025
Of course they had to give a proprietary filmmaking tool the same name as an award-winning film made using open-source tools and released less than a year ago...
quantumHazer 20 May 2025
Like most AI image or video generation tools, they produce results that look good at first glance, but the more you watch, the more flaws and sloppiness you notice, and they really lack storytelling
lenerdenator 20 May 2025
I do find myself wondering if the people working on this stuff ever give any real thought to the impact on society that this is going to have.

I mean obviously the answer is "no" and this is going to get a bunch of replies saying that inventors are not to blame but the negative results of a technology like this are fairly obvious.

We had a movie two years ago about a blubbering scientist who blatantly ignored that to the detriment of his own mental health.

Lucasoato 20 May 2025
> Flow is not available in your country yet.

A bit depressing.

ionwake 20 May 2025
Love Flow TV! Absolutely blown away by the improvements in these models, and the channel interface was not bad and quite smooth.

I can't be the only one wondering where the Swedish beach volleyball channel is, though.

crat3r 20 May 2025
This doesn't look (any?) better than what was shown a year or two ago for the initial Sora release.

I imagine video is a far tougher thing to model, but it's kind of weird how all these models are incapable of not looking like AI-generated content. They are all smooth and shiny and robotic; year after year it's the same. If anything, earlier generations like that horrifying "Will Smith eating spaghetti" clip from three years ago look LESS robotic than any of the recent floaty clips being generated now.

I'm sure it will get better, whatever, but unlike the goal of LLMs for code/writing, where the primary concern is how correct the output is, video won't be accepted as easily unless it stops looking like AI.

I am starting to wonder if that's even possible, since these are effectively making composite guesses based on training data, and the outputs ultimately look similar to those "Here is what the average American's face looks like, based on 1000 people's faces superimposed onto each other" images that used to show up on Reddit all the time. Uncanny, soft, and not particularly interesting.