It finally feels like the professional tools have greatly outpaced the open-source versions. While Wan and Hunyuan are solid free options, the latest from Google and Runway have started to feel like a league above. Interestingly, the biggest differentiator seems to be editing tools - the ability to prompt motion, direction, cuts, or weave in audio, rather than just the pure ability to one-shot.
These larger companies are clearly going after the agency/Hollywood use cases. It'll be fascinating to see when they become the default rather than a niche option - that time seems to be drawing closer faster than anticipated. The results here are great, but they're still one or two generations off.
It's something that is only obvious when it is obvious. And the more obvious examples you see, the more non-obvious examples slip by.
An indie film with poor production values, even bad acting, can grip you, make you laugh, and make you cry. Consistency of quality is key - even if the quality is poor. The directing is the red thread running through the scenes. Anything with varying quality levels interrupts your flow and breaks the experience.
The problem with AI video content at this stage is that the clips are very good 'in themselves', just as LLM results are, but putting them together in a way that lets you engage beyond an individual clip will not be possible for a long time.
It will work where the red thread is in the audio (e.g. a title sequence) and you put some clips together to support the thread. But Hollywood has nothing to fear at this stage. In addition, remember that visual artists are control freaks of the purest kind. Film is still used because of the grain, not despite it. 24p prevails.
As an artist and designer (with admittedly limited AI experience), where I feel AI is lacking is in its poor support for formal descriptors. Content descriptors such as 'dog wearing a hat' are a mostly solved problem. Support for simple formal descriptors such as basic color terms and background/foreground is OK, but things like 'global contrast' (as opposed to foreground/background contrast), 'negative shape', 'overlap', 'saturation contrast', etc. - all of these leave the AI models I have played with scratching their heads. (A rough sketch of how a couple of these could be measured numerically follows below.)
I like how Veo supports camera moves, though I wonder if it clearly recognizes the difference between 'in-camera motion' and 'camera motion' and also things like 'global motion' (e.g. the motion of rain, snow etc).
The abiding issue is that artists (animators, filmmakers, etc.) have not done an effective job of formalising these attributes, or even naming them consistently. Every Frame a Painting does a good job, but even he has a tendency to hand-wave these attributes.
Obligatory link to Every Frame a Painting, where he talks about motion in Kurosawa: https://www.youtube.com/watch?v=doaQC-S8de8
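To make that concrete, here is a minimal sketch (Python with numpy and Pillow) of how a couple of the formal descriptors mentioned above could be pinned down numerically. The metric definitions are my own rough approximations, not an established standard:

    # Rough numeric stand-ins for two of the formal descriptors above.
    # These definitions are illustrative approximations, not standards.
    import numpy as np
    from PIL import Image

    def global_contrast(path):
        # RMS contrast over the whole frame: std dev of normalized luminance.
        gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0
        return gray.std()

    def saturation_contrast(path):
        # Spread of saturation: std dev of the HSV saturation channel.
        hsv = np.asarray(Image.open(path).convert("HSV"), dtype=np.float64) / 255.0
        return hsv[..., 1].std()

    # "frame.png" is a placeholder for any generated image.
    print(global_contrast("frame.png"), saturation_contrast("frame.png"))

If attributes like these were exposed as explicit controls instead of free-text prompt vocabulary, you could at least benchmark whether a model honors them.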
Wow, this is incredible work! Blown away by how well the audio and video match up, and the dialogue sounds better than, or on par with, dedicated voice models.
Google has partnered with Darren Aronofsky's AI-driven studio Primordial Soup. I still don't understand why SAG-AFTRA's strike over AI use in Hollywood studios didn't affect this new studio. Does anyone know?
This is technically impressive and I commend the team that brought it to life.
It makes me sad, though. I wish we were pushing AI more to automate non-creative work and not burying the creatives among us in a pile of AI generated content.
I tried Whisk to generate images, which I then animated, thinking it would be using the newest model. But then I noticed that Veo 3 and Imagen 4 are only usable through Flow, and only if you're on the most expensive plan. AI Studio also only shows Imagen 3 and Veo 2 as media-generation options.
Edit: https://labs.google/fx/tools/whisk
My main issue when trying out Veo 2 was that it felt very static. A couple elements or details were animated, but it felt unnatural that most elements remained static. The Veo 3 demos lack any examples where various elements are animated into doing different things in the same shot, which suggests that it's not possible. Some of the example videos that I've seen are neat, but a tech demo isn't a product.
It would be really cool if Google contracted a bunch of artists / directors to spend like a week trying to make a couple videos or short movies to really showcase the product's functionality. I imagine that they don't do that because it would make the seams and limitations of their models a bit too apparent.
Finally, I have to complain that Flow claims not to be available in Puerto Rico: "Flow is not available in your country yet." This despite Puerto Rico being a US territory and Puerto Ricans being US citizens.
1. People like to be entertained.
2. NeuralViz demonstrates AI videos (with a lot of human massaging) can be entertaining.
To me the fundamental question is: "will AI make videos that are entertaining without human massaging?"
This is similar to the idea of "will AI make apps that are useful without human massaging"
Or "will AI create ideas that are influential without human massaging"
By "no human massaging", I mean completely autonomous. The only prompt being "Create".
I am unaware of any idea, app or video to date that has been influential, useful or entertaining without human massaging.
That doesn't mean it can't happen. It's fundamentally a technical question.
Right now AI is trained on human-collected data. So, technically, it's hard for me to imagine it can diverge significantly from what's already been done.
I'm willing to be proven wrong.
The Christian in me tells me that humans are able to diverge significantly from what's already been done because each of us is imbued with a divine spirit that AI does not have.
But maybe AI could have some other property that allows it to diverge from its training data.
> models create, empowering artists to bring their creative vision
Interesting logic the new era brings: something else creates, and you only "bring your vision to life". What that means is left for readers to puzzle over; is your "vision" here just your text prompt?
We're at a crossroads where the tools are powerful enough to make the process optional.
That raises uncomfortable questions: if you don't have to create anymore, will people still value the journey? Will vision alone be enough? What's the creative purpose in life: to create, or to bring a creative vision to life? Isn't the act of creation being subtly redefined?
Older people on social networks are cooked. I mean in general, we are entering an age where making scams and spreading false news can be easily done with $10 of credits.
>Imagen 4 is available today in the Gemini app, Whisk, Vertex AI and across Slides, Vids, Docs and more in Workspace.
I'm always hesitant with rollouts like this. If I go to one of these, there's no indication which Imagen version I'm getting results from. If I get an output that's underwhelming, how do I know whether it's the new model or if the rollout hasn't reached me yet?
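For what it's worth, the API route avoids that ambiguity, because you pin the model ID yourself. Here's a minimal sketch assuming the google-genai Python SDK; the Imagen model ID below is an assumption, so list what your account actually exposes first:

    # Sketch: pin an explicit model ID instead of guessing what the
    # consumer apps route to. Assumes the google-genai SDK; the Imagen
    # model ID below is an assumption, not a confirmed identifier.
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")

    # See which Imagen versions your account can actually reach.
    for m in client.models.list():
        if "imagen" in m.name.lower():
            print(m.name)

    # Then request that exact version, so rollout state is never a mystery.
    resp = client.models.generate_images(
        model="imagen-4.0-generate-001",  # assumed ID; use one printed above
        prompt="a lighthouse at dusk, heavy fog",
    )
    with open("out.png", "wb") as f:
        f.write(resp.generated_images[0].image.image_bytes)

That doesn't help with the consumer apps, of course, where you're stuck with whatever the rollout gives you.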
Got a bit of an uncanny-valley feeling from the owl and the old-man videos. And the origami video gave me a sort of sinister feeling; it seemed vaguely threatening, aggressive.
Can we talk about the elephant in the room: porn, and I mean the weird and dangerous kind? That moment in the history of AI is going to happen, and when it does, shit will hit the fan.
I came across some online threads sharing LoRA models the other day, and it seemed that a lot of generative AI users share models that are effectively just highly specialized fixed-function filters for existing (generated) images?
The obvious aim of these foundational image/movie generation AI developments is for them to become the primary source of value, at a cost and quality unmatched by preexisting human experts, while allowing, but not necessitating, further modifications downstream by now heavily commoditized and devalued ex-professional editors, to allow for their slow deprecation.
But the opposite seems to be happening: the better data are still human-generated, the generators are increasingly human-curated, and they are used increasingly close to the tail end of the pipeline instead of the head. That isn't so threatening, or interesting, to me, but I do wonder if it's a safe, let alone expected, outcome for those pushing these developments.
Aren't you welding a nozzle onto an open can of worms?
I'm surprised no one has mentioned yet the use of the name "Flow", which is also the title of the 2025 Oscar-winning animated movie, built using Blender. [1]
This naming seems very confusing, as I originally thought there must be some connection. But I don't think there is.
[1] https://news.ycombinator.com/item?id=43237273
For anyone with access, can you ask it to make a pickup truck drive through mud? I've tested various different AIs and they all suck at the physics, with tires spinning the wrong way; it's just embarrassing. The demos look amazing, but when it comes to actual use, none has worked for me. I guess it is all to increase "investor value".
I think Google's got something going wrong with their usage limits; they're warning I'm about to hit my video limit after I gave two prompts. I have a Google AI Pro subscription (came free for 1 year with a phone), and I logged into Flow and provided exactly 2 prompts. Flow generated 2 videos per prompt, for a total of 4 videos, each ~8 seconds long. I then went to the gemini.google.com interface, selected the "Veo 2" model, and am now being told "You can generate 2 more videos today".
Since Google seems super cagey about what their exact limits actually are, even for paying customers, it's hard to know if that's an error or not. If it's not an error, if it's intentional, I don't understand how that's at all worth $20 a month. I'm literally trying to use your product Google, why won't you let me?
In the owl/badger video, the owl should fly silently.
This is an interesting, non-trivial problem of generalization, world-knowledge, etc., but also:
There's something somewhat sad about that slipping through; it makes me think no one involved in the production of this video, its selection, its passing review, etc., seemed to realize that one of the characteristic things about owls is that you don't hear their wings.
We have owls on our hill right now and see them almost every day, and regularly see them fly. It's magic, especially in an urban environment.
Who is doing all the work of making physical agents that can serve as a UBI generator? Something that can not just create videos, but go get groceries (hell, grow my food), help a construction worker lay down tiling, help a nurse fetch supplies.
https://www.figure.ai/ does not exist yet, at least not for the masses. Why are Meta and Google just building the next coder and not the next robot?
It's because those problems are at the bottom of the economic ladder. But they have the money for it, and it would create so much abundance; it would crash the cost of living and free up human labor to imagine and do things more creatively than whatever Veo 4 can ever do.
Has anyone gotten access to Imagen 4 for image editing, inpaint/outpaint, or using reference images yet? That's core to my workflow, and their docs just lead to a Google Form. I've submitted, but it feels like a bit of a black hole.
Why do people making AI image tools keep showing "pixel art" made with them when the tools are so obviously bad at making it? It's such a basic unforced error.
Think of all of your favorite novels that are deemed "impossible" to adapt to the screen.
Or think of all the brilliant ideas for films that are destined to die in the minds of people who will never, ever have the luck or connections required to make it to Hollywood.
When this stuff truly matures and gets commoditized I think we are going to see an explosion of some of the most mind blowing art.
I think it's a good thing to have more people creating things. I also think it's a good thing to have to do some work and some thinking and planning to produce a work.
tbh, wasn't that impressed.
Maybe it's because social media has been heavily marketing all these things in bulk.
And moreover, at this point, it just feels like one company copying what the other released; even the names feel unoriginal.
Have they revealed anything similar to Claude Code yet? I sure hope they are saving that for I/O next month... these video/photo reveals are too gimmicky for my liking, alas I'm probably biased because I don't really have a use for them.
On a technical level, this is a great achievement.
On a more societal level, I'm not sure continuously diminishing costs for producing AI slop is a net benefit to humanity.
I think this whole thing parallels some of the social media pros and cons. We gained the chance to reconnect with long-lost friends—from whom we probably drifted apart for real reasons, consciously or not—at the cost of letting the general level of discourse tank to its current state thanks to engagement-maximizing algorithms.
Well, all this is great from a technology point of view. But what about the millions of jobs in the film industry, in animation, motion graphics, etc.? Why does it feel like a few humans are making sure others stop eating and living a good life?
Google hit the jackpot with their acquisition of YouTube, and it's now paying dividends. YouTube is the largest single source of data and traffic on the Internet, and it's still growing fast. I think this data will prove incredibly important to robotics as well. It's a shame they sold Boston Dynamics, in one of their dumbest-ever moves, because of bad PR.
Of course they had to give a proprietary filmmaking tool the name of an award-winning film, made using open-source tools, that was released less than a year ago...
Like most AI image or video generation tools, they produce results that look good at first glance, but the more you watch, the more flaws and sloppiness you notice. And they really lack storytelling.
I do find myself wondering if the people working on this stuff ever give any real thought to the impact on society that this is going to have.
I mean obviously the answer is "no" and this is going to get a bunch of replies saying that inventors are not to blame but the negative results of a technology like this are fairly obvious.
We had a movie two years ago about a blubbering scientist who blatantly ignored that to the detriment of his own mental health.
This doesn't look (any?) better than what was shown a year or two ago for the initial Sora release.
I imagine video is a far tougher thing to model, but it's kind of weird how all these models are incapable of not looking like AI-generated content. They are all smooth and shiny and robotic; year after year it's the same. If anything, the earlier generations, like that horrifying "Will Smith eating spaghetti" clip from about three years ago, look LESS robotic than any of the recent floaty clips that are generated now.
I'm sure it will get better, whatever, but unlike the goal of LLMs for code/writing, where the primary concern is how correct the output is, video won't be accepted as easily unless it stops looking like AI.
I am starting to wonder if that's even possible, since these are effectively making composite guesses based on training data, and the outputs ultimately look similar to those "Here is what the average American's face looks like, based on 1000 people's faces superimposed onto each other" images that used to show up on Reddit all the time. Uncanny, soft, and not particularly interesting.
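That averaging intuition is easy to demonstrate, by the way. Here's a minimal numpy/Pillow sketch of the "superimposed faces" effect; the file names, the 100-image count, and the fixed size are all placeholders:

    # The "averaged faces" effect: the pixel-wise mean of many roughly
    # aligned photos is smooth and low-detail. File names, the count,
    # and the fixed 256x256 size are all placeholders for this sketch.
    import numpy as np
    from PIL import Image

    paths = [f"face_{i:03d}.png" for i in range(100)]
    stack = np.stack([
        np.asarray(Image.open(p).convert("RGB").resize((256, 256)), dtype=np.float64)
        for p in paths
    ])

    mean_img = stack.mean(axis=0)        # the soft, uncanny "average face"
    erased = stack.std(axis=0).mean()    # detail that averaging smoothed away
    Image.fromarray(mean_img.astype(np.uint8)).save("average_face.png")
    print(f"mean per-pixel std: {erased:.1f} (higher = more detail lost)")

To the extent a model's outputs regress toward the mean of its training distribution, they inherit exactly that softness.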
https://genai-showdown.specr.net
Ideogram and GPT-4o pass only a few of the tests there, but not all of them.
The pace is so crazy that it turned out to be an overestimation! I'll probably get it done in 2. Wild times.
0: https://www.linkedin.com/feed/update/urn:li:activity:7317975...
My last recollection is that a recent case said AI-generated work didn't have copyright?
The demo videos for Sora look amazing, but using it is substantially more frustrating and hit-and-miss.
I’ve noticed ads with AI voices already, but having it lip-synced with someone talking in a video really sells it more.
Why is it that all these AI concept videos are completely crazy?
The guy in the third video looks like a dressed up Ewan McGregor, anyone else see that?
I guess we can welcome even more quality 5-second clips for Shorts and Instagram.
Soon, you should be able to put in a screenplay and a cast, and get a movie out. Then, "Google Sequels" - generates a sequel for any movie.
Not in 10 years but now.
People who just see this as terrible are wrong. AI improvement curves are exponential.
People's adaptability is at best linear.
This makes me really sad. For creativity. For people.
Can’t wait to see what people start making with these
Thank you, researchers, for making our world worse. Thank you for helping to kill democracy.
They all got smoked by Google with what they just announced.
Google, what is this?
How would anyone use this for a commercial application?
A bit depressing.
I can't be the only one wondering where the Swedish beach volleyball channel is, though.