4o Image Generation

(openai.com)

Comments

blixt 25 March 2025
What's important about this new type of image generation, which happens with tokens rather than with diffusion, is that it's effectively reasoning in pixel space.

Example: Ask it to draw a notepad with an empty tic-tac-toe, then tell it to make the first move, then you make a move, and so on.

You can also do very impressive information-conserving translations, such as changing the drawing style, but also stuff like "change day to night", or "put a hat on him", and so forth.

I get the feeling these models are quite restricted in resolution, and that more work in this space will let us do really wild things, such as asking a model to create an app step by step, first completely in images (essentially designing the whole app, text and all), then writing the code to reproduce it. It also means that a model can take over from a really good diffusion model: even if the original generations are not good, it can continue "reasoning" on an external image.

Finally, once these models become faster, you can imagine a truly generative UI, where the model produces the next frame of the app you are using based on events sent to the LLM (which can do all the normal things like using tools, thinking, etc). However, I also believe that diffusion models can do some of this, in a much faster way.

vunderba 26 March 2025
Ran it through some of my relatively complex prompts, using pure text prompts as the de facto means of making adjustments to the images (in contrast to using something like img2img / inpainting / etc.)

https://mordenstar.com/blog/chatgpt-4o-images

It's definitely impressive, though it once again fell flat on the ability to render a 9-pointed star.

M4v3R 25 March 2025
I’ve just tried it and oh wow it’s really good. I managed to create a birthday invitation card for my daughter in basically 1-shot, it nailed exactly the elements and style I wanted. Then I asked to retain everything but tweak the text to add more details about the date, venue etc. And it did. I’m in shock. Previous models would not be even halfway there.
kh_hk 25 March 2025
> Introducing 4o Image Generation: [...] our most advanced image generator yet

Then google:

> Gemini 2.5: Our most intelligent AI model

> Introducing Gemini 2.0 | Our most capable AI model yet

I could go on forever. I hope this trend dies and Apple starts using something effective so all the other companies can start copying a new lexicon.

minimaxir 25 March 2025
OpenAI's livestream of GPT-4o Image Generation shows that it is slowwwwwwwwww (maybe 30 seconds per image, which Sam Altman had to spin as "it's slow but the generated images are worth it"). Instead of using a diffusion approach, it appears to be generating the image tokens and decoding them akin to the original DALL-E (https://openai.com/index/dall-e/), which allows for streaming partial generations from top to bottom. In contrast, Google's Gemini can generate images and make edits in seconds.
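For intuition, here's a toy sketch of why a token-based generator can stream partial images top to bottom (purely illustrative; the codebook, grid size, and sampler are all made up, not OpenAI's actual architecture):

```python
import random

# Toy illustration of autoregressive image generation: the model emits
# discrete codebook tokens in raster order, so a partially generated
# image can already be decoded and shown from the top down, unlike
# diffusion, which refines the whole canvas at once.
VOCAB = 16   # size of a hypothetical visual codebook
GRID = 4     # a 4x4 token grid standing in for e.g. a 32x32 patch grid

def sample_next_token(context):
    """Stand-in for the transformer's next-token distribution."""
    rng = random.Random(len(context))   # deterministic toy "model"
    return rng.randrange(VOCAB)

def generate_streaming():
    tokens = []
    for _ in range(GRID * GRID):
        tokens.append(sample_next_token(tokens))
        # Each completed row of tokens can be decoded to pixels and
        # streamed to the client before the image is finished.
        if len(tokens) % GRID == 0:
            yield tokens[-GRID:]

rows = list(generate_streaming())
```

Diffusion can't stream this way because every denoising step touches the whole image at once.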

No API yet, and given the slowness I imagine it will cost much more than the $0.03+/image of competitors.

user3939382 25 March 2025
I’ll just be happy with not everything having that oversaturated CG/cartoon style that you can’t prompt your way out of.
alach11 25 March 2025
It's incredible that this took 316 days to be released after it was initially announced. I do appreciate the emphasis in the presentation on how this can be useful beyond just being a cool/fun toy, which is how most image generation tools seem to have functioned.

Was anyone else surprised how slow the images were to generate in the livestream? This seems notably slower than DALLE.

lxgr 25 March 2025
Is there any way to see whether a given prompt was serviced by 4o or Dall-E?

Currently, my prompts seem to be going to the latter still, based on e.g. my source image being very obviously looped through a verbal image description and back to an image, compared to gemini-2.0-flash-exp-image-generation. A friend with a Plus plan has been getting responses from either.

The long-term plan seems to be to move to 4o completely and move Dall-E to its own tab, though, so maybe that problem will resolve itself before too long.

bb88 26 March 2025
My experience with these announcements is that they're cherry-picking the best results from maybe several hundred or a thousand prompts.

I'm not saying that it's not true, it's just "wait and see" before you take their word as gold.

I think MS's claim on their quantum computing breakthrough is the latest form of this.

ilaksh 26 March 2025
The new model in the drop down says something like "4o Create Image (Updated)". It is truly incredible. Far better than any other image generator as far as understanding and following complex prompts.

I was blown away when they showed this many months ago, and found it strange that more people weren't talking about it.

This is much more precise than the Gemini one that just came out recently.

aurareturn 26 March 2025
First AI image generator to pass the uncanny valley test? Seems like it. This is the biggest leap in image generation quality I've ever seen.

How much longer until an AI that can generate 30 frames with this quality and make a movie?

About 1.5 years ago, I thought AI would eventually allow anyone with an idea to make a Hollywood quality movie. Seems like we're not too far off. Maybe 2-3 more years?

gs17 25 March 2025
This is really impressive, but the "Best of 8" tag on a lot of them really makes me want to see how cherry-picked they are. My three free images had two impressive outputs and one failure.
alkonaut 26 March 2025
The whiteboard image is insane. Even if it took more than 8 to find it, it's really impressive.

To think that a few years ago we had dreamy pictures with eyes everywhere. And not long ago we were always identifying the AI images by the 6 fingered people.

I wonder how well the physics is modeled internally. E.g. if you prompt it to model some difficult ray tracing scenario (a box with a separating wall and a light in one of the chambers which leaks through to the other chamber etc)?

Or if you have a reflective chrome ball in your scene, how well does it understand that the image reflected must be an exact projection of the visible environment?

byearthithatius 25 March 2025
I remember literally just two or three years back getting good text was INSANE. We were all amazed when SD started making pretty good text.
sergiotapia 25 March 2025
Am I dumb, or can I never figure out how to actually use anything they release, so I just forget about it? Take this for instance: I wanted to try out their Newton example ("an infographic explaining newton's prism experiment in great detail"), but it generated a very bad result. Maybe it's because I'm not using the right model? Every release of theirs is not really a release, it's like a trailer. Right?
RobinL 26 March 2025
It's very impressive. It feels like the text is a bit of a hack where they're somehow rendering the text separately and interpolating it into the image. Not always, I got it to render calligraphy with flourishes, but only for a handful of words.

For example, I asked it to render a few lines of text on a medieval scroll, and it basically looked like a picture of a gothic font written onto a background image of a scroll

nmilo 26 March 2025
Visual internet content is completely over. Pack it up
jfoster 25 March 2025
The character consistency and UI capabilities seem like they open up a lot of new use cases.
glooglork 26 March 2025
https://chatgpt.com/share/67e39ffa-3a98-8011-ab79-fe3ac76632...

Asking it to draw the Balkans map in Tolkien style. This is actually really impressive: the geography is more or less completely correct. The borders and country locations are wrong, but it feels like something I could get it to fix.

andrelaszlo 25 March 2025
I try this on every new generation:

Generate a photo of a lake taken by a mobile phone camera. No hands or phones in the photo, just the lake.

The hand holding a phone is always there :D

ibzsy 25 March 2025
Anyone else frightened by this? Seeing meant believing, and now that isn't the case anymore...
xnx 25 March 2025
Will be interesting to see how this ranks against Google Imagen and Reve. https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
dumpsterdiver 26 March 2025
ChatGPT Pro tip: In addition to video generation, you can use this new image gen functionality in Sora and apply all of your custom templates to it! I generated this template (using my Sora Preset Generator, which I think is public) to test reasoning and coherency within the image:

Theme: Educational Scientific Visualization – Ultra Realistic Cutaways
Color: Naturalistic palettes that reflect real-world materials (e.g., rocky grays, soil browns, fiery reds, translucent biological tones) with high contrast between layers for clarity
Camera: High-resolution macro and sectional views using a tilt-shift camera for extreme detail; fixed side angles or dynamic isometric perspective to maximize spatial understanding
Film Stock: Hyper-realistic digital rendering with photogrammetry textures and 8K fidelity, simulating studio-grade scientific documentation
Lighting: Studio-quality three-point lighting with soft shadows and controlled specular highlights to reveal texture and depth without visual noise
Vibe: Immersive and precise, evoking awe and fascination with the inner workings of complex systems; blends realism with didactic clarity
Content Transformation: The input is transformed into a hyper-detailed, realistically textured cutaway model of a physical or biological structure—faithful to material properties and scale—enhanced for educational use with visual emphasis on internal mechanics, fluid systems, and spatial orientation

Examples:
1. A photorealistic geological cutaway of Earth showing crust, tectonic plates, mantle convection currents, and the liquid iron core with temperature gradients and seismic wave paths.
2. An ultra-detailed anatomical cross-section of the human torso revealing realistic organs, vasculature, muscular layers, and tissue textures in lifelike coloration.
3. A high-resolution cutaway of a jet engine mid-operation, displaying fuel flow, turbine rotation, air compression zones, and combustion chamber intricacies.
4. A hyper-realistic underground slice of a city showing subway lines, sewage systems, electrical conduits, geological strata, and building foundations.
5. A realistic cutaway of a honeybee hive with detailed comb structures, developing larvae, worker bee behavior zones, and active pollen storage processes.

carbocation 25 March 2025
This works great for many purposes.

One area where it does not work well at all is modifying photographs of people's faces.* Completely fumbles if you take a selfie and ask it to modify your shirt, for example.

* = unless the people are in the training set

KrazyButTrue 25 March 2025
Is it live yet? Have been trying it out and am still getting poor results on text generation.
megamix 26 March 2025
Are these models also based on copyrighted material? Can anyone briefly explain whether the datasets are scraped from the web or limited to CC / genuinely public images?
scarface_74 26 March 2025
Something crazy, the old model couldn’t draw a 5.25 drive. I tried this myself at the time.

https://news.ycombinator.com/item?id=42628742

The new one can.

https://chatgpt.com/share/67e36dee-6694-8010-b337-04f37eeb5c...

Lerc 25 March 2025
I think the biggest problem I still see is the model's awareness of the images it generated itself.

The glaring issue for the older image generators is how it would proudly proclaim to have presented an image with a description that has almost no relation to the image it actually provided.

I'm not sure if this update improves on this aspect. It may create the illusion of awareness of the picture by having better prompt adherence.

nycdatasci 25 March 2025
Here's an example of iterative editing with the new model: https://chatgpt.com/share/67e30f62-12f0-800f-b1d7-b3a9c61e99...

It's much better than prior models, but still generates hands with too many fingers, bodies with too many arms, etc.

kerlue 26 March 2025
I enjoy trying to break these models. I come up with prompts that are uncommon but valid. I want to see how well they handle data not in their training set. For image generation I like to use: "Generate an image of a woman on vacation in the Caribbean, lying down on the beach without sunglasses, her eyes open."
qoez 25 March 2025
Looks about what you'd get with FLUX and attaching some language model to enhance your prompt with eg more text
akomtu 25 March 2025
The real test for image generators is the image->text->image conversion. In other words it should be able to describe an image with words and then use the words to recreate the original image with a high accuracy. The text representation of the image doesn't have to be English. It can be a program, e.g. a shader, that draws the image. I believe in 5-10 years it will be possible to give this tool a picture of rainforest, tell it to write a shader that draws this forest, and tell it to add Avatar-style flying rocks. Instead of these silly benchmarks, we'll read headlines like "GenAI 5.1 creates a 3D animation of a photograph of the Niagara falls in 3 seconds, less than 4KB of code that runs at 60fps".
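A minimal sketch of how such a round-trip benchmark could be scored; the describe() and redraw() steps would be the hypothetical model under test, so only the comparison is shown:

```python
def mse(original, reconstructed):
    """Pixel-wise mean squared error between two equally sized grayscale
    images given as nested lists of 0-255 values. Lower is better; 0 means
    the text representation preserved the image perfectly."""
    flat_a = [p for row in original for p in row]
    flat_b = [p for row in reconstructed for p in row]
    assert len(flat_a) == len(flat_b), "images must have the same shape"
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)

# The hypothetical benchmark loop (describe/redraw are the model under test):
#   score = mse(img, redraw(describe(img)))
```

In practice a perceptual metric (SSIM, LPIPS) would be fairer than raw MSE, since a faithful shader-based redraw won't match pixel-for-pixel.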
rvz 25 March 2025
> ChatGPT’s new image generation in GPT‑4o rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

> Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

That's it, folks. Tens of thousands of so-called "AI" image generator startups have been obliterated, taking digital artists down with them, all reduced to near zero.

Now you have a widely accessible meme generator with the name "ChatGPT".

The last task is an open-weight model that competes against this, is faster, and is free.

elif 27 March 2025
Is anyone else getting wild rejections on content policy since this morning? I spent about 20 minutes trying to get it to turn my zoo photos into cartoons and could not get a single animal picture past the content moderation....

Even when I told it to transform it into a text description, then draw that text description, my earlier attempt at a cat picture meant that the description was too close to a banned image...

I can't help but feel like openAI and grok are on unhelpful polar opposites when it comes to moderation.

krishnasangeeth 26 March 2025
Really liked the fact that the team shared all the shortcomings of the model in the post. Sometimes product posts just highlight the best results and aren't forthcoming about areas that need improvement. Kudos to the OpenAI team on that.
n2d4 25 March 2025
For those who are still getting the old DALL-E images inside ChatGPT, you can access the new model on Sora: https://sora.com/explore/images
zjp 25 March 2025
Has the meaning of the words "available today" changed since I learned them?
ant6n 26 March 2025
One very neat thing the interwebs are talking about is the ghiblification of family pictures. It’s actually pretty cute: https://x.com/grantslatton/status/1904631016356274286

In the coming days, people will Anime all sorts of images, for example historical images: https://x.com/keysmashbandit/status/1904764224636592188

TheAceOfHearts 25 March 2025
I wanted to use this to generate funny images of myself. Recently I was playing around with Gemini Image Generation to dress myself up as different things. Gemini Image Generation is surprisingly good, although the image quality quickly degrades as you add more changes. Nothing harmful, just silly things like dressing me up as a wizard or other typical RPG roles.

Trying out 4o image generation... It doesn't seem to support this use-case at all? I gave it an image of myself and asked it to turn me into a wizard, and it generated something that doesn't look like me in the slightest. On a second attempt, I asked it to add a wizard hat and it just used Python to add a triangle in the middle of my image. I looked at the examples and saw they had a direct image modification where they say "Give this cat a detective hat and a monocle", so I tried that with my own image ("Give this human a detective hat and a monocle") and it just gave me this error:

> I wasn't able to generate the modified image because the request didn't follow our content policy. However, I can try another approach—either by applying a filter to stylize the image or guiding you on how to edit it using software like Photoshop or GIMP. Let me know what you'd like to do!

Overall, a very disappointing experience. As another point of comparison, Grok also added image generation capabilities and while the ability to edit existing images is a bit limited and janky, it still manages to overlay the requested transformation on top of the existing image.

planb 25 March 2025
To quote myself from a comment on sora:

Iterations are the missing link. With ChatGPT, you can iteratively improve text (e.g., "make it shorter," "mention xyz"). However, for pictures (and video), this functionality is not yet available. If you could prompt iteratively (e.g., "generate a red car in the sunset," "make it a muscle car," "place it on a hill," "show it from the side so the sun shines through the windshield"), the tools would become exponentially more useful.

I’m looking forward to trying this out and seeing if I was right. Unfortunately it’s not yet available for me.

ryanmcgarvey 26 March 2025
Still can't show me a clock that isn't 10:10.

Otherwise impressive.

prats226 26 March 2025
Is there a technical paper released about the model architecture? The high resolution points to diffusion-style generation rather than a purely token-based one?
tracerbulletx 26 March 2025
Wow this works really well at editing existing photos.
elif 26 March 2025
Like its predecessor, this has most of its utility within the first response; after that the quality rapidly degrades.

I think it's too biased toward reusing heuristics discovered in the first response rather than applying the same level of compute to subsequent requests.

It makes me kind of want to rewrite an interface that builds appropriate context and starts new chats for every request issued..

gcanyon 25 March 2025
> we see the photographer's reflection

Am I the only one immediately looking past the amazing text generation, the excellent direction following, the wonderful reflection, and screaming inside my head, "That's not how reflection works!"

I know it's super nitpicky when it's so obviously a leap forward on multiple other metrics, but still, that reflection just ain't right.

danhds 25 March 2025
To avoid confusion, why not always use a general AI model upfront, then depending on the user's prompt, redirect it to a specific model?
mtillman 26 March 2025
The fact that it nailed the awkward engineer high five (image 2) is pretty impressive as someone who only gives awkward high fives.
kylehotchkiss 25 March 2025
They still all have a somewhat cold and sterile look to them. That's probably the 1% the next decade will be spent working out.
afro88 25 March 2025
Edit: Please ignore. They hadn't rolled the new model out to my account yet. The announcement blog post is a bit misleading saying you can try it today.

--

Comparison with Leonardo.Ai.

ChatGPT: https://chatgpt.com/share/67e2fb21-a06c-8008-b297-07681dddee...

ChatGPT again (direct one shot): https://chatgpt.com/share/67e2fc44-ecc8-8008-a40f-e1368d306e...

ChatGPT again (using word "photorealistic instead of "photo"): https://chatgpt.com/share/67e2fce4-369c-8008-b69e-c2cbe0dd61...

Leonardo.Ai Phoenix 1.0 model: https://cdn.leonardo.ai/users/1f263899-3b36-4336-b2a5-d8bc25...

sashank_1509 25 March 2025
For the first time ever, it feels like it listens and actually tries to follow what I say. I managed to get a good photo of a dog on the beach wearing shoes, from a side angle, by consistently prompting it and making small changes from one image to the next until I got my intended effect.
DotSauce 27 March 2025
I created an app to generate image prompts specifically for 4o. Geared towards business and marketing. Any feedback is welcome. https://imageprompts.app/
ashvardanian 25 March 2025
The pre-recorded short videos are a much better form of presentation than live-streamed announcements!
computergert 26 March 2025
It does extremely well at creating images of copyrighted characters. Dall-e couldn't generate images of Miffy; this one can. Same for "Kikker en vriendjes", a Dutch children's book. Is there no copyright protection at all?
wiradikusuma 26 March 2025
Just curious if it works for creating a comic strip, i.e. will it maintain the consistency of the characters? I watched a video somewhere where they demo'ed it creating comic panels, but I want to create the panels one by one.
Garlef 26 March 2025
> I wasn’t able to generate the image because the combination of abstract elements and stylistic blending [...] may have triggered content filters related to ambiguous or intense visuals.

nah. i pass and stick with midjourney.

trekkie1024 25 March 2025
Interesting that in the second image the text on the whiteboard changes (top left)
krackers 25 March 2025
So what's the lore with why this took over a _year_ to launch from the first announcement? It's fairly clear that their hand was forced by Google quietly releasing this exact feature a few weeks back, though.
theptip 26 March 2025
It’s pretty good, the interesting thing is when it fails it seems to often be able to reason about what went wrong. So when we get CoT scaffolding for this it’ll be incredibly competent.
mclau156 25 March 2025
I would love to see advancement in the pixel art space, specifying 64x64 pixels and attempting to make game-ready pixel art and even animations, or even taking a reference image and creating a 64x64 version
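Downscaling a reference to a strict 64x64 grid is the mechanical half of that wish; the creative half (re-drawing forms and picking a palette at that scale) is what the model would need to add. A minimal nearest-neighbor resampler for illustration, stdlib only:

```python
def resize_nearest(pixels, out_w, out_h):
    """Nearest-neighbor resample of a 2D pixel grid (list of rows).
    Unlike bilinear filtering, this never invents new colors, which is
    why it's the standard choice for pixel art."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# e.g. resize_nearest(reference_image, 64, 64) for a game-ready sprite grid
```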
beardbandit 26 March 2025
The easy infographic generation scares me on the implications for society.
glonq 26 March 2025
So did they deprecate the ability to use DALL-E 3 to generate images? I asked the legacy ChatGPT 4 model to generate an image and it used the new 4o style image generator.
pton_xd 25 March 2025
Can you specify the output dimensions?

EDIT: Seems not, "The smallest image size I can generate is 1024x1024. Would you like me to proceed with that, or would you like a different approach?"

voidUpdate 26 March 2025
> All generated images come with C2PA metadata

How easy is this to remove? Is it just like EXIF data that can be easily stripped out, or is it baked in more permanently somehow?
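For the EXIF-style part of the question: metadata carried in JPEG APP segments (EXIF lives in APP1, C2PA's JUMBF boxes in APP11) is trivial to strip, because a JPEG file is just a sequence of length-prefixed segments. A rough sketch, assuming a baseline JPEG; note that C2PA also supports remote manifests and watermark binding, which byte-level stripping doesn't touch:

```python
def strip_metadata_segments(jpeg_bytes):
    """Drop APPn and COM segments (which carry EXIF, XMP, and C2PA JUMBF
    payloads) from a baseline JPEG, keeping everything a decoder needs.
    Illustrative only; it also drops APP0/JFIF, which viewers tolerate."""
    out = bytearray(jpeg_bytes[:2])              # SOI marker (FF D8)
    i = 2
    while i < len(jpeg_bytes):
        assert jpeg_bytes[i] == 0xFF, "expected a segment marker"
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:                       # SOS: image data follows
            out += jpeg_bytes[i:]
            break
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE  # APPn / COM
        if not is_metadata:
            out += jpeg_bytes[i:i + 2 + length]
        i += 2 + length
    return bytes(out)
```

So the honest answer is: the embedded manifest is as easy to remove as EXIF; the provenance scheme only holds up where verification happens out-of-band.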

alex_young 26 March 2025
I tried a few of the prompts and the results I see are far worse than the examples provided. Seems like there will be some room for artists yet in this brave new world.
system2 26 March 2025
I am blown away by the hyperrealistic renderings, especially of humans. It is getting to the point where I can no longer distinguish the AI ones.
baltimore 25 March 2025
My version of the full glass of wine challenge is "clock face with 13 hour divisions". Nothing I've tried has been able to do it yet.
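It's a good stress test because a 13-division face can't be pattern-matched from ordinary clock photos, even though the geometry is trivial. For comparison, a sketch of where the marks belong:

```python
import math

def clock_tick_positions(divisions, radius=100.0):
    """(x, y) centers of the hour marks for a clock face with the given
    number of divisions, starting at 12 o'clock and going clockwise
    (screen coordinates: y grows downward)."""
    ticks = []
    for k in range(divisions):
        angle = 2 * math.pi * k / divisions - math.pi / 2
        ticks.append((radius * math.cos(angle), radius * math.sin(angle)))
    return ticks
```

The model has to place 13 evenly spaced marks it has never seen on a real clock, which is exactly the kind of out-of-distribution layout these tests probe.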
nprateem 25 March 2025
The garbled text on these things always just makes them basically useless, especially since it often adds text without being told to, like previous models.
macleginn 25 March 2025
A real improvement, but it still drew me a door with a handle where there should be one and an extra knob on the side where the hinges are.
mohitgangrade 26 March 2025
Can anyone tell me when this will be available in the API? Or is it already available?

I couldn't find anything on the pricing page.

Gusarich 26 March 2025
It produces amazing results for me! But the wow effect would have been greater if they had released it a few months ago.
ksec 26 March 2025
Can someone explain what is going on with 4o and Anime with Ghibli Style? Why is it suddenly all over x/twitter?
shaky-carrousel 25 March 2025
Tried it, the "compise armporressed" and "Pros: made bord reqotons" didn't impress me in the slightest.
guybedo 26 March 2025
fwiw i've added a summary of this discussion here (https://extraakt.com/extraakts/gpt-4o-image-generation-discu...) to keep track of the main points
huijzer 26 March 2025
So Google released Gemini 2.5 and one hour later OpenAI comes with this. It’s almost childish at this point.
TheOtherHobbes 26 March 2025
The best thing about this is how the still of the livestream at the bottom is the most uncanny valley image.
miletus 26 March 2025
saw this thread on X. here are some incredible use cases of 4o image generation: https://x.com/0xmetaschool/status/1904804251148443873
amunozo 26 March 2025
Thankfully. It was outrageous how inferior DALL-E 3 was to any other image generation system.
DrNosferatu 25 March 2025
Could they have switched to *both* image and text generation via diffusion, without tokens?
joseneca 26 March 2025
It is amazing how far text generation in images has come over the past 1-2 years
freeopinion 25 March 2025
It bothers me to see links to content that requires a login. I don't expect openai or anyone else to give their services away for free. But I feel like "news" posts that require one to set up an account with a vendor are bad faith.

If the subject matter is paywalled, I feel that the post should include some explanation of what is newsworthy behind the link.

2OEH8eoCRo0 26 March 2025
Still can't generate an analog clock face with a given time.
725686 25 March 2025
Not a criticism, but it stands out how all the researchers and employees in these videos are non-native English speakers (i.e. not American). Nothing wrong with that, on the contrary; it just seems odd that the only American is Altman. Same thing with the last videos from Zuck, if I recall correctly. Especially in this Trump era of MAGA.
dawatchusay 26 March 2025
The reflections in the whiteboard are all off. Do they address this?
krick 26 March 2025
Can it draw the notorious glass of wine filled to the brim yet?
tantaman 25 March 2025
Attention to every detail, even the awkward nerd high-five.
aantix 25 March 2025
Still seems to have problems with transparent backgrounds.
coherentpony 25 March 2025
> we’ve built our most advanced image generator yet into GPT‑4o. The result—image generation that is not only beautiful, but useful.

Sorry, but how are these useful? None of the examples demonstrate any use beyond being cool to look at.

The article vaguely mentions 'providing inspiration' as possible definition of 'useful'. I suppose.

cchance 26 March 2025
The question I have is: when do we get an open-source version of this form of image generation? Will we see diffusion models moving to this space?
StefanBatory 26 March 2025
This technology should have never existed. Thank you OpenAI for being a contributing factor to destroying politics in future.

And I hope that people who worked on this know this. They are pure evil.

jashephe 25 March 2025
The periodic table poster under "High binding problems" is billed as evidence of model limitations, but I wonder if it just suggests that 4o is a fan of "Look Around You".
distalx 25 March 2025
I wish AI companies would release new things once a year, like at CES or how Apple does it. This constant stream of releases and announcements feels like it's just for attention.
bli940505 26 March 2025
Why won't they add benchmarks against o1?
HarshaNP 27 March 2025
Someone should try to make an open-source version of it. I need to know the inner workings of this cool thing.
jofzar 25 March 2025
Still failing the wine glass test,

https://imgur.com/a/aS8e0UY

transitivebs 26 March 2025
literally spent all day playing with this until I ran out of image gen capacity a lil while ago.

so much fun.

bbstats 25 March 2025
that "best of 8" is doing a lot of work. i put in the same input and the image is awful.
ravedave5 25 March 2025
Everyone should try running their prompts and see how over hyped this is. The results I get are terrible comparatively.
batata_frita 26 March 2025
What is the api price?
t0lo 26 March 2025
This is incredibly impressive, but it's still theft of assets.
nbzso 26 March 2025
Where are the lawyers where you need them?
polotics 25 March 2025
well it failed on me, after many tries:

...Once the wait time is up, I can generate the corrected version with exactly eight characters: five mice, one elephant, one polar bear, and one giraffe in a green turtleneck. Let me know if you'd like me to try again later!

bbor 25 March 2025
Whelp. That's terrifying.
BigParm 25 March 2025
They say it must be an important OpenAI announcement when they bring out the twink.
resource_waste 25 March 2025
LPT: while the benchmarks don't show it, ChatGPT-4 > 4o. It amazes me people use 4o at all. But hey, it's the brand name and it's free.

Of course, 4.5 is best, but it's slow and I'm afraid I'm going to hit limits.

t0lo 26 March 2025
Similar to regular LLM plagiarism, it's pretty obvious that visual artifacts like the loadout screen for the RPG cat (video game heading), which is inspired by Diablo, aren't unique at all and are just the result of other people's efforts and livelihoods.
bongodongobob 26 March 2025
Garbage compared to Midjourney. I don't even know why you'd market this. It takes a minute or more, and the results are what I'd say Midjourney looked like 1.5 years ago.
occamschainsaw 25 March 2025
Did they time it with the Gemini 2.5 launch? https://news.ycombinator.com/item?id=43473489

Was it public information when Google was going to launch their new models? Interesting timing.