I think the utility of generating vectors is far, far greater than all the raster generation that's been a big focus thus far (DALL-E, Midjourney, etc). Those efforts have been incredibly impressive, of course, but raster outputs are so much more difficult to work with. You're forced to "upscale" or "inpaint" the rasters using subsequent generative AI calls to actually iterate towards something useful.
By contrast, generated vectors are inherently scalable and easy to edit. These outputs in particular seem to be low-complexity, with each shape composed of as few points as possible. This is a boon for "human-in-the-loop" editing experiences.
When it comes to generative visuals, creating simplified representations is much harder (and, IMO, more valuable) than creating highly intricate, messy representations.
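For what it's worth, the "as few points as possible" property is easy to appreciate in raw markup. Here is a minimal hand-written sketch (plain Python, not actual NeuralSVG output) showing how little data a low-complexity vector scene needs:

```python
# Minimal illustration of why low-point-count vectors are easy to edit:
# the entire "drawing" is a handful of human-readable coordinates.
svg = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <!-- a hill: one cubic Bezier, just four control points -->
  <path d="M 0 80 C 30 40, 70 40, 100 80" fill="none" stroke="black"/>
  <!-- a sun: one center point plus a radius -->
  <circle cx="50" cy="25" r="10" fill="gold"/>
</svg>"""

with open("scene.svg", "w") as f:
    f.write(svg)
```

Tweaking any of those numbers in a text editor (or dragging the points in Inkscape) changes the scene directly, which is exactly what raster output can't give you.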
I am a huge fan of this type of incremental generative approach. Language isn’t precise enough to describe a final product, so generating intermediate steps is very powerful.
I’d also like to see this in music generation. Tools like Suno are cool but I would much rather have something that generates MIDIs and instrument configurations instead.
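As a rough illustration of what "generate MIDIs and instrument configurations" could mean as an editable artifact, here is a sketch using the third-party mido library (pip install mido); the instrument and notes are arbitrary examples, not output from any real music model:

```python
import mido

mid = mido.MidiFile(ticks_per_beat=480)
track = mido.MidiTrack()
mid.tracks.append(track)

# Instrument configuration: General MIDI program 24 is a nylon guitar.
track.append(mido.Message('program_change', program=24, time=0))

# A C major arpeggio, one beat per note.
for note in (60, 64, 67, 72):
    track.append(mido.Message('note_on', note=note, velocity=80, time=0))
    track.append(mido.Message('note_off', note=note, velocity=80, time=480))

mid.save('arpeggio.mid')  # open in any DAW and edit every note by hand
```

Unlike rendered audio, every note, velocity, and instrument choice stays individually editable downstream.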
Maybe this is a good lesson for generative tools. It’s possible to generate something that’s a good starting point. But what people actually want is in the long tail, so including the capability for precise modification is the difference between a canned demo and a powerful tool.
> Code coming soon
The examples are quite nice but I have no idea how reproducible they are.
I’ve always thought that generation of intermediate representations was the way to go. Instead of generating concrete syntax, generate AST. Instead of generating PNG, generate SVG. Instead of generating a succession of images for animation, generate wire frame or rigging plus script.
Once you have your IR, modify and render. Once you have your render, apply a final coat of AI pixie dust.
Maybe generative models will get so powerful that fine-grained control can be achieved through natural language. But until then, this method would have the advantages of controllability, interoperability with existing tools (like Intellisense, image editors), and probably smaller, cheaper models that don’t have to accommodate high dimensional pixel space.
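A toy sketch of that "generate the IR, modify, render" loop, using Python's built-in ast module (ast.unparse needs Python 3.9+); the transform here is a made-up example:

```python
# Parse concrete syntax into an AST, rewrite a node, render source back out.
import ast

tree = ast.parse("def area(r): return 3.14 * r * r")

class ConstFixer(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Swap the rough constant for a better one.
        if node.value == 3.14:
            return ast.copy_location(ast.Constant(value=3.14159265), node)
        return node

tree = ast.fix_missing_locations(ConstFixer().visit(tree))
print(ast.unparse(tree))  # def area(r): return 3.14159265 * r * r
```

The edit happens in a structured, checkable representation; pretty-printing back to text is the cheap final step.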
I had to convert a bitmask to SVG and wanted to skip the intermediary step, so I looked around for papers on segmentation models that output SVG directly and found this one: https://arxiv.org/abs/2311.05276
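For anyone who still needs that intermediary step, a minimal sketch of mask-to-SVG tracing with marching squares, via scikit-image's find_contours; the mask here is a toy stand-in:

```python
import numpy as np
from skimage import measure

mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0  # toy square "segmentation"

paths = []
for contour in measure.find_contours(mask, 0.5):
    # find_contours returns (row, col) points; SVG wants (x, y).
    pts = " ".join(f"{x:.1f},{y:.1f}" for y, x in contour)
    paths.append(f'<polygon points="{pts}" fill="black"/>')

svg = ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64">'
       + "".join(paths) + "</svg>")
with open("mask.svg", "w") as f:
    f.write(svg)
```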
I wonder if you can use an existing svg as a starting point. I would love to use the sketch approach and generate frame-by-frame animations to plot with my pen plotter.
If you can generate an image you can flatten it and if you can flatten it you can cluster it, and if you can cluster the flat sections you can draw vectors around them.
Seriously though, this is amazing, I'm glad to see this tackled directly.
Also, I just learned from this thread that Claude is apparently usable for generating SVGs (unlike e.g. GPT-4 when I tested for it some months ago), so I'll play with that while waiting for NeuralSVG to become available.
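Riffing on the flatten-then-cluster pipeline a few comments up, here is a minimal sketch of the flattening step using k-means from scikit-learn (the "image" is random stand-in data); each flat region could then be traced into polygons like the mask in the earlier snippet:

```python
import numpy as np
from sklearn.cluster import KMeans

img = np.random.rand(32, 32, 3)  # stand-in for a generated raster
pixels = img.reshape(-1, 3)

# Quantize to a handful of flat colors.
km = KMeans(n_clusters=4, n_init=10).fit(pixels)
flat = km.cluster_centers_[km.labels_].reshape(img.shape)  # flattened image
labels = km.labels_.reshape(32, 32)  # per-pixel cluster ids, ready for tracing
```

It works, but the appeal of something like NeuralSVG is skipping this lossy pipeline and getting clean, semantically meaningful shapes directly.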
This is a group applying vector generation to animations: https://www.youtube.com/@studyturtlehq
The graphic fidelity has been slowly improving over time.
Prompting Claude to make SVGs then dropping them into Inkscape and getting the last ~20% of it to match the picture in my head has been a phenomenal user experience for me. This, too, piques my curiosity..!
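For anyone wanting to script that workflow, a rough sketch using the anthropic Python SDK (pip install anthropic); the model name and prompt are assumptions for illustration, not anything from this thread:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed model alias; pick your own
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Return only an SVG (no prose) of a minimalist sailboat, "
                   "flat colors, as few path points as possible.",
    }],
)

with open("sailboat.svg", "w") as f:
    f.write(message.content[0].text)  # then open in Inkscape for the last ~20%
```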
Aside:
I've been having a very hard time prompting ChatGPT to spit out ASCII art. It really seems to not be able to do it.
Here is an ASCII art representation of a hopping rabbit:
```
(\(\
( -.-)
o_(")(")
```
This is a simple representation of a rabbit with its ears up and in a hopping stance. Let me know if you'd like me to adjust it!
Fun example: https://gist.github.com/scosman/701275e737331aaab6a2acf74a52...
The progress we’re seeing in AI is quite amazing, and it will keep getting better, which is somewhat terrifying.
It would be interesting to see a similar approach that incrementally works from simpler (fewer curves) to more complex representations.
That way one could probably apply RLHF along the trajectory too.
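A toy illustration of such a simple-to-complex trajectory (my own sketch, not the paper's method): the same curve approximated with progressively more points, giving a sequence of candidates that per-step feedback could be attached to:

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 256)
curve = np.stack([t, np.sin(t)], axis=1)  # dense "target" curve

for n_points in (4, 8, 16, 32):  # coarse -> fine
    idx = np.linspace(0, len(curve) - 1, n_points).astype(int)
    approx = curve[idx]  # a candidate a human (or reward model) could rate
    err = np.abs(np.interp(t, approx[:, 0], approx[:, 1]) - curve[:, 1]).max()
    print(f"{n_points} points, max error {err:.3f}")
```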
glad someone actually did it! great work!