Are there any local models that use this new approach to generating images?

GaggiX · 2025-04-08T11:58:30 1744113510

GPT-4o is the only model that seems to work well in the text-image joint space to this degree. Even Gemini Flash 2.0 with native image support is not nearly as good, so it will probably be a while (a while in the context of AI development) before a good open source alternative pops up.

gerash · 2025-04-08T22:57:55 1744153075

Depends on the use case.

I used GPT-4o for some image editing (adding or removing things) on an image of a person, and it distorted the person's look after each edit, whereas Gemini Flash with image output did much better.

The main problem is that there is little control. For example, I asked it to add a helicopter to an image of a ski resort, but it is cumbersome to have to write a full paragraph describing exactly where I want the helicopter to be when I would rather just drag things into place with a mouse.

DeathArrow · 2025-04-08T12:12:21 1744114341

Yes, there's HiDream, which yields even better results than 4o.

https://github.com/HiDream-ai/HiDream-I1

GaggiX · 2025-04-08T12:24:06 1744115046

This is just a diffusion text-to-image model like many others, completely different from an LLM with native image support.