I see it in the "2. Model Summary" section (for [2]). In the next section, I see links to Hugging Face to download the DeepSeek-R1 Distill Models (for [3]).
Hi, the Zephyr link may be what I'm looking for. Yeah, I'm quite familiar with RL already, so it was specifically RLHF that I was asking about. I'll check out that resource, thanks!
You could definitely use this for upsampling negative prompts, though I haven't tested that much. In theory, future T2I models shouldn't need negative prompts as much; I find it's better to focus on really high-quality positive prompts, since those are closer to the captions the model was trained on.
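To make that concrete, here's a rough sketch of what positive-prompt upsampling looks like with transformers. The model id and the instruction prefix below are placeholders, not the actual ones this project uses:

```python
# Minimal prompt-upsampling sketch with a seq2seq model via Hugging Face
# transformers. "your-upsampler-model" is a placeholder repo id, and the
# "Expand the following prompt:" prefix is illustrative, not the real one.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_id = "your-upsampler-model"  # placeholder: substitute the real checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

prompt = "a cat sitting on a windowsill at sunset"
inputs = tokenizer("Expand the following prompt: " + prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```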
Yup, the model will still forget details sometimes. This is a common issue with prompt upsampling methods, but I'm hoping to improve this with the next version.
Thanks for the kind words! I started with the 780M-param flan-t5-large model and kept trying smaller and smaller base models - I was shocked at how good the output was at 77M. As you go smaller, though, it's much easier to accidentally overfit or collapse the model and produce gibberish, so I had to be very careful with hyperparameters and with sanitizing / filtering the dataset.
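For reference, both base models are public, so you can sanity-check the (approximate) parameter counts yourself:

```python
# Quick parameter count for the two flan-t5 base models mentioned above.
from transformers import T5ForConditionalGeneration

for model_id in ("google/flan-t5-large", "google/flan-t5-small"):
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{model_id}: {n_params / 1e6:.0f}M parameters")  # ~780M vs ~77M
```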
I haven't tested extensively with non-SDXL-based checkpoints, but there's nothing really SDXL-specific about the model. If you're using a fine-tune trained on booru-style tags it probably won't work as well, but otherwise it should work just fine. And in that case, just fork the project and tune it on whatever prompt style works best for your fine-tune :)
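If you do go that route, retraining on your own prompt pairs is pretty straightforward with the Seq2Seq trainer. This is just a rough sketch - the data, prefix, and hyperparameters are made up for illustration, not the actual training setup:

```python
# Hedged sketch of fine-tuning a small T5 upsampler on your own
# (short prompt -> detailed prompt) pairs. Everything below is illustrative.
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

base = "google/flan-t5-small"  # the 77M-param base model
tokenizer = T5Tokenizer.from_pretrained(base)
model = T5ForConditionalGeneration.from_pretrained(base)

# Replace with your own prompt pairs, in whatever style your checkpoint likes.
pairs = [
    {"input": "a castle on a hill",
     "target": "a sprawling medieval castle on a grassy hill, golden hour, "
               "dramatic clouds, highly detailed"},
]
dataset = Dataset.from_list(pairs)

def tokenize(batch):
    model_inputs = tokenizer(batch["input"], truncation=True, max_length=77)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=256)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(tokenize, batched=True, remove_columns=["input", "target"])

args = Seq2SeqTrainingArguments(
    output_dir="upsampler-ft",
    per_device_train_batch_size=8,
    learning_rate=1e-4,  # small models collapse easily; keep the LR conservative
    num_train_epochs=3,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```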