Where are you seeing this? On https://github.com/deepseek-ai/DeepSeek-R1/tree/main?tab=rea... I only see the paper and related figures.


I see it in the "2. Model Summary" section (for [2]). In the next section, I see links to Hugging Face to download the DeepSeek-R1 Distill Models (for [3]).

https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-fil...

https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-fil...


The repo contains only the PDF, not actual runnable code for the RL training pipeline.

Publishing a high-level description of the training algorithm is good, but it doesn't count as "open-sourcing", as commonly understood.


https://stable-baselines3.readthedocs.io/en/master/ is a great resource for hacking on RL implementations. There are many good RL courses out there, but https://www.youtube.com/playlist?list=PLwRJQ4m4UJjNymuBM9Rdm... is my personal favorite.

For LLMs / RLHF it's a little more difficult, but https://github.com/huggingface/alignment-handbook and the Zephyr project together make a good, easy-to-follow collection of models, datasets, and scripts.

I would suggest studying the basics of RL first before diving into LLM RLHF, which is much harder to learn on a single GPU.
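
If it helps to have something runnable, getting started with stable-baselines3 is only a few lines. Here's a minimal sketch in the style of their docs (assuming gymnasium and stable-baselines3 v2+ are installed):

    import gymnasium as gym
    from stable_baselines3 import PPO

    # Train PPO on the classic CartPole control task.
    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)

    # Roll out the trained policy for one episode.
    obs, info = env.reset()
    done = False
    while not done:
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated

From there it's easy to swap in other algorithms (A2C, SAC, etc.) or your own environment before moving on to the RLHF material.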


Hi, the Zephyr link may be what I'm looking for. Yeah, I'm quite familiar with RL already, so it was specifically RLHF that I was asking about. I'll check out that resource, thanks!


You could definitely use this for upsampling negative prompts, though I haven't tested that much. In theory, future T2I models shouldn't need to be negatively prompted as much; I find it's better to focus on really high quality positive prompts, as that is closer to the captions the model was trained on.

You can take a look at the dataset here: https://huggingface.co/datasets/roborovski/upsampled-prompts... Roughly 5k samples were needed for the smaller ones at a minimum, filtered from the 95k total generated.
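
If you want to try it, inference is just the standard transformers seq2seq flow - something like the sketch below, where the checkpoint id is a placeholder (swap in the actual model you're using):

    from transformers import AutoTokenizer, T5ForConditionalGeneration

    checkpoint = "your-username/prompt-upsampler"  # placeholder id, not a real repo
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)

    # Expand a terse prompt into a richer, caption-style one.
    prompt = "a cat sitting on a windowsill"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))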


Yup, the model will still forget details sometimes. This is a common issue with prompt upsampling methods, but I'm hoping to improve this with the next version.


Thanks for the kind words! I started with the 780M-param flan-t5-large model and kept trying smaller and smaller base models - I was shocked at how good the output was at 77M. As you go smaller, though, it's much easier to accidentally overfit or collapse the model and produce gibberish. I had to be very careful with hyperparams and with sanitizing / filtering the dataset.
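
For anyone attempting something similar, the knobs that mattered were roughly these (a sketch using transformers' Seq2SeqTrainingArguments; the specific values are illustrative assumptions, not the exact hyperparameters from my run):

    from transformers import Seq2SeqTrainingArguments

    # Illustrative settings for fine-tuning a small T5 - values are
    # assumptions for the sketch, not the exact ones used in training.
    args = Seq2SeqTrainingArguments(
        output_dir="prompt-upsampler-77m",
        learning_rate=1e-4,               # small models collapse at aggressive LRs
        per_device_train_batch_size=16,
        num_train_epochs=3,               # stop early rather than overfit
        weight_decay=0.01,
        warmup_ratio=0.1,                 # gentle warmup helps stability
        logging_steps=50,                 # watch the loss for signs of collapse
    )

At this scale the dataset filtering mattered just as much as the hyperparameters - gibberish in means gibberish out.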


As Invoke is open-source and already has transformers as a dependency, it should be pretty easy to add.


Discussing with the maintainers on their discord now :) https://discord.com/channels/1020123559063990373/10201235598...


I miss when such discussion happened in GitHub issues.


You're preaching to the choir!


I haven't tested extensively with non-SDXL-based checkpoints, but there's nothing really SDXL-specific about the model. If you're using a fine-tune trained on booru-style tags, it probably won't work as well, but otherwise it should work just fine. And in that case, just fork the project and tune it on whatever prompt style your fine-tune responds to best :)


Will fix these, thanks for the heads-up!



What models does dreamstudio use? I couldn't see how to view them without logging in.


DreamStudio (and ClipDrop as well) uses Stable Diffusion, generally getting new SD models before public release (both are owned by Stability AI).


Given that they can see what plugins I have, it makes total sense that uTorrent wanted a browser plugin. Just thinking out loud.

