Hacker News

Personally, I'm excited to see ML researchers doing cool stuff with small models again!

With LLMs taking over the spotlight it's easy for people to forget that not everything needs billions or trillions of parameters. Stable Diffusion fits comfortably in my 8GB of VRAM and can generate amazing images. I'd love to see more research like this in smaller models that can be used on cheap consumer hardware.



We want much larger models, not because they're "cool," but because they exhibit capabilities that tiny models don't, including the ability to perform new tasks for which they were not trained, without requiring finetuning.


I question the assumption that fine-tuning should always be avoided.

If a model is going to be used many times for a specific use case, it is far cheaper and uses far less energy to fine-tune a small model once and run it on cheap low-power hardware than it is to continuously run a huge, do-everything model on expensive, high-power hardware. Enormous models are great for exploration and for general-purpose applications like ChatGPT, but I think that we will find over the next few years that smaller, purpose-built models will continue to dominate in applications like geospatial analysis.
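As a back-of-the-envelope sketch of the amortization argument (every dollar figure here is a hypothetical assumption for illustration, not a measurement), the one-time fine-tuning cost pays for itself once enough queries go through the cheaper small model:

```python
# Hypothetical break-even sketch: one-time fine-tuning cost vs. cheaper
# per-query inference. All costs are illustrative assumptions.

def break_even_queries(finetune_cost, small_query_cost, large_query_cost):
    """Smallest number of queries after which fine-tuning a small model
    once is cheaper than sending every query to a large general model."""
    savings_per_query = large_query_cost - small_query_cost
    if savings_per_query <= 0:
        return float("inf")  # the large model is never costlier per query
    # Smallest integer n such that n * savings_per_query exceeds finetune_cost
    return int(finetune_cost / savings_per_query) + 1

# Assumed: $500 one-time fine-tune; $0.0001/query small, $0.01/query large.
n = break_even_queries(500.0, 0.0001, 0.01)
print(n)  # -> 50506: the small model wins after roughly 50k queries
```

Under those assumed numbers the fine-tune amortizes in tens of thousands of queries, which a production use case can hit in a day; the real crossover point obviously depends on actual hardware and energy costs.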


We're talking about different things. You're talking about finetuning models to tasks known in advance. I'm talking about the ability to generalize to new tasks: https://arxiv.org/pdf/2206.07682. Please don't argue against a straw-man.


I'm not attacking a straw man; it seems I just don't understand the distinction you're drawing.

As I understand it, we're contrasting two opposite approaches to ML: fine-tuning small models for specific applications versus training a single large model that can generalize to new tasks without any advance preparation.

I'm arguing that fine-tuning is far more useful than people are currently giving it credit for, and that generalizing a single massive model to new tasks is overrated.

Can you clarify where you're seeing a straw man?


You're talking about tasks known in advance; I'm not.


And I'm saying that any task that wasn't known in advance becomes a task known in advance the moment it has been done once.

I'm not arguing that there is no place for large, general models—they're great for exploration—just that a smaller foundation model shouldn't be dismissed offhand based solely on parameter count.



