Hacker News

Personally, I'm excited to see ML researchers doing cool stuff with small models again!

With LLMs taking over the spotlight it's easy for people to forget that not everything needs billions or trillions of parameters. Stable Diffusion fits comfortably in my 8GB of VRAM and can generate amazing images. I'd love to see more research like this in smaller models that can be used on cheap consumer hardware.



We want much larger models, not because they're "cool," but because they exhibit capabilities that tiny models don't, including the ability to perform new tasks for which they were not trained, without requiring finetuning.


I question the assumption that fine-tuning should always be avoided.

If a model is going to be used many times for a specific use case, it is far cheaper and uses far less energy to fine-tune a small model once and run it on cheap low-power hardware than it is to continuously run a huge, do-everything model on expensive, high-power hardware. Enormous models are great for exploration and for general-purpose applications like ChatGPT, but I think that we will find over the next few years that smaller, purpose-built models will continue to dominate in applications like geospatial analysis.
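As a back-of-the-envelope sketch of the amortization argument (every dollar figure here is a hypothetical assumption for illustration, not a measurement), the one-time fine-tuning cost pays for itself once enough queries go through the cheaper small model:

```python
# Hypothetical break-even sketch: one-time fine-tuning cost vs. cheaper
# per-query inference. All costs are illustrative assumptions.

def break_even_queries(finetune_cost, small_query_cost, large_query_cost):
    """Smallest number of queries after which fine-tuning a small model
    once is cheaper than sending every query to a large general model."""
    savings_per_query = large_query_cost - small_query_cost
    if savings_per_query <= 0:
        return float("inf")  # the large model is never costlier per query
    # Smallest integer n such that n * savings_per_query exceeds finetune_cost
    return int(finetune_cost / savings_per_query) + 1

# Assumed: $500 one-time fine-tune; $0.0001/query small, $0.01/query large.
n = break_even_queries(500.0, 0.0001, 0.01)
print(n)  # -> 50506: the small model wins after roughly 50k queries
```

Under those assumed numbers the fine-tune amortizes in tens of thousands of queries, which a production use case can hit in a day; the real crossover point obviously depends on actual hardware and energy costs.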


We're talking about different things. You're talking about finetuning models to tasks known in advance. I'm talking about the ability to generalize to new tasks: https://arxiv.org/pdf/2206.07682. Please don't argue against a straw-man.


I'm not attacking a straw man; it seems I just don't understand the distinction you're drawing.

As I understand it, we're contrasting two opposite approaches to ML: fine-tuning small models for specific applications versus training a single large model that can generalize to new tasks without any advance preparation.

I'm arguing that fine-tuning is far more useful than people are currently giving it credit for, and that generalizing a single massive model to new tasks is overrated.

Can you clarify where you're seeing a straw man?


You're talking about tasks known in advance; I'm not.


And I'm saying that any task that wasn't known in advance becomes a task known in advance the moment it has been done once.

I'm not arguing that there is no place for large, general models—they're great for exploration—just that a smaller foundation model shouldn't be dismissed offhand based solely on parameter count.



