I'm also seeing teams that expected big gains from fine-tuning end up with only incremental or moderate improvements. Then they put it in production and regret the decision as SOTA marches on quickly.
I have avoided fine-tuning because the models are currently improving at a rate that exceeds big corporate product development velocity.
Absolutely the first thing you should try is a prompt optimizer. The GEPA optimizer (implemented in DSPy) often outperforms GRPO training[1]. But I think people are usually building with frameworks that aren't machine learning frameworks.
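For anyone who hasn't tried it, here's roughly what that looks like with DSPy's GEPA. This is a sketch from memory, not gospel: the model names are placeholders, and the `auto` budget setting, the `reflection_lm` argument, and the metric signature may differ slightly across DSPy versions.

    import dspy

    # point DSPy at whatever task model you're using (model name is just an example)
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    # a toy program whose prompts GEPA will rewrite
    program = dspy.ChainOfThought("question -> answer")

    # a tiny labeled set; real runs want far more examples
    trainset = [
        dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
        dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
    ]

    # GEPA's metric can also return textual feedback for its reflection step;
    # a plain score works too (exact signature may vary by version)
    def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
        return float(gold.answer.lower() in pred.answer.lower())

    optimizer = dspy.GEPA(
        metric=metric,
        auto="light",                             # small search budget
        reflection_lm=dspy.LM("openai/gpt-4o"),   # a stronger model proposes prompt edits
    )

    # reusing trainset as valset here only to keep the sketch short
    optimized = optimizer.compile(program, trainset=trainset, valset=trainset)

    print(optimized(question="What is 7 + 5?").answer)

The nice property, to my mind, is that the artifact you get back is just prompts, so when a better base model ships you re-run the optimizer rather than redoing a training run.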