>when we get a prompt working reliably on one model, we often have trouble porting it to another LLM
I saw a study where a prompt massively boosted one model's performance on a task while significantly reducing another popular model's performance on the same task.