Hacker News

Right, but which model?

It makes a huge difference.



I’ve been using OpenChat 3.5 1210 most recently. Before that, Mistral-OpenOrca. Both return JSON more consistently than gpt-3.5-turbo.
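For this kind of comparison, "returns JSON more consistently" usually means fewer parse failures per call. A minimal sketch of the validation loop such extraction tasks typically need, assuming only some callable that takes a prompt and returns the model's text (the `call_model` helper is hypothetical, e.g. a wrapper around a local OpenChat or Mistral-OpenOrca server):

```python
import json

def extract_json(call_model, prompt, retries=2):
    """Call a model and parse its reply as JSON, retrying on parse failure.

    `call_model` is any callable taking a prompt string and returning the
    model's raw text reply (hypothetical helper, not a specific API).
    """
    last_err = None
    for _ in range(retries + 1):
        text = call_model(prompt).strip()
        # Models often wrap JSON in markdown fences; strip them first.
        if text.startswith("```"):
            text = text.strip("`")
            if text.startswith("json"):
                text = text[4:]
        try:
            return json.loads(text)
        except json.JSONDecodeError as e:
            last_err = e
    raise ValueError(f"model never returned valid JSON: {last_err}")
```

A model that is "more consistent" simply trips the retry branch less often, which matters when you are running thousands of extraction calls.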


gpt-3.5-turbo is not the benchmark


I don’t know what point you’re trying to make. They also return JSON more consistently than gpt-4, but I don’t use that because it’s overkill and expensive for my text extraction tasks.


Because people have different interests and want to hear your results for different reasons.

Some want to consider results relative to cost, and some are interested only in how it compares to SOTA.


I mean, sure, but the parent should also just state explicitly what they were asking or claiming. I’ve answered every question asked. Making vague declarations that something is not “the benchmark,” without saying what you think the benchmark should be, is unhelpful.



