I wish this were explained to people more broadly…
There are LLMs, the engines that make these products run, and then the products themselves.
GPT-anything should not be asked math problems. LLMs are language models, not math engines.
The line is going to get very blurry because ChatGPT, Claude, and Gemini are not LLMs. They're products driven by LLMs.
The question or requirement should not be "can my LLM do math". It's "can I build an LLM-driven product that can reason through math problems". Those are different things.
A coworker of mine told me that GPT’s LLM can use Excel files. No, it can’t. But the tools they plugged into it can.
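For the sake of illustration, a minimal sketch of what that split can look like in code: the product layer asks the model only to extract the arithmetic expression, then hands it to a real calculator, so the model never does the arithmetic itself. `llm_complete` is a hypothetical placeholder here, not any vendor's actual API.

```python
import ast
import operator

def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical, no specific vendor)."""
    raise NotImplementedError("plug in your model provider here")

# Operators the calculator tool is willing to evaluate.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> float:
    """Safely evaluate a plain arithmetic expression, e.g. '123456 * 789'."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def answer(question: str) -> str:
    # The LLM only extracts the expression; the tool does the math.
    expr = llm_complete(f"Extract the bare arithmetic expression from: {question}")
    return f"{expr} = {calculator(expr)}"
```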
> A coworker of mine told me that GPT’s LLM can use Excel files. No, it can’t. But the tools they plugged into it can.
It's a bit like saying that a human can't use Excel files, but when given a keyboard, mouse, and monitor connected to a computer running Excel, they can. But then obviously the "Excel usage" competency is in the human, not in the tools, and a cat, for example, cannot use Excel proficiently however many training hours it gets and however good the keyboard is.
Taking it back to the LLMs, it is clear to me that some modern LLMs like the one running ChatGPT can be integrated with tools in a way that makes them somewhat proficient with Excel, while other simpler LLMs cannot, regardless of the tools.
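To make that concrete, a rough sketch of the "tools plugged into it" idea: the product, not the model, opens the .xlsx (here with pandas) and hands the model plain text it can reason over. Again, `llm_complete` is a hypothetical stand-in for whatever model API the product actually uses, and the 20-row preview is an arbitrary choice.

```python
import pandas as pd

def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical)."""
    raise NotImplementedError("plug in your model provider here")

def ask_about_spreadsheet(path: str, question: str) -> str:
    df = pd.read_excel(path)                   # the tool handles the Excel file
    preview = df.head(20).to_csv(index=False)  # the model only ever sees text
    prompt = (f"Columns: {list(df.columns)}\n"
              f"First rows:\n{preview}\n"
              f"Question: {question}")
    return llm_complete(prompt)
```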
> A coworker of mine told me that GPT’s LLM can use Excel files. No, it can’t. But the tools they plugged into it can.
And there's a 50/50 chance they'll use the right tool for the job. I tried the math question above multiple times on GPT-5 and it gets it right about 50% of the time. If I ask it to "try again", it usually gets it on the 2nd or 3rd try. Most of the time when it's wrong, it's not far off, but it looks deceptively accurate at first glance.
I think the argument was more that the ability of LLMs to multiply without tool use improves over the generations, so it does happen to be yet another test showing improved abilities.
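If you wanted to run that kind of test yourself, a rough sketch might look like this: sample random operand pairs, ask the model for the product as plain text, and score exact matches. `ask_model`, the digit count, and the sample size are all placeholder assumptions, not anything from a real benchmark.

```python
import random
import re

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical)."""
    raise NotImplementedError("plug in your model provider here")

def multiplication_accuracy(digits: int = 4, trials: int = 100) -> float:
    correct = 0
    for _ in range(trials):
        a = random.randint(10**(digits - 1), 10**digits - 1)
        b = random.randint(10**(digits - 1), 10**digits - 1)
        reply = ask_model(f"What is {a} * {b}? Answer with only the number.")
        match = re.search(r"-?\d[\d,]*", reply)
        # Count it correct only if the extracted number matches exactly.
        if match and int(match.group().replace(",", "")) == a * b:
            correct += 1
    return correct / trials
```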