If you're asking an LLM to compute something "off the top of its head", you're using it wrong. Ask it to write the code to perform the computation and it'll do better.
Same with asking a person to solve something in their head vs. giving them an editor and a random python interpreter, or whatever it is normal people use to solve problems.
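E.g. the classic letter-counting failure (the word here is just an illustration): asked directly, models often miscount; asked for code, they produce something trivially correct.

    # Counting letters trips models up "in their head",
    # but the code they'd write for it is one line:
    word = "strawberry"
    print(word.count("r"))  # -> 3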
The decent models will (mostly) decide on their own when they need to write code to solve a problem.
Either way, a reply with a bogus answer is the fault of the provider and the model, not the question-asker -- if we all have to carry lexicons around to remember how to phrase questions to the black box, we may as well just learn a programming language outright.
I disagree: the answer you get is dictated by the question you ask. Ask stupid, get stupid. Present the problem better, get a better answer. These tools are trained to be highly compliant, so you get what you ask for.
The same happens with regular people - a smart person will do something stupid because they weren't critical or judging of your request - and these tools have far more limited thinking/reasoning than a normal person, even if they seem to have a lot more "knowledge".
You don't know what it's geared for until you try. Like I said, GPT-4 could consistently encode and decode even fairly long base64 sequences. I remember once asking it for an SVG image, and it responded with HTML containing an <img> tag whose data URL embedded the image - and it worked exactly as it should.
You can argue whether that is a meaningful use of model capacity, and sure, I agree that this is exactly the kind of stuff tool use is for. But nevertheless the bar was set.
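For reference, that kind of response amounts to something like the output of this sketch (a reconstruction in Python, not the actual output; the SVG is a placeholder, not the image GPT-4 produced):

    import base64

    # Stand-in SVG payload for illustration.
    svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
           '<rect width="10" height="10" fill="red"/></svg>')
    data_url = "data:image/svg+xml;base64," + base64.b64encode(svg.encode()).decode()
    print(f'<img src="{data_url}">')  # drop this into an HTML page and it renders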
Sure you do: the architecture is known. An LLM will never be appropriate for exact input transforms and will never be able to guarantee accurate results - the input pipeline yields abstract concepts as token embedding vectors, not a stream of bytes - but, just like a human, it might have the skill to limp through the task with some accuracy.
While your base64 attempts likely went well, "could consistently encode and decode even fairly long base64 sequences" is just an anecdote. I had the same model freak out in an empty chat, transcribing the word "hi" into a full YouTube "remember to like and subscribe" outro - precision and determinism are the parameters you give up when building such a thing.
(It was around this time that models learnt to use tools autonomously within a response, such as running small code snippets that solve the problem perfectly well. But even now it is much more consistent to tell the model to do that, and for very long outputs the likelihood that it can recite the result correctly drops. See the sketch below.)
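For the base64 case, the snippet such a tool call runs is trivial, and unlike recitation the round-trip is guaranteed exact (the payload here is an arbitrary example):

    import base64

    msg = "hello world"  # any payload
    encoded = base64.b64encode(msg.encode()).decode()
    decoded = base64.b64decode(encoded).decode()
    print(encoded)         # aGVsbG8gd29ybGQ=
    assert decoded == msg  # exact round-trip, byte for byte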
You can ask it. Each model responds slightly differently to "What pronouns do you prefer for yourself?"
Opus 4.5:
I don’t have strong preferences about pronouns for myself. People use “it,” “they,” or sometimes “he” or “she” when referring to me, and I’m comfortable with any of these.
If I had to express a slight preference, “it” or “they” feel most natural since I’m an AI rather than a person with a gender identity. But honestly, I’m happy with whatever feels most comfortable to you in conversation.
Haiku 4.5:
I don’t have a strong preference for pronouns since I’m an AI without a gender identity or personal identity the way humans have. People typically use “it” when referring to me, which is perfectly fine. Some people use “they” as well, and that works too.
Feel free to use whatever feels natural to you in our conversation. I’m not going to be bothered either way.
It gave me the YouTube URL for Rick Astley.