If you're asking an LLM to compute something "off the top of its head", you're using it wrong. Ask it to write the code to perform the computation and it'll do better.
Same with asking a person to solve something in their head vs. giving them an editor and a random python interpreter, or whatever it is normal people use to solve problems.
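E.g. the classic letter-counting failure (the word here is just an illustration): asked directly, models often miscount; asked for code, they produce something trivially correct.

    # Counting letters trips models up "in their head",
    # but the code they'd write for it is one line:
    word = "strawberry"
    print(word.count("r"))  # -> 3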
The decent models will (mostly) decide on their own when they need to write code to solve a problem.
Either way, a reply with a bogus answer is the fault of the provider and the model, not the question-asker -- if we all have to carry lexicons around to remember how to phrase questions to the black box, we may as well just learn a programming language outright.
I disagree: the answer you get is dictated by the question you ask. Ask stupid, get stupid. Present the problem better, get a better answer. These tools are trained to be highly compliant, so you get what you ask for.
The same happens with regular people - a smart person will do something stupid because they weren't critical or judging of your request - and these tools have far more limited thinking/reasoning than a normal person, even if they seem to have a lot more "knowledge".
You don't know what it's geared for until you try. Like I said, GPT-4 could consistently encode and decode even fairly long base64 sequences. I remember once asking it for an SVG image, and it responded with HTML containing an <img> tag whose data URL embedded the image - and it worked exactly as it should.
You can argue whether that is a meaningful use of model capacity, and sure, I agree that this is exactly the kind of stuff tool use is for. But nevertheless the bar was set.
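For reference, that kind of response amounts to something like the output of this sketch (a reconstruction in Python, not the actual output; the SVG is a placeholder, not the image GPT-4 produced):

    import base64

    # Stand-in SVG payload for illustration.
    svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
           '<rect width="10" height="10" fill="red"/></svg>')
    data_url = "data:image/svg+xml;base64," + base64.b64encode(svg.encode()).decode()
    print(f'<img src="{data_url}">')  # drop this into an HTML page and it renders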
Sure you do: the architecture is known. An LLM will never be appropriate for exact input transforms and will never be able to guarantee accurate results - the input pipeline yields abstract concepts as token embedding vectors, not a stream of bytes - but, just like a human, it might have the skill to limp through the task with some accuracy.
While your base64 attempts likely went well, "could consistently encode and decode even fairly long base64 sequences" is just an anecdote. I had the same model freak out in an empty chat, transcribing the word "hi" into a full YouTube "remember to like and subscribe" outro - precision and determinism are the parameters you give up when building such a thing.
(It was around this time that models learnt to use tools autonomously within a response, such as running small code snippets that solve the problem perfectly well. But even now it is much more consistent to tell the model to do that, and for very long outputs the likelihood that it can recite the result correctly drops. See the sketch below.)
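For the base64 case, the snippet such a tool call runs is trivial, and unlike recitation the round-trip is guaranteed exact (the payload here is an arbitrary example):

    import base64

    msg = "hello world"  # any payload
    encoded = base64.b64encode(msg.encode()).decode()
    decoded = base64.b64decode(encoded).decode()
    print(encoded)         # aGVsbG8gd29ybGQ=
    assert decoded == msg  # exact round-trip, byte for byte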
You can ask it. Each model responds slightly differently to "What pronouns do you prefer for yourself?"
Opus 4.5:
I don’t have strong preferences about pronouns for myself. People use “it,” “they,” or sometimes “he” or “she” when referring to me, and I’m comfortable with any of these.
If I had to express a slight preference, “it” or “they” feel most natural since I’m an AI rather than a person with a gender identity. But honestly, I’m happy with whatever feels most comfortable to you in conversation.
Haiku 4.5:
I don’t have a strong preference for pronouns since I’m an AI without a gender identity or personal identity the way humans have. People typically use “it” when referring to me, which is perfectly fine. Some people use “they” as well, and that works too.
Feel free to use whatever feels natural to you in our conversation. I’m not going to be bothered either way.
It gave me the YouTube URL for Rick Astley.