> There is no separation of code and data on the wire - everything is a stream of bytes. There isn't one in electronics either - everything is signals going down the wires.
Overall I agree with your message, but I think you're stretching it too far here. You can make code and data physically separate[1].
But if you then upload an interpreter, that "one level of abstraction up", you can mix code and data again.
> Overall I agree with your message, but I think you're stretching it too far here. You can make code and data physically separate[1].
You cannot. I.e. this holds only within the abstraction level of the system. Not only it can be defeated one level up, as you illustrated, but also by going one or more levels down. That's where "side channels" come from.
But the most relevant part for this discussion is, even with something like Harvard architecture underneath, your typical software systems is defined in terms of reality several layers of abstraction above hardware - and LLMs, specifically, are fully general interpreters and can't have this separation by the very nature of the task. Natural language doesn't have it, because we don't have it, and since the job of LLM is to process natural language like we do, it also cannot have it.
> LLMs, specifically, are fully general interpreters and can't have this separation by the very nature of the task. Natural language doesn't have it, because we don't have it, and since the job of LLM is to process natural language like we do, it also cannot have it.
This isn't relevant to the question of functional use of LLM/LAMs, because the sensitive information and/or actions are externally linked.
Or to put it another way, there's always a controllable interface between an LLM/LAM's output and an action.
It's therefore always possible to have an LLM tell you "I'm sorry, Dave. I'm afraid I can't do that" from a permissions standpoint.
Inconvenient, sure. But nobody said designing secure systems had to be easy.
I disagree. The actual problem that's specific to LLMs is that the model cannot process data without being influenced by it, and that's because the whole idea is ill-formed. LLMs just don't have explicit code/data separation, and cannot have it without losing the very functionality you want from them[0].
Everything else is just classical security stuff.
Or to put it another way, your controllable interface between LLM output and actions can't help you, because by definition the LLM-specific problem occurs when the action is legal from permission standpoint, but is still undesirable in larger context.
--
[0] - I feel like many people think that code/data separation is a normal thing to have, and the lack of it must be a bug (and can be fixed). I'm trying to make them realize that it's the other way around: there is no "code" and "data" in nature - it's us who make that distinction, and it's us who actively build it into systems, and doing so makes some potentially desirable tasks impossible.
You're reasoning from a standpoint that LLMs must have permissions to do everything. That's where you're going awry.
If they don't, they can't.
They don't need to have blanket access to be useful.
And even when sensitive actions need to be exposed, HITL per-sensitive-action authorization ("LLM would like to ____. Approve/deny?") and authorization predicated on non-LLM systems ("Is there an active change request with an open period?"), to toss out a couple trivial examples, are on the table.
Things like this aren't being done now, because initial LLM integrations are lazy and poorly thought out by the dev teams, from a security perspective. (Read: management demanding AI now)
Overall I agree with your message, but I think you're stretching it too far here. You can make code and data physically separate[1].
But if you then upload an interpreter, that "one level of abstraction up", you can mix code and data again.
https://en.wikipedia.org/wiki/Harvard_architecture