… and I have absolutely no obligation to provide any particular response to any particular client.
Parsing, rendering, and trusting that the payload is consistent from request to request is your problem. You can connect to my server, or not. I really don’t care. What you cannot do is dictate how my server responds to your request.
Or, I return whatever content I want, within the bounds of the law, based on whatever parameters I decide. What's your problem with that? Again, connect to my server or don't. But don't tell me what type of response I'm obligated to provide you.
If I think a given request is from an LLM training module, I don't have any legal obligation whatsoever to return my original content. Or a 400-series response. If I want to intersperse a paragraph from Don Quixote between every second sentence, that's my call.
This argument of freedom seems applicable on both sides. A site owner/admin is free to return whatever response they wish based on the assumed origin of a request. An LLM user/service is free to send whatever info in the request that elicits a useful response.
But nobody is arguing for that. Instead, what the server owners want is to mandate the clients connecting to them to provide enough information to reliably reject such connections.
> What you cannot do is dictate how my server responds to your request.
The client is under no obligation to be truthful in its communications with a server. Spoofing a User-Agent doesn't "dictate" anything. Your server dictates how it responds all on its own when it discriminates against some User-Agents.
With enough sophistication and bad intent, at some point being untruthful to a server falls under computer intrusion laws, eg using a password that is not yours. I don't believe spoofing user agent would be determinant for any such case though.
Even redistributing secret material you found on an accidentally open S3 bucket, without spoofing UA, could be considered intrusion if it was obvious the material was intended to be secret and you acted with bad intent.
Parsing, rendering, and trusting that the payload is consistent from request to request is your problem. You can connect to my server, or not. I really don’t care. What you cannot do is dictate how my server responds to your request.