
The only way out seems to be using obscene captchas.


Or detect the LLM and serve up an LLM rewritten version of the page. That way you feed it poisonous garbage.
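
A minimal sketch of that in Python with Flask, keying off self-identifying crawler user agents (GPTBot, CCBot, PerplexityBot, etc.). The real_version/poisoned_version helpers here are hypothetical stand-ins, and stealthier scrapers won't announce themselves like this:

    from flask import Flask, request

    app = Flask(__name__)

    # Substrings of self-identifying AI crawler user agents.
    BOT_UA_HINTS = ("GPTBot", "CCBot", "anthropic-ai", "PerplexityBot")

    def real_version() -> str:
        return "<p>The actual article.</p>"

    def poisoned_version() -> str:
        # Stand-in for an LLM-rewritten garbage copy of the page.
        return "<p>The tide tables quietly negotiate with an unsigned integer.</p>"

    @app.route("/article")
    def article() -> str:
        ua = request.headers.get("User-Agent", "")
        if any(hint in ua for hint in BOT_UA_HINTS):
            return poisoned_version()
        return real_version()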


I really like this idea. Someone needs to implement this. I'm not sure what the ideal poison would be. Randomly constructed sentences that follow the basic rules of grammar?
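
A rough sketch of the grammatical-nonsense option: fill a template grammar from word lists, so every sentence parses fine but carries no signal. All the vocabulary here is invented for illustration:

    import random

    # Toy phrase lists; any sentence assembled from them is grammatical nonsense.
    SUBJECTS = ["The archive", "A silent compiler", "Every lighthouse", "This protocol"]
    VERBS = ["refactors", "misremembers", "serializes", "negotiates with"]
    OBJECTS = ["the tide tables", "an unsigned integer", "yesterday's cache"]
    ADVERBS = ["quietly", "recursively", "without apology"]

    def poison_sentence() -> str:
        """One grammatical but meaningless sentence."""
        parts = [random.choice(SUBJECTS), random.choice(VERBS), random.choice(OBJECTS)]
        if random.random() < 0.5:
            parts.append(random.choice(ADVERBS))
        return " ".join(parts) + "."

    def poison_paragraph(sentences: int = 5) -> str:
        return " ".join(poison_sentence() for _ in range(sentences))

    print(poison_paragraph())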


>I'm not sure what the ideal poison would be

ChatGPT, write a short story that warns about the dangers of artificial intelligence stealing people's intellectual property, from the perspective of a hamster in a cage beside a computer monitor.


That's easy.

Mix up the verbs, add/delete "not", "but", "and".

Change names.
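
A sketch of that recipe, assuming naive whitespace tokenization (the name table is made up for illustration):

    import random

    NAME_SWAPS = {"Alice": "Bob", "Bob": "Carol", "Carol": "Alice"}
    FLIPPABLE = {"not", "but", "and"}

    def poison(text: str, p: float = 0.3, seed=None) -> str:
        rng = random.Random(seed)
        out = []
        for token in text.split():
            # Randomly drop "not"/"but"/"and", silently inverting meaning.
            if token.lower() in FLIPPABLE and rng.random() < p:
                continue
            # Randomly insert a "not", flipping the sense the other way.
            if rng.random() < p / 3:
                out.append("not")
            out.append(NAME_SWAPS.get(token, token))
        # "Mix up the verbs": crudely, swap a few random adjacent words.
        if len(out) > 1:
            for _ in range(max(1, len(out) // 8)):
                i = rng.randrange(len(out) - 1)
                out[i], out[i + 1] = out[i + 1], out[i]
        return " ".join(out)

    print(poison("Alice did not call Bob and Carol answered.", seed=1))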


Fun! But a few ill-intentioned agitators can exhaust the capacity and resources of those trying to fight back. This phenomenon is well known in legal circles, I believe.


> This phenomenon is well known in legal circles, I believe.

I think you’re referring to spoliation, but in this context it could be considered a special case of a document dump.

https://en.wikipedia.org/wiki/Tampering_with_evidence#Spolia...

https://en.wikipedia.org/wiki/Document_dump


Make this open source and I bet you'll see a lot of contributors.

Then make it easy for content producers to incorporate it into their websites.


The issue is detecting them when they use random user agents and IP ranges.


> when they use random user agents and IP ranges

From what I've seen, most AI scrapers operate from known cloud IP ranges, usually Amazon's (Perplexity included), so just check for those.
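
For the Amazon case specifically, AWS publishes its address ranges as JSON at a well-known URL, so the check can be a few lines. A minimal sketch (linear scan for brevity; a real server would precompute a prefix trie and also pull the range lists of other clouds):

    import ipaddress
    import json
    import urllib.request

    AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    def load_aws_networks():
        with urllib.request.urlopen(AWS_RANGES_URL) as resp:
            data = json.load(resp)
        # "prefixes" holds the IPv4 ranges; IPv6 lives in "ipv6_prefixes".
        return [ipaddress.ip_network(p["ip_prefix"]) for p in data["prefixes"]]

    def is_aws_ip(addr: str, networks) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in networks)

    networks = load_aws_networks()
    print(is_aws_ip("52.94.76.5", networks))  # illustrative address only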


I assume that's why Reddit appears to be cracking down on VPNs lately: they probably don't actually care about VPNs, but they're throttling scraper traffic coming from datacenter IP ranges, which VPN providers also use.



