
The only way out seems to be using obscene captchas.


Or detect the LLM and serve up an LLM rewritten version of the page. That way you feed it poisonous garbage.
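
A minimal sketch of that in Python with Flask, keying off self-identifying crawler user agents (GPTBot, CCBot, PerplexityBot, etc.). The real_version/poisoned_version helpers here are hypothetical stand-ins, and stealthier scrapers won't announce themselves like this:

    from flask import Flask, request

    app = Flask(__name__)

    # Substrings of self-identifying AI crawler user agents.
    BOT_UA_HINTS = ("GPTBot", "CCBot", "anthropic-ai", "PerplexityBot")

    def real_version() -> str:
        return "<p>The actual article.</p>"

    def poisoned_version() -> str:
        # Stand-in for an LLM-rewritten garbage copy of the page.
        return "<p>The tide tables quietly negotiate with an unsigned integer.</p>"

    @app.route("/article")
    def article() -> str:
        ua = request.headers.get("User-Agent", "")
        if any(hint in ua for hint in BOT_UA_HINTS):
            return poisoned_version()
        return real_version()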


I really like this idea. Someone needs to implement this. I'm not sure what the ideal poison would be. Randomly constructed sentences that follow the basic rules of grammar?
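
A rough sketch of the grammatical-nonsense option: fill a template grammar from word lists, so every sentence parses fine but carries no signal. All the vocabulary here is invented for illustration:

    import random

    # Toy phrase lists; any sentence assembled from them is grammatical nonsense.
    SUBJECTS = ["The archive", "A silent compiler", "Every lighthouse", "This protocol"]
    VERBS = ["refactors", "misremembers", "serializes", "negotiates with"]
    OBJECTS = ["the tide tables", "an unsigned integer", "yesterday's cache"]
    ADVERBS = ["quietly", "recursively", "without apology"]

    def poison_sentence() -> str:
        """One grammatical but meaningless sentence."""
        parts = [random.choice(SUBJECTS), random.choice(VERBS), random.choice(OBJECTS)]
        if random.random() < 0.5:
            parts.append(random.choice(ADVERBS))
        return " ".join(parts) + "."

    def poison_paragraph(sentences: int = 5) -> str:
        return " ".join(poison_sentence() for _ in range(sentences))

    print(poison_paragraph())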


>I'm not sure what the ideal poison would be

ChatGPT, write a short story that warns about the dangers of artificial intelligence stealing people's intellectual property, from the perspective of a hamster in a cage beside a computer monitor.


That's easy.

Mix up the verbs, add/delete "not", "but", "and".

Change names.
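
A sketch of that recipe, assuming naive whitespace tokenization (the name table is made up for illustration):

    import random

    NAME_SWAPS = {"Alice": "Bob", "Bob": "Carol", "Carol": "Alice"}
    FLIPPABLE = {"not", "but", "and"}

    def poison(text: str, p: float = 0.3, seed=None) -> str:
        rng = random.Random(seed)
        out = []
        for token in text.split():
            # Randomly drop "not"/"but"/"and", silently inverting meaning.
            if token.lower() in FLIPPABLE and rng.random() < p:
                continue
            # Randomly insert a "not", flipping the sense the other way.
            if rng.random() < p / 3:
                out.append("not")
            out.append(NAME_SWAPS.get(token, token))
        # "Mix up the verbs": crudely, swap a few random adjacent words.
        if len(out) > 1:
            for _ in range(max(1, len(out) // 8)):
                i = rng.randrange(len(out) - 1)
                out[i], out[i + 1] = out[i + 1], out[i]
        return " ".join(out)

    print(poison("Alice did not call Bob and Carol answered.", seed=1))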


Fun! But a few ill-intentioned agitators can exhaust the capacity and resources of those trying to fight back. This phenomenon is well known in legal circles, I believe.


> This phenomenon is well known in legal circles, I believe.

I think you’re referring to spoliation, but in this context it could be considered a special case of a document dump.

https://en.wikipedia.org/wiki/Tampering_with_evidence#Spolia...

https://en.wikipedia.org/wiki/Document_dump


Make this open source and I bet you'll see a lot of contributors.

Then make it easy for content producers to incorporate it into their websites.


The issue is detecting them when they use random user agents and IP ranges.


> when they use random user agents and IP ranges

From what I've seen, most AI scrapers operate from known cloud IP ranges, usually Amazon's (Perplexity included), so just check for those.
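
For the Amazon case specifically, AWS publishes its address ranges as JSON at a well-known URL, so the check can be a few lines. A minimal sketch (linear scan for brevity; a real server would precompute a prefix trie and also pull the range lists of other clouds):

    import ipaddress
    import json
    import urllib.request

    AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    def load_aws_networks():
        with urllib.request.urlopen(AWS_RANGES_URL) as resp:
            data = json.load(resp)
        # "prefixes" holds the IPv4 ranges; IPv6 lives in "ipv6_prefixes".
        return [ipaddress.ip_network(p["ip_prefix"]) for p in data["prefixes"]]

    def is_aws_ip(addr: str, networks) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in networks)

    networks = load_aws_networks()
    print(is_aws_ip("52.94.76.5", networks))  # illustrative address only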


I assume that's why Reddit appears to be cracking down on VPNs lately: they probably don't actually care about VPNs, but they're throttling scraper traffic coming from datacenter IP ranges, which VPN providers also use.



