> And so perhaps the ethics of web scraping are not so straightforward.
It strikes me that the _ethics_ of web scraping are extremely straightforward and cognizable with a terse analysis:
* You can respond however you like to my HTTP request, and I can parse your response however I like.
Simple, traditional, common. This is the way that conversations have occurred since the dawn of human communication, no?
> the legal issues associated with it.
But aren't these, without exception, fabrics spun out of the cloth that shields established players with the threat of state violence? This is not particularly new, and seems to fit in the pathetic-and-predictable file.
Moreover, the broader cheap attempt to cast this in "intellectual" property terms, and to attach that to protection of artists and creators, warrants a very particular eye-roll for its illogic.
Do you apply these ethics to web scraping only, or to all other network communications too?
Because if that's your general principle, you are making the internet much shittier. I still remember the old internet with open SMTP servers, easy-to-use comment forms, and forums which did not require emails and CAPTCHAs. But people with "You can respond however you like to my HTTP request" attitude ruined it with spam, scam and SEO.
If you only apply this to web scraping, then where do you draw the line and why? Can you scrape at the maximum rate the server can support? Can you scrape if this requires active action (like account creation)? As long as you scrape, can you also post some links to improve your SEO?
> But people with "You can respond however you like to my HTTP request" attitude ruined it with spam, scam and SEO.
I don’t see how those things relate. They all have separate ethical issues. You can believe it’s ok to scrape whatever info you can find online at the same time as believing it’s not ok to scam people.
> Do you apply these ethics to web scraping only, or to all other network communications too?
I mean... if you're keying in at 20MHz and blasting a gigawatt of noise, then yeah you've certainly run afoul of decency and just law. You're changing the physical shape of the network environment.
But if the concern is just that we don't like the bytes to which your signal decodes, or we don't like what you're doing with the response we give you, then it seems more like a speech/press issue.
The internet needs to grow resilience such that annoyances in the logical layers are easy to ignore if you have the will. But that almost certainly means that you don't get to police what people do with the content you willingly hand over, pursuant to the protocol in use.
If I say, “Hey, please don’t text me anymore. I’m going to block this number,” and you respond by buying 500 phones in five cities and texting me nonstop, is that ethical?
Not sure the metaphor works here. For example, most sites let Google scrape them as much as it likes, but go out of their way to block other robots. By doing so they are effectively forcing the whole world to use (or support, since smaller search engines have to piggyback on the big ones with special status, and pay them) proprietary spyware.
In your analogy, most websites block everyone except the biggest pervert known to man.
Yes, people like OP who set up farms of scrapers.
The website owners make their preferences clear with robots.txt, IP blocks and other antibot technology. Scrapers intentionally ignore owners' desires and force them to respond.
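For anyone who hasn't looked at how those stated preferences are read in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `MyCrawler` user agent and the example.com URLs are all made up for illustration; the rules just mirror the "Google is welcome, everyone else is not" pattern mentioned upthread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything,
# every other agent is kept out of /prices/ and asked to wait 10s.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /prices/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/prices/widget"  # assumed URL, for illustration

print(parser.can_fetch("Googlebot", url))   # owner allows this agent
print(parser.can_fetch("MyCrawler", url))   # owner asks this agent to stay out
print(parser.crawl_delay("MyCrawler"))      # requested delay in seconds
```

Note that nothing here enforces anything: `can_fetch` only reports what the owner asked for, and whether a crawler honors the answer is exactly the question this thread is arguing about.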
It's your job to separate the wheat from the chaff at the boundary of your network interface. In fact, personal boundaries of all sorts, from informational to emotional to physical to economic, are of paramount importance in the information age.
Nobody (and certainly not the state) is going to erect your personal boundaries for you by ensuring justice in the face of spammy text messages (or, for that matter, hypnotic and manipulative social media). This is your job - maybe your most important job.
Just as it's your job to protect your personal health and safety. Nobody (and certainly not the state) is going to do that for you.
Is there something about the trajectory of evolution of the internet that suggests to you that this is incorrect?
I observe continually (seemingly perpetually) increasing traffic, and continually (seemingly perpetually) increasing capacity for general purpose computing. I also observe enormous empathy and cyberpunk traditions in our communities, protecting each other. Do my eyes and ears deceive me?
Restraining orders are a thing for a reason. It's cheaper to harass someone out of business (intentionally or otherwise) than to compete on a level playing field.
Being a good neighbor requires restraining oneself and making requests with consideration for the other party.
Full disclosure: I worked for a price monitoring service that prided itself on crawling as often as every 3 hours. Steps were always taken to mitigate the impact, sometimes even asking hosts to allow-list the crawlers.
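To make "mitigate the impact" concrete, one common step is a per-host throttle. This is only a sketch of the general idea, not what that service actually ran; the `HostThrottle` class, its parameters and the 10-second interval are hypothetical names and values chosen for illustration.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum gap between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on one host
        self._clock = clock               # injectable so it can be tested
        self._sleep = sleep
        self._last = {}                   # host -> time of the last request

    def wait(self, url):
        """Block until it is polite to request `url`; return seconds slept."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually sent (after any sleep).
        self._last[host] = now + delay
        return delay


# Usage: call throttle.wait(url) right before each fetch.
throttle = HostThrottle(min_interval=10.0)
```

Different hosts are tracked independently, so a crawler can stay busy across many sites while still being gentle with each one.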
Sure, but for the purposes of this conversation, saying "for a reason" regarding a function which is presently delegated to the state is fraught with all sorts of future-proofing concerns.
It seems to me that, as a baseline, we have to agree to observe the apparent trend of the internet to supplant the state - to resist its censorship and influence almost entirely - as an indicator that our long-term thinking needs to put those relatively few state functions which are essential to a peaceful society (such as restraining orders) in the purview of the internet... somehow. Maybe that will prove to be unnecessary, but in the case that the state fades, we'll be happy we had the foresight.
Internet traffic is barely (and arguably, already not) under human control as it is. And in another century, it will almost certainly be impossible to tell the machines 'enhance your calm or else'. Or else what?
I agree wholeheartedly with your description of what makes a good neighbor. But I don't think those qualities extrapolate the way you think they do.
Consider this: at every moment, your house - your literal dwelling - is bombarded with high-level, semantic radio traffic, from way down where the messages bounce off the ionosphere all the way up to 10GHz and beyond. But this doesn't bother you. You ignore what you don't need! You draw boundaries and personally work on strengthening them - with the help of your friends and neighbors.
The internet needs help taking this shape at the application layer (and really, at all layers). And that part is up to us. We can't just throw our hands up and say "<legacy state function> exists for some reason, doesn't it?"
The government is our tool for regulating society when self regulation fails. It may be a blunt instrument and a last resort. Yet there is a place for it. We cannot entirely outsource all boundaries to individuals and private institutions.
I agree it would be ideal if the Internet could be as opt-in and benign as you suggest. Though I'm not even sure such an architecture is possible. How do you drive down the cost of listening and filtering to near zero whilst still allowing the desired signal?
And even if it were possible, consider that we do rely on governments to regulate the limited radio spectrum that we all have to share. Otherwise it wouldn't be an option to opt in to. The signal would be drowned out by whoever has the strongest transmitters.
> The government is our tool for regulating society when self regulation fails. It may be a blunt instrument and a last resort. Yet there is a place for it. We cannot entirely outsource all boundaries to individuals and private institutions.
I don't know who "our" refers to here, but if humans are evolving into "the internet", or however you want to think of this creature which is emerging over the course of this century (and appears wont to accelerate over the next few centuries), then I don't think the state is "ours". We can't just cover our eyes when presented with the proclivity of the internet not to tolerate the state.
> I agree it would be ideal if the Internet could be as opt-in and benign as you suggest. Though I'm not even sure such an architecture is possible. How do you drive down the cost of listening and filtering to near zero whilst still allowing the desired signal?
Cryptography.
> And even if it were possible, consider that we do rely on governments to regulate the limited radio spectrum that we all have to share. Otherwise it wouldn't be an option to opt in to. The signal would be drowned out by whoever has the strongest transmitters.
...really? Do you really believe that the state is a force for coordination and openness in radio?
The only bands which reliably continue to have these characteristics are the amateur bands, which have been defended by users for decades against constant encroachment by a state which, if it had its druthers, would've sold these bands to AT&T a long time ago.
My sense is that, if the government thought we weren't watching, they'd simply cancel the amateur radio license program. It is people standing to be counted (by taking the test) that keeps these bands viable _despite_ the FCC, not the other way around.