
"If we can collect behavioural data from you, and it matches closely the behaviour of other humans, you are a human. Otherwise you're not a human."

Does anyone else get that feeling from the description of what Google is doing? I've tripped their "we think you are a bot" detection filter and been presented with a captcha countless times while using complex search queries and searching for relatively obscure things, and it's frankly very insulting and rather disturbing that they think someone who inputs "unusual" search queries, according to their measure, is not human. I have JS and cookies disabled so they definitely cannot track my mouse movements and I can't use this way of verifying "humanness", but what if they get rid of the regular captchas completely (based on the argument that eventually the only ones who will make use of and solve them are bots)? Then they'll basically be saying "you are a human only if you have a browser that supports these features, behave in this way, and act like every other human who does." The fact that Google is attempting to define and thus strongly normalise what is human behaviour is definitely a big red flag to me.

(Or maybe I'm really a bot, just an extremely intelligent one. :-)



> and it's frankly very insulting and rather disturbing that they think someone who inputs "unusual" search queries, according to their measure, is not human.

Insulting? How is that insulting? You are entitled. You are entitled: you block scripts, then use Google's FREE service to run any search query you like while blocking every program that attempts to identify you as not a bot.

But if, while using their free service, they cannot identify you as a human given the factors they measure (because you actively disabled the programs that measure those factors), then I see nothing wrong in them trying alternative ways (which were the standard before).

I think you are making a storm in a teacup. If you feel offended by the way their website works, just don't use it. I don't see any red flags at all.


> You are entitled.

Has calling someone entitled ever been useful? For the last few years it's felt like nothing more than a petty "you're wrong" remark with bonus condensation built right in.


Been useful at changing their behaviour/beliefs? No, for the same psychological reasons any direct contradiction isn't.

Been useful at communicating that they feel someone is confusing expectations with rights? Yes, that's why there is a word for it.

Also, I get that it's annoying to hear entitled, ad hominem, logical fallacy, privilege, and other currently trendy words being overused and often misused. But I'll take it over what there was before, which was, most of the time, no discussion at all at that level of abstraction. The places where these words are overused are places where the community is learning these concepts.


> Has calling someone entitled ever been useful? For the last few years it's felt like nothing more than a petty "you're wrong" remark with bonus condensation built right in.

In this case, calling someone entitled is actually a compliment, not an ad hominem insult or put-down, because it acknowledges the poster's humanity.

The proper response in this case would be, "Why, yes, I probably am a bit entitled, like most people. Thank you for recognizing that I am human."


Do you mean condescension? Not trying to be a smart ass, I just got a chuckle out of the word choice.


There you go, being condensing again.


You need to chill out!


Is this Reddit? What year is it?


>Has calling someone entitled ever been useful?

Not useful, and moreover, it's ad hominem.


It's only ad hominem if it's used as support for an argument not directly related to the other's personality. It's not ad hominem in this case, because "you're entitled" is being used to support the claim, or as a parallel claim, that it's unreasonable to be insulted by a computer program thinking you're another computer program.

Claim: Turing was wrong about the Church-Turing Thesis because he was a homosexual -> ad hominem

Claim: Turing was immoral because he was a homosexual -> not ad hominem (although still not a good argument)


Saying "just don't use Google/Bing/search engines" is like saying "don't use the airlines".

The security checks suck big time, but really, these services are must-haves and there is no better alternative.

Complaining and hoping for some change is all that's left.



Sorry, as much as I tried to like it, the results just don't match what I need...


Because they don't make any money; only the sorts of shitty practices people often complain about are what enable making money while providing free services.

If enough people were actually willing to pay to use a search engine you could have an awesome search engine with none of that.


They insert their own referrals for various sites, so they do make money.


https://startpage.com - they offer anonymized google results & don't log IP.


DDG is excellent most of the time, but way behind when doing video or map searches. I have it as my default in most places, but sometimes I just have to fall back to Google.


Agreed. Additionally, DDG is useless when searching for current news events. Google is great at flowing headlines for recent events into searches, which often leads me to include the "!g" bang if searching for a recent issue/story.


I don't disagree, but it did trump the point of the comment it replied to.

DDG will get the job done.


And DDG makes it super easy to search google with a bang:

"!g hacker news"


Yup. The bang feature makes DDG amazing. I use it as my default because it's a great way to quickly do tailored searches. It's very easy to, say, search Hacker News (!hn), Python documentation (!py3), or R sites (!rseek), or just pull up a Wikipedia page (!w).


You can use Disconnect Search, which makes your Google searches anonymous: https://search.disconnect.me/


I wish that were true. I tried to switch to them after they did their 'next' a few months back, but most of the time the results were not nearly as good as Google's for my queries, and I had to switch back.


Part of the reason Google is so good is that they track you... if you disable tracking, the results aren't quite as good.


Google's services are not free.

You trade your data, screen real estate, and attention for their service. This is worth a lot - Google is worth a lot. They didn't do it by giving out services for free.


Google's services are free. By trying to redefine what free actually means into 'has no cost whatsoever to anyone' you ruin the word.


If you trade a fish for a service it isn't free. If you trade a gold bar for a service it isn't free. If you trade a service for a service it isn't free.

If you trade data for a service it is not free.

To consider "free" a function only of fiat currency is naive, both of history and of economics.

Google search is not free.

If it is I have no idea how they made so much money...

Or maybe you can tell me what 'free actually means'?


By that broad definition, there is practically no free website on the web (analytics, logs, etc.). Actually, by posting this on HN, I just "traded data, screen real estate and attention". Would you argue HN isn't free as well? I get what you mean but I don't think this is how the word free is commonly used.

Besides, I believe the original point made still makes sense if "free" is assumed to mean "non paid".


A note that screen real estate and attention here pertain mostly to paid impressions - be they advertisements or politicizing messages. When it comes to content sought by the user, it's hard to say that the user is giving the service provider real estate and attention. It is only when the service provider is showing content not for the benefit of the consumer but for their own benefit that the attention and real estate can be thought of as 'rented' to them.

I would agree with the assertion that there are practically no free websites on the web. Since when did we convince ourselves we can get things for free?

There are major exceptions. Wikipedia is for the most part free. It does not advertise to you, nor does it siphon and sell your data. It does not track you around the web, it does not sell your Wikipedia viewing behavior to urchin for cents. It is driven by donations.

HN also appears legitimately free to me. As far as I know, YCombinator does not mine or sell your data or collect data other than what is required for the forum to be a forum. YCombinator makes its money by other means. It certainly benefits by cultivating a technical online community, which is why I think it does it - though what influence YC can/does project on the community could be thought of as a social cost (I know very little to nil about whether or how much this is done).

Google, however, is not one of these cases. Nor is most of the web.

I'm not sure if the original point still makes sense with 'non paid' (nor am I sure 'non paid' is right). The original point uses 'free' (in caps) to emphasize a sense of charity they use to inform their 'entitled' argument. First, their argument is essentially 'What, you expect this to be free? You are entitled!' Second, I'm not sure that replacing the term will work unless it also communicates charity.

The point here is that the exchange does not constitute charity. Google thinks the trade is a very good deal. Presumably internet surfers do too. But there is an exchange and that needs to be recognized.

Anyway this means that any term that communicates 'charity' will be ignorant of the conditions of how Google's service works - and I would have posted the same misgivings.


Google Search is free to search. It's not free to advertise on. Searchers are not Google's customer.


In this sense the searcher, her data, her screen real estate and her attention are the product Google offers to advertisers.

These are the things the searcher trades for the service.


If a fisherman gives you a fish in exchange for writing your name and the time of your visit down in his logbook, I consider the fish to be free for all intents and purposes.


I would agree with that.

I would also posit that Google looks and does nothing like that fisherman.


I know what you are saying, but actually this service is free. Blocking bots effectively is in both the website owner's and Google's interests, because bots disrupt the value propositions of both. And you could argue that it is in the website reader's interests too, by extension.

Given their scale and resources, Google are able to provide a far more effective bot detector than any of us could do on our own. I for one am delighted they are providing this very valuable service.


Not sure what blocking bots has to do with the freeness of the service. Perhaps you'd like to reply to one of the comments further down to get into why you believe the service is free?

You may argue that the trade is in the website reader's best interest. This is a different argument than whether it is free.


My real estate and attention are given to them because I came to their service asking to fill my screen according to my query.

I can agree Google is not providing a free pure-search-results service, but they do provide a free search results + ads service. Whether getting relevant results + [relevant] ads is worth anything to you - even $0 - is a separate question, but it's a stretch to frame it as an exchange. It's like taking a free hot dog and complaining it's not free because you traded your time & taste buds eating the bun while you only wanted the sausage... [I'd buy it more for e.g. youtube pre-video ads, where you are forced to give attention and time to the ad first.]

Now my data is a better point. Very valid for the totality of google services; quite weak for logged-out search use. If you work answering questions, and recording the questions that get asked and where they came from, then yes I did hand you this data but it's almost inherent in asking the question.

[Disclaimer: I'm a xoogler. And all this is nit-picking.]


free as in beer: Free in the sense of costing no money; gratis.[1]

[1] http://en.wiktionary.org/wiki/free_as_in_beer


I don't understand how this clarifies the term free? Free as in beer is used to specify the freeness of a product or service, rather than the freeness of 'freedom', say from authoritarianism.

This conversation is about the meaning of 'money' modulo this understanding of free-as-in-beer - i.e. whether non-fiat scarce resources (user data/screen real estate) count as money.


free has 20+ definitions.[1] A discussion about what we mean when we say Google's services are (or are not) free is using free the same way we use it in free as in beer [thus its definition is relevant].

Colloquially we usually use free to mean not having a financial cost. Another word or phrase is usually used when referring to non-monetary costs. i.e. I would say "Google is free" but I would never say "Google costs nothing."

[1] http://en.wiktionary.org/wiki/free


The top sentence is granted - not sure it was ever in question.

In the bottom part you use personal anecdotes to support the claim that a broader 'we' do something. I'm not sure, as my personal experience differs. But it does get to exactly what I was saying in the above comment - what the discussion centers on is what counts as 'money' (as you say, "referring to non-monetary costs").

I think the place we differ is whether non-fiat scarce resources count as money. I think they do. Historically they have. In economics literature and practice they do.

Or perhaps the reservation is that the scarce resources in this instance are 'soft' resources like attention, screen real estate and personal data? Much of what is traded by financial institutions (for example) today is very virtual - trades of risks, credits (promises), futures, bets. Even real estate is traded on the idea that it occupies space in human attention and investment - not necessarily because it can be used as a means to 'produce' something. I'm hesitant to draw firm lines between these soft assets - I'm not sure where I could sensibly draw them.

Either way, I'm glad we agree that Google costs something. I do think that the OP intended their use of free (in capitals and context) to mean "Google costs nothing."


Perhaps the downvoter would be kind enough to clarify why they think these comments do not contribute to the conversation.


I didn't downvote but you may want to read HN's Guidelines[1], particularly: Resist complaining about being downmodded. It never does any good, and it makes boring reading.

[1] https://news.ycombinator.com/newsguidelines.html


The challenge (no complaint here, though I do believe it was down(modded?) merely because of disagreement and not for relevance or quality) was meant to incite more on-topic discussion.

It's interesting I've never read the guidelines before now. Was refreshing to have taken a look, although it's mostly common sense and etiquette.


They are if you block ads, scripts, and cookies.


You're generalising. This argument only makes sense if Google's entire ecosystem of services were just like any other random, independent selection of sites on the Internet.

There is no equivalent to Google. Nobody else is doing this, particularly not to this extent. Not using all that computing power and AI to do it.

Yes, if Google thinks I'm a robot, I don't think it's so strange to consider that some sort of value judgement, even if it's done by a legion of machines. Definitely more so than if some random small-time website decides to make that call based on a couple of if-then statements.

Imagine that using a web service is like visiting a shop, and you get directed to the slow-checkout+ID-check lane because maybe you stammered your order, or because you know the store so well that your shopping-cart route through the aisles is deemed "too fast" (read: efficient, also avoiding the "special offers" and the cookies/candy/soda/junk aisles).

Amusingly, how I feel about that "judgement" varies. Sometimes it's annoying, sometimes it's cool because I feel "hey, I'm doing something clever that humans usually don't". Similar to how being ID-checked in a liquor store can be both annoying and flattering (depending on your age and how often it happens).


You'd have a point except for the fact that recaptchas have become increasingly impossible to solve (for humans!). And recaptchas aren't just on google sites, they're everywhere.


Which is what this is trying to help solve. They know they're getting harder, so they're trying to identify you before even hitting the captcha part so that you don't have to do it.


No one should complain about anything, ever.


Actually, it would be great if someone had some ideas for ways to identify humans that don't require stuff like javascript. From the perspective of a service provider (and I'm one), the bots are a scourge: they consume resources and they are clearly attempting to 'mine' the search engine for something, but they are unwilling to come forward and just ask the search provider if they would sell it to them. And since they are unwilling to pay, but willing to invest resources in appearing more human-like, it leaves service providers in a pretty crappy position.

So if anyone has some CSS or otherwise innocuous ways of identifying humans, I'm all for it.


On a small scale, it's not too difficult. Detecting form POSTs with a new session catches most comment spam bots, and if an empty input field hidden with CSS is submitted with content, that's also a giveaway.

And I wouldn't discount javascript - another hidden field populated by onSubmit() is simple and effective. A few vocal paranoiacs advocate browsing with javascript turned off, but they are few and far between - and I bet they get sick of adding sites they want to see to their whitelist. We have over three thousand fairly technically aware users, and none have been tripped up by the javascript test.
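(A minimal sketch of the two checks described above - a CSS-hidden honeypot field plus a hidden field populated by onSubmit() - might look like the Python below. The field names "website" and "js_token" and the expected token value are made-up placeholders for illustration, not anyone's actual implementation.)

  def looks_like_bot(form, session_is_new, expected_js_token="human-ok"):
      """Return True if a form POST trips any of the cheap heuristics above."""
      # 1. A brand-new session that POSTs immediately never loaded the form first.
      if session_is_new:
          return True
      # 2. Honeypot: an input hidden with CSS that real users never see or fill in.
      if form.get("website", "").strip():
          return True
      # 3. A hidden field that client-side JS fills in onSubmit(); bots that never
      #    execute JS leave it empty or wrong.
      if form.get("js_token") != expected_js_token:
          return True
      return False

  # Example: a JS-less spam bot that also filled the honeypot field
  print(looks_like_bot({"website": "http://spam.example", "js_token": ""},
                       session_is_new=False))  # True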

If your site is valuable enough for an attacker to manually figure out your defences, then you need to consider emailing a verification token - or even better, use SMS if you can afford the cost. Because this gives you a number to pass to law enforcement, it means an attacker has to buy a burner SIM card.

Back on topic, Google's initiative is a useful tool to add to your defences.


Isn't this just the cost of having a "free" product? Bots are not really a problem; it's just that their traffic cannot be monetized. If you could monetize bot traffic, your problem would be solved. Or put another way, if you framed the issue as a business model one, not a technical one, it might be a useful exercise.


> if you framed the issue as a business model one, not a technical one, it might be a useful exercise.
That was kind of my point. Clearly most of the bots are trying to scrape my search engine for some specific data. I would (generally) be happy to just sell them that data rather than have them waste time trying to scrape us (that is the business model, which goes something like "Hey, we have a copy of a big chunk of the web on our servers, what do you want to know?"), but none of the bot writers seem willing to go there. They don't even send an email to ask us "Hey, could we get a list of every site you've crawled that uses the following Wordpress theme?" No, instead they send query after query for "/theme/xxx" p=1, p=2, ... p=300.

On a good day I just ban their IP for a while; when I'm feeling annoyed I send them back results that are bogus. But the weird thing is you can't even start a conversation with these folks, and I suppose that would be like looters saying "Well, ok, how about you help load this on a truck for me for 10 cents on the dollar and then your store won't be damaged" or something.


You may try to contact scrapers through the access-denied page.

Did you try to explicitly state that your data is available for sale when denying access to p=300?


If you wanted to buy data from Google, how would you email? What is Google's email address?


Google posts lots of contact information on their contact page. You would probably want to reach business development. I don't think they are willing to sell access to that index however, we (at Blekko) would. I suppose you could also try to pull it out of common crawl.


It need not be a commercial service. For example, Wikipedia is a donation-only service. A bot visit is generally no different from most user visits (I'd assume most users don't donate anyway). Wikipedia doesn't really mind serving users who aren't donating, but the bots, while generally no different from normal users, are stealing resources away from actual users.


That's why Google needs a proper API or a Pro edition where you could execute proper SQL queries, etc.

Instead, Google is making their search less functional. I don't get why.


They should, but the proper response would be a solution that solves what others can't, not complaining about someone not solving something you decided yourself to try out.


Including about other people's complaining.


It was sarcasm.


see no evil. hear no evil. speak no evil.


Don't be evil


TIL 'free' is an excuse for unethical behaviour.


(Disclaimer: I work at Google, but not on ReCaptcha.)

The point of this change is to make things easier on 90% of humans -- the ones who have JavaScript and third-party cookies enabled now get to tick a checkbox and be on their merry way, instead of doing a useless captcha when we knew they were already humans. Recall that when ReCaptcha initially came out, the argument was "humans are wasting all of this time, let's turn it into useful work to digitize books".

If book-based or street view-based captchas go away, I suspect it will be because bots/spammers got better at solving them than humans, not because Google thinks that the machine learning spam detection approach is fail-proof.

Recall that "reading" captchas already pose an insurmountable barrier to users with conditions such as illiteracy, low vision, no vision, and dyslexia. To accommodate these users, audio captchas are also provided, but a 2011 paper suggests that audio captchas are either easy to defeat programmatically or are difficult for users themselves to understand: https://cdn.elie.net/publications/decaptcha-breaking-75-perc...


I am visually impaired and can attest to both visual captchas being a pain and audio captchas being hard to understand. This change is nothing but an improvement as far as accessibility and usability go. This is only a plus for people who implement these, as I have actually left sites that had insurmountable captchas for me.

Thank you.


Check out webvisum.com - from their website:

"WebVisum is a unique browser add on which greatly enhances web accessibility and empowers the blind and visually impaired community by putting the control in your hands!"

"Automated and instant CAPTCHA image solving, sign up to web sites and make forum posts and blog comments without asking for help!"


I was curious about the CAPTCHA solving, too, so I tested WebVisum out on ~8 reCAPTCHAs.[1] It solved all except 2 of them, taking 20-60 seconds each time. In 2 cases it reported failing to solve the CAPTCHA, but it never gave an incorrect result. That is, whenever it gave a solution, the solution was correct (in my brief test).

So, while it's some way off their claim of "instant" CAPTCHA solving, this is definitely a very useful addon, especially for those people who cannot solve CAPTCHAs at all. Thank you for pointing it out.

[1]https://www.webscript.io/examples/recaptcha


> Automated and instant CAPTCHA image solving

How do they do that? This sounds like whitehat use of blackhat tools. Are they using captcha-solving farms?


There are ways to solve captchas somewhat reliably programmatically. I suspect this plugin only works on certain computer-generated captchas, not the street sign ones.

http://resources.infosecinstitute.com/introduction-to-automa...


They send the captcha to their servers and how they solve them is a secret.

http://www.webvisum.com/en/main/faq#q15


Is there a web service where one could purchase AI recognition of fuzzy text, e.g. a street sign or book cover in a photo?



Very helpful, thank you! I have a difficult OCR problem to solve, rather than an identity one. Interesting to see that the market price for "being human" is $0.00139.


For non-captcha OCR also consider Mechanical Turk. And there are a variety of services built on Turk too.


The fact that this works shows that distorted-text captchas are no longer effective.

From Google's blog post:

> our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy


> If book-based or street view-based captchas go away, I suspect it will be because bots/spammers got better at solving them than humans

But, wait. Isn't that what we want? It seems like bots and spammers have a relatively small cost to a company like Google, while digitizing books and house numbers is relatively valuable. I don't have numbers for a detailed cost-benefit analysis, but if bots get good enough to do time-consuming work accurately, that's a win, right?


That's like flying because you like airline food. No one flies if they don't have a destination. No one will put a captcha on their site if it doesn't tell computers and humans apart; that's its primary job.


From your description, you do sound kinda like a bot. Disabled cookies. Disabled Javascript. Irregular searches. I understand the frustration with saying, "You have to have these features supported to use the product," but let's face it: providing an experience to people who deliberately disable huge chunks of browser functionality is a tremendous pain in the ass. I think I can understand both sides of the argument using different strawmen:

"Can I read this paper, please?"

"Yes, of course, just put on these reading glasses."

"Why do I have to put on the reading glasses?"

"Well the font is quite small. If you don't wear the glasses, you probably won't be able to make out anything on the page. Your experience will be seriously degraded."

"I don't want to wear the glasses. Why can't I just read the page?"

"Well, we can fit a lot more data and make the page more robust by printing the text smaller. Why don't you just wear the glasses?"

"I have concerns about the glasses. I'd rather strain my eyes."

"We're not going to make a special page for you when 99% of the people are totally okay with wearing the glasses or wear the glasses anyways."


"I have JS and cookies disabled"

So imagine what bots often don't have.

Adding JS interaction and cookies takes more effort on the part of the programmer writing a bot.

So yeah, you'd look a lot more like a robot. How else would you quickly differentiate between human and non-human based on a single request, or even a collection of requests over time? It's a game of stats at scale.


Here's a snippet of Python using the splinter library, to visit Google, type in a search query, and click 'search' (which is very Javascript heavy these days with their annoying 'instant' search).

  from splinter import Browser
  b = Browser()
  b.visit('http://google.com')
  b.fill('q', 'browser automation')
  btn = b.find_by_name('btnG')
  btn.click()

Not exactly 'more effort'...


With Selenium you can open a full web browser such as Chrome or Firefox and have it run through. A Google search is six lines:

  require "selenium-webdriver"
  driver = Selenium::WebDriver.for :firefox
  driver.navigate.to "http://google.com"
  element = driver.find_element(:name, 'q')
  element.send_keys "Hello WebDriver!"
  element.submit

https://code.google.com/p/selenium/wiki/RubyBindings

Writing a bot with js and cookies is trivial, but it definitely won't defeat these tools. They probably look for time between actions or track mouse movements, stuff that makes the bots super inefficient.
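(As a rough illustration of the "time between actions" signal mentioned above, here is a toy sketch. It assumes you already log a timestamp per request for each session; the 0.5-second threshold is an arbitrary guess for illustration, not a value known to be used by Google or anyone else.)

  def suspiciously_fast(timestamps, min_gap_seconds=0.5):
      """Flag a session whose median gap between actions is implausibly small."""
      gaps = sorted(b - a for a, b in zip(timestamps, timestamps[1:]))
      if not gaps:
          return False
      median_gap = gaps[len(gaps) // 2]
      return median_gap < min_gap_seconds

  print(suspiciously_fast([0.0, 0.05, 0.11, 0.16]))  # True: ~50 ms per action
  print(suspiciously_fast([0.0, 2.3, 7.9, 12.4]))    # False: human-ish pacing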


Yeah, but if you are trying to automate thousands of simultaneous requests, you'll have to use a lot of servers, which is costly even in the cloud.

Right now Google and Bing will run sites with JS enabled to see the DOM after any JS changes take hold. Usually these crawls aren't nearly as frequent as the general crawling, because there is quite a lot more CPU/memory overhead to such utilities. I can't speak for splinter, but similar tools in node or phantomjs have a lot of overhead to them.


Still less effort than a typical captcha.



That wasn't the point


It's more effort as fewer libraries support it, you need to execute unknown code on your computer, etc.


I think you're being a little hyperbolic. Google is classifying what is already normal human behavior. Having JavaScript disabled is definitely not "normal" human behavior. Of the stats I found, only 1-2% of users don't get JS, and the UK's Government Digital Service found[1] that only 0.2% of users have JS disabled or unsupported.

I don't think regular CAPTCHAs are going away anytime soon since any bot detection system is bound to have false positives.

[1] https://gds.blog.gov.uk/2013/10/21/how-many-people-are-missi...


Exactly. It's perfectly reasonable to present users who disable JS with a one-time CAPTCHA they have to solve to use the site. Many sites just (usually unintentionally) prevent users with Javascript disabled from navigating a site at all, so this is a huge step up from that.


> The fact that Google is attempting to define and thus strongly normalise what is human behaviour is definitely a big red flag to me.

...But this is their core search competency and exactly what makes their search so powerful. PageRank is basically distributed wisdom of crowds, aka an algorithm of how people behave (build their websites) based on a search term/embedded link.

This seems like a perfect extension of this. Remember the vision of google: "to organize the world's information and make it universally accessible and useful." Human behavior falls squarely into a large segment of the "world's information."


>Remember the vision of google: "to organize the world's information and make it universally accessible and useful."

I'm sure that's why they got rid of the ability to search what people are saying on forums and blogs. Google still indexes everything, they just got rid of the filter.

Their results now give preference to SEO'd pages & adverts.

The old discussion filter returns an illegal request error https://www.google.com/?tbm=dsc


Their search is only "powerful" for finding the more mundane and widely disseminated information; I've noticed that it's increasingly difficult to find very specific information with it as it basically misunderstands the query and returns completely useless results. Maybe that's why I look like a bot, as I try to tell it exactly what I want...


Well, this is exactly the point. Obscure information has a very low social/viral index, and as a result a lot of people don't interact with it, so it is hard to find with Google - which is why I don't think it is a particularly robust search engine on its own in the grand scale of knowledge development.

Google seems robust because humans generally think pretty similarly, and generally look for the things that the people around them are talking about or also looking for. That breaks down considerably though across cultures and time.


When trying to use Google to find something obscure, I'm not so much bothered by the difficulty of doing so as I am by the implication that "real humans" don't use complex search queries. They used to teach in computer literacy courses how to use search engines, complete with complex multi-term boolean queries, to find exactly what you're looking for. Now try the same with Google and you're a bot? WTF? They're basically saying "humans are too stupid to do that - humans are supposed to be stupid."


Or that a lot of their target demographic has never been taught that, and so they've optimised their delivery to be accessible to the majority?


Well to be fair most of their users probably are too "stupid" (aka were never taught) to do that.


Their search IS powerful, even for obscure things. But when you disable JS and cookies, as you have done, you are taking a huge amount of that power away from the system. Of course you are going to get bad results for anything which is specific to you -- you have disabled their ability to make a better judgement!


> "I have JS and cookies disabled..."

Disabling essential parts of web functionality breaks web functionality. I'm shocked.

Dropping the snark though. I'm surprised that this is still a complaint. At this point in the web's evolution cookies and Javascript are essential. Disabling those will make your experience worse and complaining about that is like removing the windshield from a car and complaining that bugs get on your face.


Tracking cookies are certainly not essential.


Yeah, tracking cookies might not be. But cookies in general? They're essential for a large number of sites to handle something as simple as logins.


I would suggest you're over-thinking it. The essence of it is "we think you're a bot because you haven't given us enough private information about yourself".

Exploiting that information is Google's core business, and it doesn't like people evading its panopticon. So they're now making life harder for those who care about their privacy.

Not surrendering your data to Google? We'll treat you like you're not even human, and through reCaptcha we'll tell thousands of other websites to do the same. That will teach you to hide things from the all seeing eye of Mountain View.


Why shouldn't us bots be able to search or participate in forums?


As long as you abide by all the social norms, including moving that damn mouse the right way, I have no problems with you, dear bot.


We'll legislate inefficiency. If you can't be as slow as a human, then you will be restricted.


Bots are equal, but separate.


Anecdotally, I block cookies and tracking scripts from Google and even run some custom javascript to remove the link shim in search results. I have yet to encounter the "we think you're a bot" detection filter, except when performing numerous complex iterative queries or Googling from within Tor.

The above is to suggest that perhaps tracking bugs and cookies aren't a component in the bot-detection algorithm, though that remains to be seen.


Well, looking at an example, my behavior definitely trips the 'not not a bot' detection for NoCAPTCHAs for whatever reason. I'm not too shook up though - it's really no more inconvenient than before.


Can you suggest a way to tell the difference between you and a bot? Merely throwing flags around without offering anything better isn't very helpful.


There isn't a way. As AI improves bots become increasingly indistinguishable from humans. All this does is rely on the fact that bots tend to use different browsers and behave in different ways than humans. But that can be fixed.

But it doesn't matter. If a human user spams tons of links in the comments after creating 20+ accounts, who cares if they are a bot or are doing it manually? I believe that websites should instead use machine learning like this to detect the bad behavior itself, rather than try to determine who the user actually is.


"bot" means no profit from ads. There you have it.


We were actually just discussing the "what if I trip their filter" concern at our morning meeting. Full disclosure: my company (as the username implies) builds FunCaptcha, a CAPTCHA alternative. Your concern, to us, is a very valid one and has been a driving force behind our own design and mentality. Our lead designer is (understandably) passionate about this so he actually wrote a few words on the blog that dives pretty deeply into the topic, if you're inclined: https://www.funcaptcha.co/2014/12/04/killing-the-captcha-wit....


I've also tripped Google's bot filters. Frankly, I'm more offended that Google is discriminating against robots, seeing as they are one of the leading companies in automation and AI :-)


Though you were joking, it's worth noting they're certainly not discriminating against robots. They're discriminating against your robots.

Which is to say: they're perfectly willing to let your crawl-able content and internet use help train their robots, they just don't want their crawl-able content and internet use to train your robots.


Aren't we talking about spambots, mostly? While law-abiding bots should probably be allowed in most sites, nobody wants a spambot in their blog or forum.

Isn't it right to block spambots? And if so, how do you tell regular bots from spambots?


Use re-captcha to prevent the spam-bots from posting... the real bots will just crawl anyway.

A couple of months ago, I implemented some regular expressions to try to block a lot of bad actors, and had that include smaller search engines... our analytics traffic dropped around 5% the next week... our actual load on the servers dropped almost 40%, though. Unfortunately it was decided the 5% hit wasn't worth reducing the load by 40%.
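(A sketch of the kind of user-agent filtering being described might look like the Python below. The patterns are illustrative guesses, not the poster's actual rules; a real block list needs care so it doesn't also drop crawlers you want to keep.)

  import re

  # Illustrative block list only - tune it so legitimate crawlers still get through.
  BLOCKED_AGENTS = re.compile(
      r"(scrapy|python-requests|curl|libwww-perl|ahrefs|mj12bot)",
      re.IGNORECASE,
  )

  def should_block(user_agent):
      """Return True if the User-Agent header matches a blocked pattern."""
      return bool(BLOCKED_AGENTS.search(user_agent or ""))

  print(should_block("Mozilla/5.0 (compatible; MJ12bot/v1.4.5)"))          # True
  print(should_block("Mozilla/5.0 (Windows NT 10.0) Chrome/39.0 Safari"))  # False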

That decision sucks. Moving forward, a lot of output caching will be used more heavily, with JS enhancements for logged-in users on top of the nearly identical output rendering.

Server-side React with some user-agent sniffing will break out three renderings server-side: "xs" for those devices that are "mobile" (phones), "sm" for other tablet/mobile devices ("android", "ios", etc.), and otherwise "md"... "lg" will only bump up on the client side from "md". It corresponds to the Bootstrap size breaks.

In essence, I don't care. Bots get the same as everyone else... if you don't have JS, you can't log in or fill out forms. Recaptcha should go a step farther in helping deal with bots...


^ Probably the most underrated comment in this topic.


Is there an anti-trust angle to this?


In terms of advances in automation and AI, I welcome this development, because this is a new offensive in the bots vs. advertisers/scrapers fight. Bots will of course adapt - it is only a question of time - and adaptation is an advancement in automation and in the understanding of human behavior.


Oh, I thought the parent comment was going to mention how Google might be using this to additionally train an AI to learn what human behavior is like, just like they did with the 411 service to collect voice data some years back.


I'm sorry the web cookie hegemony is oppressing you. Come up with a better solution to filter bots and Google will hire you. Nobody is pushing this down your throat.


You sound like a robot.


A robot permanently stuck on the "outraged" setting.


Replicants are like any other machine. They're either a benefit or a hazard. If they're a benefit, it's not my problem.


I used to use selenium to crawl Google webmaster tools data. Despite randomizations and such, they still had me marked as a bot.


As google gets stronger in AI, this becomes less of a problem, no?


The slippery slope has gotten a bit steeper.



