Hacker News

Login is typically disabled for crawlers.


Which, by Google's own T&Cs, is against the rules (it's cloaking) and could get your site dropped from their index. They state that you should not present a different site experience to their crawler than the one normal users get.

As usual it's OK for the big guys to break the rules whereas us little site owners have no choice but to obey them.


Don't they have an explicit exception for login screens in their T&Cs?


How do they identify a crawler? Can a user not masquerade as a crawler somehow?


Sometimes they use user agents for this, but those are easily faked, so it's only done by websites that don't have comprehensive auth walls.
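To illustrate the user-agent approach, here is a minimal sketch of what such a check might look like on the server side. The pattern list and the function name `looks_like_crawler` are made up for the example, not taken from any particular site, and as noted the header is trivial to fake.

```python
import re

# Hypothetical list of crawler tokens; a real site would maintain its own.
CRAWLER_UA_PATTERN = re.compile(r"googlebot|bingbot", re.IGNORECASE)


def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token.

    This trusts a client-supplied header, so anyone can pass the check
    simply by sending the same string.
    """
    return bool(CRAWLER_UA_PATTERN.search(user_agent or ""))
```

A request claiming to be Googlebot passes regardless of where it actually comes from, which is why this alone is a weak gate.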

A more comprehensive method is based on IP ranges, say whitelisting traffic from Google and Bing. This gives you >95% of search traffic, as Google alone has >90%, and many smaller search engines like Yahoo or DDG are Bing resellers.
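A rough sketch of the IP-range idea, using only Python's standard `ipaddress` module. The networks below are reserved documentation ranges used as placeholders, not real crawler ranges, and `is_allowed_crawler_ip` is a hypothetical name; an actual deployment would load whatever ranges the search engines publish.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), NOT real crawler IPs.
ALLOWED_CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed_crawler_ip(remote_addr: str) -> bool:
    """Return True if the connecting address falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address at all.
        return False
    # Membership is False when the address and network versions differ.
    return any(addr in net for net in ALLOWED_CRAWLER_NETWORKS)
```

Unlike the user-agent header, the source address of a TCP connection is hard for an ordinary user to spoof, which is what makes this check stricter.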

On the other hand, a pure IP-based check can be circumvented too. Sometimes you can view how search engines see a website through Google Translate, since the page is then fetched from Google's servers rather than from your own address. But places like LinkedIn have countermeasures for all of these circumventions.


Normally by user agent. You can adopt the same UA to get past some paywalls, but it doesn't always work.
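For the client side of this, a small sketch of sending a request with a crawler's user agent via the standard library. The UA string is the commonly seen Googlebot one, and `build_request_as_crawler` is a hypothetical helper name; as said, a site that also checks IP ranges will not be fooled by this.

```python
import urllib.request

# Commonly seen Googlebot user-agent string, used here only as an example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request_as_crawler(url: str, user_agent: str = GOOGLEBOT_UA) -> urllib.request.Request:
    """Build (but do not send) a request that carries a crawler User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Passing the returned object to `urllib.request.urlopen` would perform the actual fetch.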



