Login is typically disabled for crawlers.

Jaruzel · on Aug 24, 2021

Which is against Google's own T&Cs and could get your site dropped from their index... They state that you should not present a different site experience to their crawler than the one normal users get.

As usual it's OK for the big guys to break the rules whereas us little site owners have no choice but to obey them.

marcosdumay · on Aug 24, 2021

Don't they have an explicit exception for login screens in their T&Cs?

johnc1231 · on Aug 24, 2021

How do they identify a crawler? Can a user not masquerade as a crawler somehow?

est31 · on Aug 24, 2021

Sometimes they use user agents for this, but those are easily faked, so it's only done by websites that don't have comprehensive auth walls.
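
A minimal sketch of what such a user-agent check might look like, assuming the site simply searches the User-Agent header for known crawler names. The token list and the function name `looks_like_crawler` are hypothetical, not taken from any particular site:

```python
import re

# Hypothetical list of crawler tokens; a real site would maintain its own.
CRAWLER_TOKENS = re.compile(r"googlebot|bingbot", re.IGNORECASE)

def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    # The header is fully client-controlled, so this is trivially spoofable.
    return bool(CRAWLER_TOKENS.search(user_agent or ""))
```

Since any client can send whatever header it likes, a check like this only keeps out visitors who don't bother to change their UA.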

A more comprehensive method is based on IP ranges, say whitelisting traffic from Google and Bing. This gives you >95% of search traffic, as Google alone has >90% and many smaller search engines like Yahoo or DDG are Bing resellers.
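
Roughly, an IP-range check could be sketched like this with Python's standard `ipaddress` module. The CIDR blocks below are placeholders from the reserved documentation ranges, not real crawler ranges, and `is_allowlisted` is a made-up name; a real setup would load whatever ranges the search engines publish:

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), NOT real crawler IPs.
ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "2001:db8::/32")
]

def is_allowlisted(remote_addr: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address: treat as a normal, non-crawler visitor.
        return False
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Unlike the user agent, the source address of an established connection is hard to fake, which is why this holds up better.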

On the other hand, a pure IP-based check can be circumvented too. Sometimes you can view how search engines see a website through Google Translate. But places like LinkedIn have countermeasures for all of these circumventions.

tjpnz · on Aug 24, 2021

Normally by user agent. You can adopt the same UA to get past some paywalls, but it doesn't always work.