Respecting robots.txt is something their training crawler should do, and I see n... | Hacker News


SonOfLilit on June 15, 2024 | on: Perplexity AI is lying about their user agent

Respecting robots.txt is something their training crawler should do, and I see no reason why their user agent (i.e. the user asks it to retrieve a web page, and it does) should, since it isn't a crawler (it doesn't walk the graph).

As to "lying" about their user agent: this is 2024, and the "User-Agent" header is considered a combination of bug and privacy issue. All major browsers lie about being a browser that was popular many years ago, and recently the biggest browser(s?) standardized on sending one exact string from now on, forever (which would obviously be a lie).

This header is deprecated in every practical sense, and every user agent should send a legacy value saying "this is Mozilla 5", just like Edge, Chrome, and Firefox do. At some point people figured out that if even one website exists that customizes by user agent, did not expect new browsers to be released, and has not been maintained since, then the internet would break unless browsers lie. So Perplexity doing the same is standard, and best, practice.
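
For anyone who wants to see why the two topics interact, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It only shows that robots.txt rules are keyed on the self-declared user agent token. The robots.txt content and the bot name `ExampleTrainingBot` are made up for illustration, the browser-style string is just a typical "Mozilla/5.0" value, and none of this is a claim about how any real crawler or site behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up bot, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

# A typical legacy-style browser string (illustrative value only).
LEGACY_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)


def allowed(user_agent: str, url: str) -> bool:
    """Return whether this robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/page"
    print(allowed("ExampleTrainingBot/1.0", url))  # matches the bot-specific rule
    print(allowed(LEGACY_UA, url))                 # falls through to the "*" rule
```

With this parser, a request that identifies itself with the bot token hits the bot-specific `Disallow`, while the same request sent with a browser-style string only ever matches the `*` group.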

underdeserver on June 15, 2024

They might be "lying" for all sorts of reasons, but a specific version of Chrome on a specific OS still sends its own distinct user agent string.

SonOfLilit on June 15, 2024

I stand corrected, thanks. However, I don't think it impacts my point.
