
They can just set a noindex tag in the HTML head, or send it in the HTTP response header for older pages; then existing links still work.
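A minimal sketch of both variants (either one on its own is enough for crawlers that honor robots directives, as Googlebot does; the header form also works for non-HTML resources like PDFs):

    <meta name="robots" content="noindex">

in the page's <head>, or as an HTTP response header:

    X-Robots-Tag: noindex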


It's my understanding that doing what you propose will still use up crawl budget, because the bot has to download and parse the page to discover that it is noindexed.


Then you could also block those pages in robots.txt, no? (You do need to do both, though, as otherwise pages can be indexed based on links without ever being crawled.)


Exactly. This should be solvable without actually deleting the pages. I assume they're only removing articles with near-zero backlinks, so a noindex,nofollow should generally be fine, but if crawl budget is an issue, robots.txt and a sitemap can help.
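A minimal robots.txt sketch, assuming (hypothetically) that the retired articles all live under an /archive/ path; the real site structure may differ:

    User-agent: *
    Disallow: /archive/

And on the sitemap side, simply omit those URLs so crawlers aren't invited to keep revisiting them.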


The real answer is that there's a non-zero cost to maintaining these pages, and even more so if robots.txt entries and the like have to be maintained for them as well. And if they have no monetary benefit, or are potentially even a detriment, it makes more business sense to just get rid of them. Unfortunately.



