
They can just set a noindex tag in the HTML head, or send it in the HTTP response header for older pages; then existing links still work.
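A minimal sketch of both variants (either one on its own is enough for crawlers that honor robots directives, as Googlebot does; the header form also works for non-HTML resources like PDFs):

    <meta name="robots" content="noindex">

in the page's <head>, or as an HTTP response header:

    X-Robots-Tag: noindex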


It's my understanding that doing what you propose will still use up crawl budget, because the bot has to download and parse the page to discover that it is noindexed.


Then you could also block those pages in robots.txt, no? (You do need to do both, though, as otherwise pages can be indexed based on links without ever being crawled.)


Exactly. This should be solvable without actually deleting the pages. I assume they're only removing articles with near-zero backlinks, so a noindex,nofollow should generally be fine, but if crawl budget is an issue, robots.txt and a sitemap can help.
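A minimal robots.txt sketch, assuming (hypothetically) that the retired articles all live under an /archive/ path; the real site structure may differ:

    User-agent: *
    Disallow: /archive/

And on the sitemap side, simply omit those URLs so crawlers aren't invited to keep revisiting them.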


The real answer is that there's a non-zero cost to maintaining these pages, and even more so if robots.txt entries and the like have to be maintained for them as well. And if they have no monetary benefit, or are potentially even a detriment, it makes more business sense to just get rid of them. Unfortunately.



