Any tips on respectfully crawling HN so you don’t get throttled? I had an application idea that couldn’t be served by the API (it needs karma values), so I started writing code to scrape but got rate limited pretty quickly.
I've had no trouble hitting the Firebase API at the speed items are created, with a 5 second delay between retries.
For scraping HN directly, in my experience you have to go extremely slow, like 1 minute between fetching items. And if you get blocked, it may be better to wait a long time (minutes) before trying again rather than exponential backoff, in order to get out of the penalty box. You'll need a cache for sure.
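For the Firebase API side, here is a minimal polling sketch along those lines, assuming Python with requests; fetch_item and follow_new_items are illustrative names, and the 5-second delays are just the values mentioned above, not anything the API documents:

  import time
  import requests

  API = "https://hacker-news.firebaseio.com/v0"

  def fetch_item(item_id, retry_delay=5, max_attempts=5):
      """Fetch a single HN item, waiting between retries rather than hammering."""
      for attempt in range(max_attempts):
          resp = requests.get(f"{API}/item/{item_id}.json", timeout=10)
          if resp.status_code == 200 and resp.json() is not None:
              return resp.json()
          time.sleep(retry_delay)  # fixed delay; long waits beat tight retry loops here
      return None

  def follow_new_items(poll_interval=5):
      """Crawl items at roughly the speed they are created by polling maxitem."""
      last_seen = requests.get(f"{API}/maxitem.json", timeout=10).json()
      while True:
          newest = requests.get(f"{API}/maxitem.json", timeout=10).json()
          for item_id in range(last_seen + 1, newest + 1):
              item = fetch_item(item_id)
              if item:
                  print(item.get("type"), item_id)
          last_seen = newest
          time.sleep(poll_interval)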
{
"by" : "jkarneges",
"id" : 45533018,
"kids" : [ 45533616 ],
"parent" : 45532549,
"text" : "The HN/Firebase API doesn't make this easy. For <a href=\"https://hnstream.com\" rel=\"nofollow\">https://hnstream.com</a> I ended up crawling items to find the article.",
"time" : 1760043552,
"type" : "comment"
}
"parent" can either be the actual parent comment or the parent article, depending where in the comment chain you are.
As does hnstream.com, per the sample comment quoted above. Both just traverse the parent id until reaching the root (the article). It takes more queries, but the API is not rate limited.
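A rough sketch of that parent walk against the Firebase API (Python assumed; root_story is an illustrative name, not necessarily what either site uses):

  import requests

  API = "https://hacker-news.firebaseio.com/v0"

  def root_story(item_id):
      """Walk "parent" pointers until the item has none; that item is the story."""
      item = requests.get(f"{API}/item/{item_id}.json", timeout=10).json()
      while item and "parent" in item:
          item = requests.get(f"{API}/item/{item['parent']}.json", timeout=10).json()
      return item  # a "story" (or "poll"/"job") once there is no parent

  # e.g. root_story(45533018) climbs from the comment above to its article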
It wouldn't take more queries if the comments were cached. It could probably be done entirely in memory; HN's entire corpus can't be that large.
If one were to start at the page endpoints (e.g. /topstories), one could record references to origin ids while preloading comments, probably covering the IDs most likely to be referenced and making traversal up the tree even more efficient.
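Something like the following could combine the two ideas, an in-memory cache plus preloading from /topstories; get_item, preload_top_stories, cached_root, and the "origin" field are all illustrative, not part of the API:

  import requests

  API = "https://hacker-news.firebaseio.com/v0"
  cache = {}  # item_id -> item dict, kept entirely in memory

  def get_item(item_id):
      if item_id not in cache:
          cache[item_id] = requests.get(f"{API}/item/{item_id}.json", timeout=10).json()
      return cache[item_id]

  def preload_top_stories(limit=30):
      """Warm the cache from /topstories so later parent walks hit memory, not the API."""
      for story_id in requests.get(f"{API}/topstories.json", timeout=10).json()[:limit]:
          story = get_item(story_id)
          for kid in story.get("kids", []):
              comment = get_item(kid)
              comment["origin"] = story_id  # remember the root up front

  def cached_root(item_id):
      """Parent traversal that only pays for uncached items."""
      item = get_item(item_id)
      while item and "parent" in item:
          item = get_item(item["parent"])
      return item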
Congrats on the project! You may be right. There are other SSE services, but I can't think of one that allows clients to subscribe without authentication.
Not requiring client auth certainly makes things simple. It can even work for private data if the topics are sufficiently unguessable.
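For what that could look like on the subscriber side, a sketch of an unauthenticated SSE subscription to an unguessable topic; the base URL and path scheme are made up for illustration, while secrets.token_urlsafe and requests' streaming mode are standard:

  import secrets
  import requests

  # Hypothetical SSE endpoint; a real service's URL scheme may differ.
  BASE = "https://events.example.com/topics"

  # An unguessable topic name stands in for auth on the subscribe side.
  topic = secrets.token_urlsafe(32)

  def subscribe(topic_name):
      """Open an SSE stream with no credentials and print each event's data lines."""
      with requests.get(f"{BASE}/{topic_name}", stream=True, timeout=None) as resp:
          for line in resp.iter_lines(decode_unicode=True):
              if line and line.startswith("data:"):
                  print(line[len("data:"):].strip())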
> the kids who grew up in those homes are writing things that take place there
This is kind of like how trench coats are associated with detectives, because they were regular clothing for anyone around the time of early detective films.
I agree it has some problems. For now, it is mostly a UX proof-of-concept and probably not how an official poll should be conducted.
> What problem is this envisioned as solving?
Its core mission is to legitimize all candidates on the ballot. This is something caucuses and ranked-choice voting can do, but since our general elections don't work this way, I wonder if the voting experience could be augmented by the private sector. (Of course, efforts to change how our actual elections work are still worthwhile and can be pursued in parallel.)
Basically, if enough people (millions) were to use an app like this to meta-vote before committing to a single actual vote, we could simulate alternative voting processes without government involvement.
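As a toy example of what such a simulation could compute over collected meta-votes, here is an instant-runoff tally (Python; instant_runoff is an illustrative name, and this is only one of several ranked-choice variants):

  from collections import Counter

  def instant_runoff(ballots):
      """Simulate ranked-choice voting: each ballot lists candidates in preference order."""
      candidates = {c for ballot in ballots for c in ballot}
      while True:
          # Count each ballot toward its highest-ranked remaining candidate.
          tally = Counter(
              next(c for c in ballot if c in candidates)
              for ballot in ballots
              if any(c in candidates for c in ballot)
          )
          total = sum(tally.values())
          leader, votes = tally.most_common(1)[0]
          if votes * 2 > total or len(candidates) == 1:
              return leader
          candidates.remove(min(tally, key=tally.get))  # eliminate last place

  # e.g. instant_runoff([["A", "B"], ["B", "A"], ["B"]]) -> "B"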
> An edge cloud does seem ideal for national live events though
OK, you almost had me believing you, but "edge cloud" is 100% Fastly marketing speak. Any time I see somebody mention using Fastly, they turn out to be a Fastly employee. You've posted this demo three times already; give it a rest.
Hmm. Yes, using Fastly for that. Can you go to https://ileantoward.com/geotest and see if anything looks fishy? Notably country_code and region (region should have a state code if country is "US").
I'm not sure there'd be much benefit to using shielding with WebSockets, since the traffic wouldn't be cached/collapsed. You can still shield HTTP traffic on the same domain being used for WebSockets.
Was that a serious question? If so, a quick perusal of the archives will show you where the flag hangs. Just look for all the greyed-out posts and you'll know where the Overton window is to be found.