I've reused the LGL (Large Graph Layout) algorithm used by The Opte Project [1,2] with more recent and comprehensive (multipath) traceroutes in 2022 [3].
I've also played a bit with an interactive visualization of the graph by using map tiles [4].
Bogus announcements are probably filtered by your upstream(s) (see [1] for a common list of filters).
IP-to-ASN mappings are typically built from route collectors [2,3] that peer with various networks and receive their announcements. AFAIK route collectors don't filter anything and it's easy to find bogus announcements (e.g. private ASNs) in the data.
I can't find 4294967296 from a quick glance at the latest RouteViews data but I can find other private ASNs. For example AS7594 - AS2764 - AS4294901866 for 210.10.189.0/24 seen by the route-views.perth collector.
I don't know what kind of filtering iptoasn.com is doing but at work (ipinfo.io) we do filter bogus origins, as well as a bunch of other things like RPKI/IRR-invalid routes and hyper-specific prefixes (longer than /24 for IPv4 or /48 for IPv6) [4].
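To make the bogus-origin and hyper-specific checks concrete, here is a minimal sketch. It is not how ipinfo.io or any particular provider implements filtering; the function names are made up for illustration, and the range list covers only the private-use, documentation and reserved ASNs I'm sure of, so treat it as non-exhaustive.

```python
import ipaddress

# Private-use, documentation and reserved ASN ranges (inclusive).
# Non-exhaustive list, assembled for illustration only.
BOGON_ASN_RANGES = [
    (0, 0),                    # reserved (RFC 7607)
    (23456, 23456),            # AS_TRANS (RFC 6793)
    (64496, 64511),            # documentation (RFC 5398)
    (64512, 65534),            # private use, 16-bit (RFC 6996)
    (65535, 65535),            # reserved (RFC 7300)
    (65536, 65551),            # documentation (RFC 5398)
    (4200000000, 4294967294),  # private use, 32-bit (RFC 6996)
    (4294967295, 4294967295),  # reserved (RFC 7300)
]


def is_bogon_asn(asn: int) -> bool:
    """Return True if the ASN falls in one of the ranges above."""
    return any(low <= asn <= high for low, high in BOGON_ASN_RANGES)


def is_hyper_specific(prefix: str) -> bool:
    """Return True for prefixes longer than /24 (IPv4) or /48 (IPv6)."""
    network = ipaddress.ip_network(prefix)
    limit = 24 if network.version == 4 else 48
    return network.prefixlen > limit


def keep_route(prefix: str, as_path: list[int]) -> bool:
    """Hypothetical filter: drop bogon origins and hyper-specific prefixes."""
    origin = as_path[-1]  # the origin AS is the last hop of the AS path
    return not is_bogon_asn(origin) and not is_hyper_specific(prefix)


# The route from the RouteViews example: origin AS4294901866 is private use.
print(keep_route("210.10.189.0/24", [7594, 2764, 4294901866]))
```

With this sketch the 210.10.189.0/24 route is rejected because of its origin, not because of its length (a /24 is still acceptable).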
Actually 4294967296 could never appear, as the maximum value you can fit in the protocol field is 1 less than that... my problem here was I couldn't manage to keep the 2 numbers I was comparing (the one in the article and 2^32) straight haha! This mistake was noted by a commenter here https://news.ycombinator.com/item?id=41963745
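For anyone else mixing up the two numbers, a quick check, assuming the ASN is carried as a 4-byte unsigned integer (the helper name is made up):

```python
import struct

MAX_ASN = 2**32 - 1  # largest value a 4-byte unsigned field can hold


def fits_in_asn_field(asn: int) -> bool:
    """Return True if the value can be packed into 4 unsigned bytes."""
    try:
        struct.pack("!I", asn)  # network byte order, unsigned 32-bit
        return True
    except struct.error:
        return False


print(MAX_ASN)
print(fits_in_asn_field(4294901866))  # the value from the article
print(fits_in_asn_field(4294967296))  # 2^32 itself
```

The value from the article fits; 2^32 does not.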
That said you're ultimately right that my upstream provider is filtering the 4294901866 value from the article as well anyways for the reasons you stated.
Another solution is to use an MMDB (“MaxMind DB”) file [1] which is essentially a binary tree + deduplicated values (same as idea 3.1).
There are several free ASN MMDBs [2,3] but you can also build your own MMDB files from any Prefix->Value mapping with the mmdbwriter library [4] or a CLI tool built on top of it like mmdbctl [5].
Assuming the ASN MMDB is fully loaded in memory, it would use around 60MB.
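To illustrate the "binary tree + deduplicated values" idea, here is a toy in-memory sketch. It is not the MMDB file format and not the mmdbwriter API; the class and its layout are invented for illustration, it handles one address family per trie, and the ASNs in the usage lines are just sample values.

```python
import ipaddress


class PrefixTrie:
    """Toy binary trie: one level per prefix bit, values stored once."""

    def __init__(self):
        self.root = [None, None, None]  # [child for bit 0, child for bit 1, value index]
        self.values = []                # deduplicated values
        self._index = {}                # value -> position in self.values

    def insert(self, prefix, value):
        network = ipaddress.ip_network(prefix)
        # Store each distinct value only once and keep its index in the node.
        if value not in self._index:
            self._index[value] = len(self.values)
            self.values.append(value)
        bits = int(network.network_address)
        width = network.max_prefixlen
        node = self.root
        for i in range(network.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = self._index[value]

    def lookup(self, address):
        """Longest-prefix match; returns None when nothing covers the address."""
        addr = ipaddress.ip_address(address)
        bits = int(addr)
        width = addr.max_prefixlen
        node = self.root
        best = node[2]
        for i in range(width):
            node = node[(bits >> (width - 1 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]
        return None if best is None else self.values[best]


trie = PrefixTrie()
trie.insert("210.10.0.0/16", 2764)
trie.insert("210.10.189.0/24", 7594)
trie.insert("8.8.8.0/24", 15169)
trie.insert("8.8.4.0/24", 15169)   # same value, stored only once
print(trie.lookup("210.10.189.5"), trie.lookup("210.10.1.1"), len(trie.values))
```

The deduplication is what keeps the memory footprint small: many prefixes map to the same ASN record, so the tree nodes only hold small indexes into a shared value table.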
I can't speak for AWS specifically, but in my PhD thesis [1] I found a bunch of such examples by using RIPE Atlas probes. Essentially looking for pairs of probes (A, C) where the RTT between A and C is larger than the sum of the A-B and B-C RTTs for some third probe B.
Now there are some issues with this methodology (all common issues with ICMP/RTT measurements + traffic was not really routed through the "relay" probe), but such pairs do exist.
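The search itself is simple to sketch. This is not the code from the thesis; it assumes a symmetric RTT table keyed by probe pairs, and the probe names and RTT values are made up.

```python
from itertools import combinations


def rtt_between(rtt, a, b):
    """Look up an RTT regardless of the order the pair was stored in."""
    return rtt.get((a, b), rtt.get((b, a)))


def find_relay_shortcuts(rtt, probes):
    """Return (a, b, c, direct, via_b) where RTT(a, c) > RTT(a, b) + RTT(b, c)."""
    results = []
    for a, c in combinations(probes, 2):
        direct = rtt_between(rtt, a, c)
        if direct is None:
            continue
        for b in probes:
            if b in (a, c):
                continue
            ab = rtt_between(rtt, a, b)
            bc = rtt_between(rtt, b, c)
            if ab is None or bc is None:
                continue
            if direct > ab + bc:
                results.append((a, b, c, direct, ab + bc))
    return results


# Made-up RTTs in milliseconds between three hypothetical probes.
rtt = {("A", "B"): 10, ("B", "C"): 15, ("A", "C"): 40}
print(find_relay_shortcuts(rtt, ["A", "B", "C"]))
```

Note that this only compares measured RTTs; as said above, it doesn't prove that traffic relayed through B would actually achieve the lower latency.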
Nice achievement but always a bit disappointing that those records are based on throwing more money at the problem, rather than new theoretical grounds, or software improvements (IIRC y-cruncher is not open source).
Hosts with randomized addresses are likely to have auto-generated PTR records, or none at all, so for the purpose of rDNS resolution those are not a big issue.
And that’s a detail, but SLAAC as in RFC4862 is deterministic. The randomization is introduced by the privacy extensions in RFC4941.
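To show what "deterministic" means here, a small sketch assuming the classic modified EUI-64 interface identifier derived from the MAC address (RFC 4291, Appendix A). The function names, prefix and MAC address are examples only.

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """Build a modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC.
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the EUI-64 interface identifier."""
    network = ipaddress.ip_network(prefix)
    if network.version != 6 or network.prefixlen != 64:
        raise ValueError("expected an IPv6 /64 prefix")
    interface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Same prefix + same MAC always gives the same address.
print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
```

With the privacy extensions the interface identifier is randomized and changes over time instead, which is the case where PTR records are auto-generated or missing.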
[1] https://github.com/TheOpteProject/LGL
[2] https://github.com/maxmouchet/minilgl (cleaned-up version of the code)
[3] https://www.maxmouchet.com/internet-viz/
[4] https://github.com/maxmouchet/internet-visualization