It's not hard. Start with the WHATWG's spec, then incorporate the other specs it references using a reasonable heuristic to determine if a given item should be included or not.
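For what it's worth, here is a rough sketch of that procedure in Python. Everything in it is an assumption for illustration: the spec names, the toy reference graph, and the `should_include` predicate are made up, and a real run would need an actual crawl of each spec's references.

```python
from collections import deque

def collect_specs(root, references, should_include):
    """Walk the reference graph breadth-first from `root`, following only
    the references that the (hypothetical) heuristic accepts."""
    seen = {root}
    queue = deque([root])
    while queue:
        spec = queue.popleft()
        for ref in references.get(spec, ()):
            if ref not in seen and should_include(ref):
                seen.add(ref)
                queue.append(ref)
    return seen

# Toy data, invented for this example; not a real dataset.
REFERENCES = {
    "html": ["dom", "fetch", "xml-schema"],
    "dom": ["infra"],
    "fetch": ["infra", "url"],
}
BROWSER_RELEVANT = {"dom", "fetch", "infra", "url"}

result = collect_specs("html", REFERENCES, lambda s: s in BROWSER_RELEVANT)
```

The heuristic is deliberately a plug-in function, since the whole argument is about how crude it is allowed to be.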
If you don't think the estimate from Reckless, Infinite Scope is wildly off, then you either didn't read the methodology and spot-check the dataset, or you really don't understand the scope of what gets published by W3C: how little of it has to do with Web browsers, and how many revisions of each document there are.
The only bar the heuristic has to clear here is "delivers a result that doesn't suck as bad as the analysis in Reckless, Infinite Scope". That analysis is so bad, however, that your heuristic could literally be, "if you encounter an item that was also in Drew DeVault's input set, count it with an arbitrary probability of 0.9 (or whatever)", and it would still give you a more realistic result than what the article says (which people are actually relying on in their arguments here, and which you are defending).
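To show how little that throwaway heuristic asks for, here is a minimal sketch in Python. The item names and word counts are invented placeholders, not figures from the article or its dataset, and treating "probability 0.9" as a flat weight on each item's count is my own simplification.

```python
def discounted_word_count(word_counts, keep_probability=0.9):
    """Expected total if every item is counted with the same arbitrary
    probability (a simplification, not the article's method)."""
    return sum(keep_probability * n for n in word_counts.values())

# Invented placeholder numbers, for illustration only.
toy_counts = {"spec-a": 1000, "spec-b": 500, "spec-c": 250}

total = discounted_word_count(toy_counts)
```

Swapping the flat weight for a per-item probability is a one-line change if you want something less arbitrary.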
Aside from that, given how many logical errors and weird counterconclusions[1] you've managed to stuff into this discussion (and so economically[2], too), I'm going to go ahead and say this is the last response to you that I spend more than 10 seconds writing.