
As somebody who has wrassled with the Wikipedia dumps a number of times, I don't understand why Wikipedia doesn't release some sort of SDK that gives you the 'official' parse.


I have wrestled with it too. I believe it's because wikitext is an ad-hoc format that evolved to the point where the only 100% correct parser/renderer is the MediaWiki implementation. It's like asking for an SDK that correctly parses Perl. Only Perl can do that.

There are a bunch of mostly compatible third-party parsers in various languages. The best one I've found so far is Sweble, but even it mishandles a small percentage of rare cases.


This. I tried that a few years ago and fell off my chair when I started to realize how DIY the thing is. It's a bunch of unofficial scripts and half-assed, out-of-date help pages.

At the time I thought, well, it's a bunch of hippies with a small budget, who can blame them? Now I learn that there are 600 of them with a budget in the hundreds of millions??

This is becoming another Mozilla Foundation...



