> *Do you really need all those features?* "You" probably do not. But different ...

bartread · 2025-09-18T10:53:21 1758192801

It's the same as the old joke about Microsoft Word: people only use 10% of Word's functionality, but the problem is each person uses a different 10%.

Of course this is an oversimplification, and there will no doubt be some sort of long tail, but it expresses the challenge well. I'd imagine the same is true for many other reasonably complex libraries, frameworks, or applications.

agwa · 2025-09-18T13:59:28 1758203968

XML without DTDs is a very reasonable subset that eliminates significant complexity (no need for an HTTP client!) and security risks (no custom character entities that are infinitely recursive or read /etc/passwd!) and would probably still work for >80% of users.

(I wrote such an XML parser a long time ago.)
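
For illustration only (this is not the parser mentioned above), here is a minimal sketch of the "no DTDs" subset using Python's standard-library expat bindings. The names `DTDForbidden` and `element_names_without_dtd` are hypothetical. The assumption is that rejecting any DOCTYPE declaration up front is enough to rule out custom entities and external DTD fetches, since both are declared inside the DTD.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration (hypothetical name)."""


def element_names_without_dtd(data):
    """Parse `data` and return element names in document order.

    Raises DTDForbidden as soon as a DOCTYPE declaration is seen,
    before any entity declarations inside it are processed.
    """
    names = []
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Refuse the whole document rather than trying to sanitize the DTD.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    def on_start_element(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return names
```

A plain document such as `<a><b/></a>` parses normally, while anything starting with `<!DOCTYPE ...>` is rejected before its entity definitions are read. Documents that rely on DTD-defined entities or validation would of course stop working, which is the trade-off being debated here.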

jlarocco · 2025-09-18T14:21:02 1758205262

Why throw out numbers when we all know you haven't actually measured that it's >80%?

In any case, the tooling around XML (DTDs, XPath, XSLT, etc.) is the reason to use it. I would go so far as to say the (supposed) >80% not using those features shouldn't have used XML in the first place.

tracker1 · 2025-09-18T21:00:22 1758229222

I agree, which is part of why I generally dislike using XML for most things.

x0x0 · 2025-09-19T18:03:19 1758304999

Not to mention that libxml2 underlies things like nokogiri (the commonly used html parsing gem for ruby), beautifulsoup (python's equivalent), etc.

dragonwriter · 2025-09-19T18:10:42 1758305442

Pretty sure beautifulsoup uses python’s builtin html.parser but can optionally use html5lib or lxml if installed, and it is lxml, not beautifulsoup, that actually depends on libxml2.

You’re right about nokogiri, though.

x0x0 · 2025-09-19T19:48:35 1758311315

Ah, you're right. In the codebase I'm familiar with, lxml is used for performance, though it's not the default.