
I don't get all the YAML hate. The article even mentions solutions, which are all better than adopting some nonstandard json variant or toml.

- for whatever language you're using, be aware of which YAML version its YAML library supports and its defaults, and how to safe load yaml in that language.

- defensively quote strings, particularly if you're on a language with an antiquated yaml library that defaults to yaml 1.1.

- defensively use true/false only, and proactively convert any other booleans in your codebase to true/false.

- Depending on your language, you can avoid custom data types entirely in any externally-distributed application to mitigate the risks, even where they would make things more convenient. Use safe loading (most languages with yaml libs support it) so that none get loaded. The other major YAML alternatives that YAML haters recommend don't have custom data types either.
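To make the quoting and true/false advice above concrete, here is a rough, stdlib-only sketch of a lint check. It flags plain scalars that the YAML 1.1 boolean type would resolve to a boolean but that the YAML 1.2 core schema leaves as strings. The helper names `is_risky_plain_scalar` and `quote_if_risky` are hypothetical, and the check covers booleans only, not the other 1.1 implicit types such as octal or sexagesimal integers.

```python
import re

# Boolean spellings listed by the YAML 1.1 bool type.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

# Boolean spellings accepted by the YAML 1.2 core schema.
YAML12_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")


def is_risky_plain_scalar(value: str) -> bool:
    """True if YAML 1.1 rules read `value` as a boolean but 1.2 core rules do not."""
    return bool(YAML11_BOOL.match(value)) and not YAML12_BOOL.match(value)


def quote_if_risky(value: str) -> str:
    """Wrap the scalar in double quotes when its meaning depends on the YAML version."""
    return f'"{value}"' if is_risky_plain_scalar(value) else value


# Hypothetical config values: a country code, a toggle, and an ordinary word.
samples = ["NO", "on", "true", "Norway"]
quoted = [quote_if_risky(s) for s in samples]
```

Individual libraries may implement only part of the 1.1 list, so treat this as a conservative check rather than a description of any particular parser.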

Features like references and folding semantics are very convenient, and you don't even have to use them. Basic yaml enjoys better readability than json. toml is only fine if you don't have much nested data.

The author's note that there are json variants that help with some json failings makes no sense. If you're going to adopt some non-standard json variant, why not just adopt yaml 1.2, make sure your language has a yaml lib that supports it, and use that? At least yaml 1.2 is standardized. It's not yaml's fault if python's pyyaml (built on libyaml) only supports yaml 1.1. It looks like pyyaml is essentially in maintenance mode and ruamel.yaml is what everyone should be using? Unfortunately nobody's gotten around to implementing safe loading natively, but that's a python problem, not a yaml 1.2 problem. ruamel.yaml supports pure-py safe loading that's compliant with 1.2 (no integer-interpretation gotchas), which is fine in most cases, where yaml is only loaded occasionally (e.g. at start-up) and performance isn't critical.

Obviously YAML has historical problems, but what's better? Using another flawed or even more limited data format, inventing your own which will begin with zero adoption, or simply ensuring your environment/app uses yaml 1.2 and best practices?



Having to watch out for footguns (by being defensive as you described) means you're just having to learn these rules. What benefit is YAML giving you then?

Why not use JSON? Conversion of types from YAML to the language's native types is a poor feature to get in return for footguns.

If you _need_ complex types - like in RPC serialization - then I would imagine using XML is better, and forget about making it human-readable. Create a client/testbed instead.

For simple things, JSON fits really well imho.


Or you could ensure your language supports and defaults to yaml 1.2, so you don't have to worry so much.

Every language has warts and pitfalls and things that are recommended against. Suggesting that one should abandon such languages, whether full-featured or data-description DSLs, when they're otherwise more productive than the alternatives, is disingenuous.

Suggesting that yaml offers no genuine advantages? JSON is annoying. It requires good editor syntax checking support to avoid mismatched and trailing punctuation. It lacks comments. I could live without the nice syntactic sugar of yaml, and even without references and line-folding, if it weren't for those other things.
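The two JSON complaints here (trailing punctuation and no comments) are easy to reproduce with Python's standard `json` module. This is a minimal sketch; the `try_parse` helper and the sample documents are made up for illustration.

```python
import json


def try_parse(text: str) -> str:
    """Return "ok" if `text` is valid JSON, "error" if the parser rejects it."""
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError:
        return "error"


# Hypothetical config documents.
valid = '{"name": "app", "ports": [80, 443]}'
trailing_comma = '{"name": "app", "ports": [80, 443],}'
with_comment = '{\n  // listen ports\n  "ports": [80, 443]\n}'

results = {
    "valid": try_parse(valid),
    "trailing_comma": try_parse(trailing_comma),
    "with_comment": try_parse(with_comment),
}
```

Only the first document parses. Standard JSON has no syntax for either of the other two, which is the gap variants like json5 try to fill.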

Which leaves something like cfg or json5, both of which have less support than yaml 1.2. So wth are we talking about? Don't use perhaps the best available (human-editable, inline-nested) data DSL because it might not be supported in some language, and because it has a few potential footguns just like json does? toml may be clean, but it's not as easy to deal with because of how it represents nested structures (which aren't fully inlined like they are in json or yaml), as so many people on this thread have pointed out.

There's a reason so many modern tools use yaml. It's not that their authors weren't aware of json. Plenty of people are simply more sick of json's problems, and consider yaml a superior, not perfect, alternative.


The amount of "defensively [do something]" and "best practices" is another way of saying "this format is so bad you need to tread carefully or else we'll blame it on you".

A good format should not require defensive writing, and "best practice" is a bit of a misunderstanding here: the best practice is to avoid such a bad format, not to screw people into it and then blame them for not following some arbitrary set of rules.


I agree with most of your suggestions, but I wouldn't use "safe_load" unless the file has untrusted input (in which case, you could argue, you shouldn't be using YAML at all).

If you use safe_load then you'll get an exception if you use dates or timestamps, even if it's an unquoted string that happens to look like a date (but you can get around that last one by quoting).


Why are the solutions better than adopting toml? You don’t provide any justification for that claim.


I did. Extensively nested data structures.


> Obviously YAML has historical problems, but what's better?

toml


One of the major use-cases for YAML is as a host language for DSLs, essentially basic programming languages for specific use-cases. TOML isn't as good there, as these languages often need deep nesting of data structures. GitHub Actions is the obvious example.
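As a rough illustration of the nesting point, the sketch below walks a nested mapping and lists the table headers that TOML's standard table syntax would need for it. The `toml_table_headers` helper is hypothetical, and the sample data is only loosely shaped like a CI workflow, not a real schema. It assumes every key is a valid bare key, and it ignores TOML's inline tables and dotted keys, which can shorten some cases.

```python
from typing import Any, Iterator


def toml_table_headers(data: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield a [dotted.path] header for every nested dict, depth first."""
    for key, value in data.items():
        if isinstance(value, dict):
            path = f"{prefix}.{key}" if prefix else key
            yield f"[{path}]"
            yield from toml_table_headers(value, path)


# Hypothetical four-level config.
workflow = {
    "jobs": {
        "build": {
            "strategy": {"matrix": {"os": "linux"}},
        }
    }
}

headers = list(toml_table_headers(workflow))
# headers == ["[jobs]", "[jobs.build]", "[jobs.build.strategy]",
#             "[jobs.build.strategy.matrix]"]
```

Each level repeats the full path from the root, where yaml or json would express the same structure through indentation or braces alone.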


Yeah, toml isn't great for that. It's greatly preferable to yaml for human-readable config. Json is better as a serialisation format.

DSLs are a bit vexed, not least because they're often horrible in themselves - cruft-accumulating hacks with no tooling, blurring the distinction between code and config, often as a workaround for unwieldy dev processes. That's certainly not true of all DSLs, though (can't speak for GitHub Actions), and there is a need for a simple common base syntax/parser for them. I don't think yaml is a good solution.

Being a cantankerous old git, I tend to think the problem of a common base syntax/parser for DSLs was solved 70 years ago: s-expressions/lisp.


Yes, agreed. I've found TOML for config to be quite nice.

This is my go-to article for discussions about generic languages vs DSLs: http://mikehadlow.blogspot.com/2012/05/configuration-complex...

I'm very iffy on DSLs in general. One that I'm using at the moment has foreach loops but no if statements! And... all the inputs must be specified in a YAML file!


A great alternative for stuff like that is https://kdl.dev/, which feels much more natural for DSLs.


Quite frankly, all YAML DSLs in the wild are terrible, so if that's a major use-case, I'd say it was a failure.



