I've never met anyone who say's "I like YAML, it is great"... most people that w...

dijit · on Jan 12, 2023

> I've never met anyone who say's "I like YAML, it is great"

Maybe I'm older than you, but I have definitely heard that line.

Mostly because the alternatives were XML, INI, or a myriad of bespoke formats: relayd/apache httpd .conf files, iptables rules, etc.

INI has parsers that operate in different ways and doesn't support hierarchies... so that's not ideal.

JSON and YAML came to the fore around the same time, and JSON's limitations around comments and its picky semantics meant that people did prefer YAML over JSON for human-readable configs.

YAML itself is fine. It has some really awkward warts, and the parsers are usually programmatically unsafe in their implementation (leading to less compatible "safe_load" or other types of loaders)[0]; the issue we actually have with YAML is that we:

A) Template it (jinja, mustache whatever)

B) Put entirely too much stuff into it. (Kubernetes manifests can grow to hundreds of lines really easily.)

These problems will affect any configuration file format we choose to use, including TOML (which is comparatively new on the block), because reading templated/enormous files is really difficult.

What I've taken to doing is programmatically generating objects and then serialising them as whatever my software depends on. It might feel ugly to use an entire Turing-complete language to generate objects that are mostly static, but honestly... the ability to breakpoint, test and print the subsections of output is astonishingly nice.

Then I don't care at all what the format is.
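A minimal sketch of that approach, assuming Python and JSON as the output format. The `EnvVar`, `Container` and `build_deployment` names are made up for illustration, and the field values are borrowed loosely from the manifest later in this thread; none of it is the commenter's actual tooling.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EnvVar:
    name: str
    value: str


@dataclass
class Container:
    name: str
    image: str
    env: list[EnvVar] = field(default_factory=list)


def build_deployment(app: str, namespace: str, replicas: int = 1) -> dict:
    """Build a Deployment-shaped dict; every value here is illustrative."""
    container = Container(
        name=f"{app}-service-container",
        image=f"dckr.io/{app}/deploymentImage:latest",
        env=[EnvVar("LOGGING_LEVEL", "debug"), EnvVar("APP_NAME", app)],
    )
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{app}-deployment", "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app}},
            "template": {
                "metadata": {"labels": {"app": app}},
                # asdict() converts the nested dataclasses into plain dicts.
                "spec": {"containers": [asdict(container)]},
            },
        },
    }


deployment = build_deployment("some", "deployment-namespace")

# Any subsection can be printed, breakpointed or unit-tested on its own.
containers = deployment["spec"]["template"]["spec"]["containers"]
print(json.dumps(containers, indent=2))

# Serialisation is the last step and can be swapped for another format.
manifest = json.dumps(deployment, indent=2)
```

The point of the design is that the structure lives in ordinary code, so the output format only matters in the final line.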

[0]: https://www.serendipidata.com/posts/safe-api-design-and-pyya...

falcolas · on Jan 12, 2023

I kinda miss XML. Yeah, I'm weird.

The tooling is super mature, it's easy to emit, it's easy to parse, it's easy to validate; it can just be a little hard to read and write by hand (and I mostly blame SOAP for that). Still, basic XML isn't that hard to read or write, thanks to editor support.

dijit · on Jan 12, 2023

I still wouldn't want to read this, and this is a simple example. :\

    <apiVersion>apps/v1</apiVersion>
    <kind>Deployment</kind>
    <metadata>
      <name>
        some-deployment
      </name>
      <namespace>
        deployment-namespace
      </namespace>
    </metadata>
    <spec>
      <replicas>1</replicas>
      <revisionHistoryLimit>3</revisionHistoryLimit>
      <selector>
        <matchLabels>
          <app>
            some
          </app>
        </matchLabels>
      </selector>
      <template>
        <metadata>
          <labels>
            <app>
              some
            </app>
          </labels>
        </metadata>
        <spec>
          <serviceAccountName>dbuser</serviceAccountName>
          <nodeSelector>
            <iam.gke.io/gke-metadata-server-enabled>true</iam.gke.io/gke-metadata-server-enabled>
          </nodeSelector>
          <imagePullSecrets>
            <name>dckr-auth</name>
          </imagePullSecrets>
          <containers>
            <name>
              some-service-container
            </name>
            <image>
              dckr.io/some/deploymentImage:latest
            </image>
            <imagePullPolicy>Always</imagePullPolicy>
            <ports>
              <containerPort>8000</containerPort>
            </ports>
            <env>
              <name>DB_URL</name>
              <value>postgresql://pgsql%40{{.Env.GCP_PROJECT_ID}}.iam@127.0.0.1:5432/{{.Env.NAMESPACE}}-{{.Env.SERVICE | strings.TrimSuffix "-service"}}</value>
            </env>
            <env>
              <name>LOGGING_LEVEL</name>
              <value>debug</value>
            </env>
            <env>
              <name>TRACING_ENABLED</name>
              <value>false</value>
            </env>
            <env>
              <name>APP_NAME</name>
              <value>
                some
              </value>
            </env>
            <env>
              <name>AUTH_URL</name>
              <value>http://auth/auth</value>
            </env>
            <env>
              <name>KEYCLOAK_KEYSET</name>
              <value>/protocol/openid-connect/certs</value>
            </env>
            <env>
              <name>MAILCHIMP_URL</name>
              <value>https://mandrillapp.com/api/1.0</value>
            </env>
            <env>
              <name>STREAMING_SASL_ENABLED</name>
              <value>true</value>
            </env>
            <env>
              <name>CORS_ENABLED</name>
              <value>true</value>
            </env>
            <env>
              <name>CORS_ALLOWED_ORIGINS</name>
              <value>*</value>
            </env>
            <env>
              <name>CORS_ALLOWED_METHODS</name>
              <value>*</value>
            </env>
            <envFrom>
              <secretRef>
                <name>kafka-access-secret</name>
              </secretRef>
            </envFrom>
          </containers>
          <containers>
            <name>cloud-sql-proxy</name>
            <image>gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.0.0-preview.3</image>
            <args>--auto-iam-authn</args>
            <args>--structured-logs</args>
            <args>{{.Env.SQL_CONNECTION_STRING}}</args>
            <securityContext>
              <runAsNonRoot>true</runAsNonRoot>
            </securityContext>
          </containers>
        </spec>
      </template>
    </spec>

falcolas · on Jan 12, 2023

Honestly, it doesn't feel too bad to me. But it's also a bit overly nested for many things; I wouldn't want to read that in JSON either.

XPath (and JQ) FTW here.
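For a rough idea of what that looks like, here is a sketch using Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The document is a trimmed copy of the manifest above, wrapped in a hypothetical `<deployment>` root so that it parses as a single well-formed document.

```python
import xml.etree.ElementTree as ET

# Trimmed from the manifest above; the <deployment> root is an assumption
# added only so the snippet is a well-formed XML document.
DOC = """
<deployment>
  <spec>
    <template>
      <spec>
        <containers>
          <name>some-service-container</name>
          <env><name>LOGGING_LEVEL</name><value>debug</value></env>
          <env><name>CORS_ENABLED</name><value>true</value></env>
        </containers>
        <containers>
          <name>cloud-sql-proxy</name>
          <args>--auto-iam-authn</args>
        </containers>
      </spec>
    </template>
  </spec>
</deployment>
"""

root = ET.fromstring(DOC)

# Names of all containers, regardless of how deeply they are nested.
container_names = [el.text for el in root.findall(".//containers/name")]

# Value of one env var, selected by the text of its <name> child.
logging_level = root.findtext(".//env[name='LOGGING_LEVEL']/value")

print(container_names)
print(logging_level)
```

With queries like these, you rarely have to read the whole nested document by eye.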

lostmsu · on Jan 12, 2023

Reads fine to me, although it would benefit from a few element->attribute switches.
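One guess at what such a switch could look like for the `<env>` entries, sketched with Python's standard `xml.etree.ElementTree`. The `env_to_attributes` helper and the attribute names are assumptions for illustration, not something proposed in the thread.

```python
import xml.etree.ElementTree as ET

# Two <env> entries in the element-heavy style of the manifest above.
ELEMENT_FORM = """
<containers>
  <env><name>LOGGING_LEVEL</name><value>debug</value></env>
  <env><name>TRACING_ENABLED</name><value>false</value></env>
</containers>
"""


def env_to_attributes(containers: ET.Element) -> ET.Element:
    """Return a new element in which each <env> carries name/value as attributes."""
    out = ET.Element(containers.tag)
    for env in containers.findall("env"):
        ET.SubElement(
            out,
            "env",
            name=env.findtext("name", default=""),
            value=env.findtext("value", default=""),
        )
    return out


compact = env_to_attributes(ET.fromstring(ELEMENT_FORM))
attribute_form = ET.tostring(compact, encoding="unicode")
print(attribute_form)
```

Each four-line `<env>` block collapses to a single self-closing tag, which is most of the readability gain here.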

mickeyp · on Jan 12, 2023

I like YAML.

I like that you can use anchors and merges. It greatly simplifies complex, repetitive structures. And most of the complaints about YAML can be worked around by string-quoting.
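For anyone who hasn't used them: an anchor (`&name`) labels a node, an alias (`*name`) reuses it, and a merge key (`<<`) copies a mapping's keys into another mapping, with locally defined keys winning. A rough model of that behaviour in plain Python dicts, with the YAML being modelled shown in the comment. The keys are invented for illustration, and no YAML parser is involved.

```python
# The YAML being modelled (illustrative keys only):
#
#   defaults: &defaults
#     replicas: 1
#     log_level: info
#     tracing: false
#   staging:
#     <<: *defaults
#     log_level: debug
#   production:
#     <<: *defaults
#     replicas: 3

defaults = {"replicas": 1, "log_level": "info", "tracing": False}


def merge(base: dict, **overrides) -> dict:
    """Model of a merge key: copy the base mapping, then let local keys win."""
    return {**base, **overrides}


config = {
    "defaults": defaults,
    "staging": merge(defaults, log_level="debug"),
    "production": merge(defaults, replicas=3),
}

print(config["staging"])
print(config["production"])
```

As with the dict version, the merge is shallow: nested mappings are replaced wholesale, not merged key by key.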

The whitespace can get in the way if you're templating, but then you can also use [1, 2, 3] as a list notation, for example.

In fact, most of the complaints could be resolved by running it through a linter.

tipiirai · on Jan 12, 2023

I like YAML. More specifically, the subset of YAML, like the author suggests. Clear, intuitive, and allows expressing complex data structures like JSON does. Much better than TOML, which easily becomes a mess with more complex data.

andybak · on Jan 12, 2023

Yeah. My headcanon YAML is amazing.

xwolfi · on Jan 12, 2023

Yup, exactly my experience as well: again, a stupid idea to try and make a "configuration" language out of nested key-value pairs that end up needing fancy interpreters allowing more and more semantics in the keys and values, to start doing what a simple program could have done in half the time...

I've worked in 4 companies over a period of 10 years, and each had exactly this problem, with yml, json, xml, properties files (you don't want to see business logic conditionals in a properties text file, where the keys' shapes command an interpreter to behave dynamically...)

The only time I saw a team do it well was a PHP backend of all things, where the lead said they'd program all their variations in PHP rather than source them from flat configuration descriptors, and it was amazing, clear, simple and powerful. They had to release the backend at each config change instead of releasing the config change only, but I'm still unsure why exactly that's a problem: the configs are software too if we're honest with ourselves, and shoe-horning them into a descriptor language isn't gonna make them flat.

ptsneves · on Jan 12, 2023

The problem is mixing data with logic. I cannot imagine how maintainable a small program becomes when this concept is employed, much less a big one.

jeroenhd · on Jan 12, 2023

I don't think YAML is great, but I still think it's the best format out there.

The only confusing problem I've run into was the sexagesimal number notation and even that was fairly obvious. Perhaps it's because I tend to overquote strings?

I mean sure, the on/off to boolean mappings are annoying, but they also become very obvious when you're parsing config because the type validation will fail. If `flush_cache` has an enum `on` but no key `True` then the type validator will instantly complain about both the missing key and the extra key in the dictionary.

Same with accidental numbers: any type check will show that the parsing failed.
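A small sketch of that kind of check. To keep it free of third-party parsers, the `parsed` dict is written out by hand as what a YAML 1.1 loader would typically return for `flush_cache: on` and `timeout: 1:30`; the `SCHEMA` layout and the `validate` helper are hypothetical.

```python
# Hand-written stand-in for what a YAML 1.1 loader would typically return for:
#   flush_cache: on      (resolved as a boolean)
#   timeout: 1:30        (resolved as the sexagesimal integer 90)
parsed = {"flush_cache": True, "timeout": 90}

# Hypothetical schema: field name -> (expected type, allowed values or None).
SCHEMA = {
    "flush_cache": (str, {"on", "off", "auto"}),
    "timeout": (str, None),
}


def validate(config: dict, schema: dict) -> list[str]:
    """Return human-readable type errors; an empty list means the config is valid."""
    errors = []
    for key, (expected_type, allowed) in schema.items():
        if key not in config:
            errors.append(f"{key}: missing")
            continue
        value = config[key]
        if not isinstance(value, expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, "
                f"got {type(value).__name__} ({value!r})"
            )
        elif allowed is not None and value not in allowed:
            errors.append(f"{key}: {value!r} not in {sorted(allowed)}")
    return errors


errors = validate(parsed, SCHEMA)
for line in errors:
    print(line)
```

Both surprises surface as explicit type errors at load time; quoting the values in the YAML (`"on"`, `"1:30"`) makes them pass.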

I find JSON for config files to become unreadable quickly because of the non-obvious nesting and the lack of comments. You can pick a JSON extension but then you need to pick one that your tooling will support.

jefftk · on Jan 12, 2023

> I still think it's the best format out there.

What do you think of https://toml.io ?

jeroenhd · on Jan 12, 2023

TOML solves a lot of issues but I find it hard to visualise when you get much deeper than two or three levels.
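To make the depth problem concrete, here is a sketch that lists the `[table]` headers a nested structure would need if every table were written out in TOML. The `toml_table_headers` helper is hypothetical, ignores arrays of tables and key quoting, and the data is only loosely shaped like the Deployment earlier in the thread.

```python
def toml_table_headers(data: dict, prefix: tuple = ()) -> list[str]:
    """List one [dotted.path] header per nested table, in document order."""
    headers = []
    for key, value in data.items():
        if isinstance(value, dict):
            path = prefix + (key,)
            headers.append("[" + ".".join(path) + "]")
            headers.extend(toml_table_headers(value, path))
    return headers


# Illustrative structure only.
config = {
    "metadata": {"name": "some-deployment"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "some"}},
        "template": {"metadata": {"labels": {"app": "some"}}},
    },
}

headers = toml_table_headers(config)
for header in headers:
    print(header)
```

TOML lets you omit headers for intermediate tables that have no keys of their own, but the full path still has to be repeated in every header that remains, so the hierarchy has to be reconstructed in your head rather than read off the indentation.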

jacurtis · on Jan 12, 2023

Exactly this. I hear so many people recommend TOML over YAML.

I see the logic in it. For simple key-value configurations, TOML is superior to and more straightforward than YAML. You can add sub-level values and it isn't too bad (if there aren't too many), but beyond two levels, TOML becomes difficult to use.

If you really work in YAML in any sort of more advanced capacity (Kubernetes, Ansible, CI/CD pipelines) then you really need the complexity that YAML provides. You also get used to the "gotchas" mentioned here. Navigating them is fairly straightforward.

I think the article was vastly overblown. Is YAML perfect? Certainly not. But try finding a better way to display such complex data structures in a more human-readable and human-writable way. The complexity is YAML's strength, but it comes with caveats, as all complexity generally does. I really think it's the best we have.

ilyt · on Jan 12, 2023

I think the problem in this particular case is using YAML as a DSL. Every other data format would be equally bad here. Replace YAML with TOML and you're still in the same templating hell.

YAML is the least worst for me, and I don't think I've ever hit the problems the article is showing, because:

* I use editor that will highlight stuff like anchors

* I often generate config from CM so it can't have those errors

* Loading into defined struct in statically typed language also makes them impossible.