I am afraid of the compile-time cost. For this kind of things I tend to prefer u...

chipdart · on July 10, 2024

> I am afraid of the compile-time cost.

Even though compilation time is the bane of C++, I think this concern regarding this specific usage is grossly overblown. I'm going to tell you why.

With incremental builds you only rebuild whatever has changed in your project. Embedding JSON documents in a C++ app is the kind of thing that is rarely touched, specially if all your requirements are met by serializing docs at compile time. This means that this deserialization will only be rarely rebuilt, and only under two scenarios: full rebuild, and touching the file.

As far as full rebuilds go, there is no scenario where deserializing JSON represents a relevant task in your build tree.

As for touching the file, if for some weird and unbelievable reason the build step for the JSON deserialization component is deemed too computationally expensive, it's trivial to move this specific component into a subproject that's built independently. This means that the full cost of an incremental build boils down to a) rebuilding your tiny JSON deserialization subproject, b) linking. Step a) runs happily in parallel with any other build task, thus it's impact is meaningless.

To read more on the topic, google for "horizontal architecture", a concept popularized by the book "Large-Scale C++: Process and Architecture, Volume 1" By John Lakos.

Mountain out of a molehill.

OskarS · on July 10, 2024

There is another scenario where this is an issue: if this code ends up in a header which is included in a lot of places. You might say "that's dumb, don't do that", but there is a real tendency in C++ for things to migrate into headers (because they're templates, because you want them to be aggressively inlined, for convenience, whatever), and then headers get included into other headers, then without knowing it you suddenly have disastrous compile times.

Like, for this particular example, you might start out with a header that looks like:

    SomeData get_data_from_json(std::string_view json);

with nothing else in it, everything else in a .cpp file.

Then somebody comes around and says "we'd like to reuse the parsing logic to get SomeOtherData as well" and your nice, one-line header becomes

    template<typename Ret>
    Ret get_data_from_json(std::string_view json) {
        // .. a gazillion lines of template-heavy code
    }

which ends up without someone noticing it in "CommonUtils.hpp", and now your compiler wants to curl up in a ball and cry every time you build.

It takes more discipline than you think across a team to prevent this from happening, mostly because a lot of people don't take "this takes too long to compile" as a serious complaint if it involves any kind of other trade-off.

chipdart · on July 10, 2024

> There is another scenario where this is an issue: if this code ends up in a header which is included in a lot of places.

This is all on itself a sign that your project is fundamentally broken, but this is already covered by scenario b) incremental builds.

Even if for some reason you resist the urge of following best practices and not create your own problems, there are a myriad of techniques to limit the impact of touching a single file in your builds. Using a facade class to move your serialized JSON to an implementation detail of a class is perhaps the lowest effort one, but the textbook example would be something like a PIMPL.

The main problem with the build time of C++ projects are not the build times per se but clueless developers, who are oblivious to the problem domain, fumbling basic things and ending up creating their own problems. Once one of them stops to ask themself why is the project taking so much time to build, more often than not you find yourself a few commits away from dropping build times to a fraction of the cost. Even onboarding something like ccache requires no more than setting an environment variable.

actionfromafar · on July 10, 2024

Fundamentally broken, or waiting for modules to become a thing? I tried to use https://github.com/mjspncr/lzz3ᵃ for a few years but it became impractical to me to fiddle with tooling.

a: You don't have source file and header file, you put everything in one file and lzz sorts it out during build.

actionfromafar · on July 11, 2024

https://github.com/mjspncr/lzz3

matheusmoreira · on July 12, 2024

> CommonUtils.hpp

That's the root cause of the slow build. That file is likely to be depended on by way too many other files, triggering massive rebuilds when unrelated code is modified. The headers should be as granular as possible. Breaking up the generic utils file into many specific files will help contain the damage whenever any given file is changed.

I wish it was possible to track source code dependencies at the function level.

OskarS · on July 12, 2024

It's not just that. If ALL that was in the header was the function prototype, that adds basically nothing to the compile time. The problem is when you have significant codegen and parsing in headers, like you do with templates and class definitions and stuff like that.

Like, most C projects import enormous headers with gazillions of function prototypes without having a particularly measurable impact on compile times (compile times in C is quite good, in fact!)

matheusmoreira · on July 13, 2024

Right. For a second I forgot this was a C++ discussion.

Breaking up the headers into granular files should still help reduce the amount of instatiation that's going on at compile time provided there isn't much overlap in the headers included by the source files.

erik_seaberg · on July 12, 2024

GCC, LLVM, and MSVC++ all support precompiled headers. How often is a unique and minimal set of #includes worth the extra cost?

matheusmoreira · on July 12, 2024

That helps reduce the cost of parsing the headers but doesn't eliminate the issue. Changing a header triggers a rebuild of everything that includes it. If the header is ubiquitous, nearly everything gets rebuilt.

We want to reduce the set of rebuilt files to a minimum. That means separate headers so that files that use A don't need to be recompiled because B changed and A and B are defined in the same header.

Taking this logic to the extreme would lead to one file per type or function. I've read a lot of code that's structured this way and it works well. Editing lots of small files is a little annoying though. In the end it's a tradeoff.

adolph · on July 10, 2024

Brings to mind the old story about a JSON DSL

https://thedailywtf.com/articles/the-inner-json-effect

threatripper · on July 10, 2024

Is this real? It can't be real. Nobody can be this stupid. But then again it takes a special kind of person who doesn't understand satire to actually do something like that. Somebody, where they would say "we trained him wrong on purpose as a kind of a joke".

cdirkx · on July 10, 2024

Nah I've seen this happen IRL. In this system "configuration" was read out of tables in a word document, processed via XSLT transformations and eventually it would spit out a huuuuge single C# document (recent "improvement", before that it was some obscure licenced language). Builds happened overnight because they took so long, and there was no way to test something locally.

The "advantage" of this system was that there was no need for programmers, as there was "no code", just configuration!. This was supposed to allowed "domain experts" without programming knowledge to work with the system. However a month long training by the creator of the system was still required, as he had to explain which of the 7 boolean types you should use if you wanted to add a new column 0.o (for those who want to know, there was true/false, 0/1, yes/no, true/false/unknown, true/false rendered as a toggle, true/false rendered as a checkbox...)

spacechild1 · on July 10, 2024

> In this system "configuration" was read out of tables in a word document, processed via XSLT transformations and eventually it would spit out a huuuuge single C# document

This is hilarious! It takes a special kind of ignorance to come up with a solution like this.

1f60c · on July 10, 2024

It has to be satire because of Tom's complete overreaction and the fact that comments are actually one of the easiest things to handle when building a lexer (usually, you just discard them). Eval'ing them makes no sense.

That said, I suppose stranger things have happened.

chipdart · on July 10, 2024

> Is this real? It can't be real. Nobody can be this stupid.

Having worked in an org with an official in-house genius who was terribly tight with a tech-illiterate leadership and faked his way into his status, I can't really tell. Throwing people under the bus, blaming the world around them for problems created by your brittle code, shunning best practices in favor of finger-pointing... This happens in small shops more often than we'd like believe.

As the saying goes, truth is stranger than fiction. Because fiction is expected to make sense.

_nalply · on July 10, 2024

It's the inner platform effect. When I was young I fell into the same trap. I invented a flexible database schema where I put each field into a database row with some metadata describing the field. But that's nonsense. Just use what the database provides.

There's a Wikipedia page about it: https://en.wikipedia.org/wiki/Inner-platform_effect

A variant of it is: Any sufficiently complicated program contains a slow and buggy implementation of half of Lisp. That's the Greenspun's tenth rule: https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

This applies to the kernel as well to put it bluntly and a bit ironically: eBPF, but this shouldn't be understood that I mean that eBPF is not well thought out! https://en.wikipedia.org/wiki/EBPF

Joker_vD · on July 10, 2024

> flexible database schema where I put each field into a database row with some metadata describing the field.

I imagine everyone has invented this scheme at one point or another. It's so obvious, when you think about it!

matheusmoreira · on July 12, 2024

Any sufficiently advanced tagging system turns into dynamic typing for databases.

bruce511 · on July 10, 2024

I'm _pretty_ sure it's satire, but the fact that you and I can't say for sure is perhaps illustrative of the failure.

I've encountered this pattern several times over my career. Some very smart programmer decides that for "reasons" the standard way to do something is "bad". (Usually "performance" or "bloat" are words bandied around.) They then happily architect a new system to replace the "old thing". Of course the new thing is completely undocumented (because genius programmers don't waste their time writing docs).

If you're _lucky_ the programmer then spends his whole career there maintaining the thing. If you're lucky the whole thing becomes obsolete and discarded before he retires. Hint: You're not lucky.

So what you are left with is this big ball of smoosh, with no documentation, that no-one can figure-out, much less understand. Oh he designed this before multi-core processors were a thing? Before we switched to a preemtive threaded OS? Well no, none of the code is thread-safe, and he's left the company so we need someone to "just update it".

There are reasons standard libraries exist. There are usually reasons they're a bit slower than hand-coding specific cases in assembler. There are reasons why they are "bloated" with support for lots of edge-cases. (like comments).

When some really smart person starts talking about how it's all rubbish, be afraid. Be very afraid.

sumtechguy · on July 10, 2024

> There are reasons standard libraries exist

That right there. Before there is a standard lib for something if there are N people coding something up there could be N! ways to do something.

If you do not know about a standard lib or it doesn't exist there will be some wild code written.

It is when that standard library shows up you should at least consider just throwing your bespoke code away. Not always but should at least be considered. I personally have replaced thousands of lines of code and modules I wrote just by switching them to some existing library. The upside is if that standard lib does not do what I want I have enough knowledge to either bend it around so it does or I can fix it up (or put my bespoke code back). I know I am not that smart, but I know enough that my code is probably brittle and probably should be thrown away.

Also watch out for some 'standard libs'. Some of them are little more than someone's hobby project and have all the exact same issues you are trying to avoid. One project I worked on some guy had written a grid control. He was charging something like 10k a year to use it. But it was just one guy and I quote "i just touch it once or twice a year and drink margaritas on the beach". It was a bug prone riddled mess we spent a non insignificant amount of time fixing. We bought another one for a onetime fee of 500 bucks and it was wildly faster and more importantly had near zero bugs and a turn around time of 1-2 days if we found one.

kazinator · on July 10, 2024

> generate C or C++ instead of having the compile do the same thing much slowly

That's a wild-assed guess. A JSON decoder right in the compiler could easily be faster than generation involving extra tool invocations and multiple passes.

Also, if you use ten code generators for ten different features in a pipeline instead of ten compile-time things built into the language, will that still be faster? What if most files use just use one one or two features? You have to pass them through all the generators just in case; each generator decides whether the file contains anything that it knows how to expand.

pjc50 · on July 10, 2024

> You have to pass them through all the generators just in case; each generator decides whether the file contains anything that it knows how to expand.

The C# approach for this is that code generators operate as compiler plugins (and therefore also IDE plugins, so if you report an error from the code generator it goes with all the other compile errors). There is a two-pass approach where your plugin gets to scan the syntax tree quickly for "might be relevant" and then another go later; the first pass is cached.

A limitation of the plugin approach is that your codegen code itself has to be in a separate project that gets compiled first.

An argument in favor of separate-codegen is that if it breaks you can inspect the intermediate code, and indeed do things like breakpoints, logging and inspection in the code generator itself. The C++ approach seems like it might be hard to debug in some situations.

Joker_vD · on July 10, 2024

> A JSON decoder right in the compiler could easily be faster than generation involving extra tool invocations and multiple passes.

It also can easily be slower: C++ templates are not exactly known for their blazingly fast compilation speed. Besides, the program they encode in this case is effectively being interpreted by the C++ compiler which, I suppose, is not really optimized for that: it's still mostly oriented around emitting optimized machine code.

kazinator · on July 10, 2024

> C++ templates are not exactly known for their blazingly fast compilation speed.

Compared to what alternative that does the same thing, in C++?

Modern C++ compilers, as such, are slow as molasses, on multi-GHz hardware with huge L1 and L2 caches, whether your code uses templates or not.

Joker_vD · on July 11, 2024

The alternative would be to run the JSON through e.g. jq/sed and make it dump out a chunk of C++ that would create an object with proper fields and subobjects. This C++ code will have about zero template chicanery; instead, it would just call constexpr constructors which, I imagine, would be entirely boring — this C++ code will be compiled much faster.

jayd16 · on July 10, 2024

I'm not taking sides but I don't think a code-gen tool necessitates re-scanning the entire codebase every compile. gRPC would be a good example.

kazinator · on July 10, 2024

Well not every compile. Obviously, incremental compiles (thanks to a tool like make) notice that the generated code is still newer than the inputs.

Obviously, you have files that are not generated. They don't need any gen tool.

That's a disadvantage. If you want to start using JSON at compile time in a file, and the technology for that is a code generator, you have to move that file to a different category, perhaps by changing its suffix, and possibly indicate it somewhere in the build system as one of the sources needing the json generator. Whereas if it's in the language, you just do it in your .cpp file and that's it.

Token based macro preprocessors and code generators are simply not defensible in the face of structural macro systems and compile-time evaluation. They are just something you use when you don't have the latter. You can use code generators and preprocessors with languages that don't have anything built in, and which are resistant to change (will not support any decent metaprogramming in the foreseeable future).

dctwin · on July 10, 2024

Yes, I agree. I don't see much practical use in this. I was just surprised how (relatively) straightforwards this is to do, and thought it was more cool than useful

silon42 · on July 10, 2024

Often I also find the opposite problem ... sure, you can do some stuff in (c++) metaprogramming, but can you (at compile time) generate a JSON/XML/YAML file that can be fed to some other part of the system?

dctwin · on July 10, 2024

The opposite 'toString' problem seems harder - I didn't try, but it should be possible now that std::string is constexpr.

I don't think you could parse it with, say, a class that has a std::string member (because of the transience restriction), but perhaps you can use lambdas that capture that string by reference, and call each other as appropriate?

As for exporting that as some sort of compiler artefact for use elsewhere, I am not sure how you would do that...

chipdart · on July 10, 2024

> Yes, I agree. I don't see much practical use in this.

Me too. The best example I can come up with is loading test data in automated tests, but even then I wouldn't use this sort of approach.

ulrikrasmussen · on July 11, 2024

I like how code generation is typically done in Kotlin using KSP. Here you write your code generator as a plugin to the compiler, so you have the full expressivity of any JVM language you like. It also operates on the parsed and resolved AST, so you can analyze even derived types. It also allows code generators to run on code which has type errors or even fails to resolve some symbols which is very useful when you generate code from class annotations and then proceed to use the generated code later in the same file.

Another advantage of using KSP is that it also handles caching for you and will avoid running code generators again if the output already exists.

ranger_danger · on July 11, 2024

> I am afraid of the compile-time cost.

Still better than Rust /s