IMO: there should be an explicit error path for invalid configuration, so the program aborts with a specific exit code and/or message. And there should be a supervisor that detects this behaviour, rolls back to the old working config, and waits a few minutes before trying to apply the new config again (with corresponding alerts, of course).
So basically, a bad config should be explicitly detected and handled by rolling back to a known working config.
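To make that concrete, here is a rough sketch of such a supervisor. Everything in it is an assumption for illustration: `start` is a hypothetical callable that brings the worker up with a given config and returns its exit code, `alert` is a hypothetical notification hook, and the exit code 78 is just one possible choice (it mirrors `EX_CONFIG` from `sysexits.h`). It is not how any real system is known to work.

```python
import time
from typing import Callable

EXIT_OK = 0
EXIT_BAD_CONFIG = 78  # assumed convention; mirrors EX_CONFIG from sysexits.h


def supervise(
    candidate: str,
    last_good: str,
    start: Callable[[str], int],      # hypothetical: run worker, return exit code
    alert: Callable[[str], None],     # hypothetical: page someone / emit a metric
    retries: int = 3,
    backoff_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try to apply `candidate`; return whichever config ended up active."""
    for attempt in range(1, retries + 1):
        code = start(candidate)
        if code == EXIT_OK:
            return candidate
        if code != EXIT_BAD_CONFIG:
            # Not a config problem, so rolling back the config is not the fix.
            raise RuntimeError(f"worker exited with unexpected code {code}")
        alert(f"attempt {attempt}: new config rejected, rolling back")
        start(last_good)  # roll back to the known working config
        if attempt < retries:
            sleep(backoff_seconds)  # wait a few minutes before retrying
    return last_good
```

The point of the dedicated exit code is that the supervisor can tell "the config is bad" apart from any other crash and only roll back in the first case.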
You don’t even need all the ceremony. If the config gets updated every 5 minutes, it surely is being hot-reloaded. If that’s the case, the old config is already in memory when the new config is being parsed. So parsing shouldn’t have panicked; it should have logged a warning and carried on with the old config that must already be in memory.
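A minimal sketch of that idea, assuming a JSON config with a `features` list and a made-up size limit (`MAX_FEATURES`, `parse_config`, and `ConfigHolder` are all hypothetical names, not taken from any real codebase):

```python
import json
import logging

log = logging.getLogger("config")

MAX_FEATURES = 200  # hypothetical hard limit on the number of entries


class ConfigError(Exception):
    """Raised when a candidate config fails validation."""


def parse_config(raw: str) -> dict:
    """Parse and validate a candidate config; raise ConfigError if unusable."""
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ConfigError(f"not valid JSON: {exc}") from exc
    features = cfg.get("features") if isinstance(cfg, dict) else None
    if not isinstance(features, list):
        raise ConfigError("missing 'features' list")
    if len(features) > MAX_FEATURES:
        raise ConfigError(f"{len(features)} features exceeds limit {MAX_FEATURES}")
    return cfg


class ConfigHolder:
    """Holds the active config; a failed reload never replaces it."""

    def __init__(self, initial: dict):
        self.current = initial

    def reload(self, raw: str) -> bool:
        try:
            new_cfg = parse_config(raw)
        except ConfigError as exc:
            # Log and keep serving with the config already in memory.
            log.warning("rejected new config, keeping previous one: %s", exc)
            return False
        self.current = new_cfg  # swap only after validation succeeded
        return True
```

The swap happens only after the new config has fully parsed and validated, so an oversized or malformed file leaves `holder.current` untouched.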
> If that’s the case, the old config is already in memory when the new config is being parsed
I think that's explicitly a non-goal. My understanding is that Cloudflare prefers failing closed (blocking legitimate traffic) over failing open (allowing harmful traffic).
The system outputting the configuration file failed (it could have checked the size and/or content and stopped right away), but the system importing the file failed too. These things usually sound simple/stupid in hindsight. I am not a fan of everything centralising into a few hands. In a bad situation, those few can be weaponised or attacked. And even in a good situation, their blast radius is just too big and a bit random; in this case, global.