* The step upstream of this query generated updates to policies. It should have had a limit on the number of changes it makes at once (and ideally per hour, per day, and so on); if it goes over that limit, it should stop updating, alert, and wait until explicitly unblocked. DO NOT generate invalid config and start using it: keep using the previous config that worked, and alert.
If this happens during startup, fall back to a default config.
That would still create impact (customers and developers would not see updates propagate), but would avoid destroying the service. When it comes to outages, people need to learn to walk through what happens when an invariant is violated and look at what gets sacrificed in that case, to make sure the answer isn't "the whole service".
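To make that concrete, here is a minimal sketch of the "limit changes, keep last known good, alert, wait for a human" idea. Everything in it is made up for illustration: `PolicyGuard`, `max_changes`, `validate`, `default_policy` and the list-based alerting are my own names and assumptions, not anything from the system being discussed.

```python
from dataclasses import dataclass, field
from typing import Callable


def default_policy() -> dict:
    # Hypothetical safe default used at startup, before any update is accepted.
    return {"rules": []}


@dataclass
class PolicyGuard:
    max_changes: int                   # hypothetical per-batch change budget
    validate: Callable[[dict], bool]   # hypothetical validity check for a candidate
    active: dict = field(default_factory=default_policy)
    blocked: bool = False
    alerts: list = field(default_factory=list)

    def _alert(self, message: str) -> None:
        # Stand-in for real alerting; here we only record the message.
        self.alerts.append(message)

    def apply(self, candidate: dict, change_count: int) -> dict:
        """Return the policy to serve with; an unvalidated candidate is never returned."""
        if self.blocked:
            # Over the limit earlier: keep serving the last good policy.
            return self.active
        if change_count > self.max_changes:
            self.blocked = True
            self._alert(f"{change_count} changes exceeds limit {self.max_changes}; updates blocked")
            return self.active
        if not self.validate(candidate):
            self._alert("invalid policy generated; keeping previous one")
            return self.active
        self.active = candidate
        return self.active

    def unblock(self) -> None:
        # Only an explicit operator action resumes updates.
        self.blocked = False
```

The design choice is that every failure path returns `self.active`: updates stop propagating, but whatever consumes the policy keeps getting something known to work. Per-hour and per-day budgets would be extra counters on the same guard.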
If I may be impolite: you need to do this because software architects, as seems to be the case here, often choose "crash and destroy the service" when their invariants are violated, instead of "stop doing shit and alert" when faced with an unknown problem or a problem they can't deal with.
This also requires test-crashing. You introduce an assert? Great! The more the merrier; seriously, you should have lots of them. BUT you will include a test showing that the world doesn't end when your assert is hit.
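A rough sketch of what such a test could look like. The names (`score_request`, `handle`, the "at most 3 features" invariant) are invented for the example; the point is only that the test hits the assert on purpose and then checks that the next request is still served.

```python
def score_request(features: list) -> int:
    # Hypothetical invariant: the feature list never exceeds 3 entries.
    assert len(features) <= 3, "feature list larger than expected"
    return len(features)


def handle(features: list, alerts: list) -> str:
    """Serve one request; a violated invariant degrades this request only."""
    try:
        return f"scored:{score_request(features)}"
    except AssertionError as exc:
        alerts.append(str(exc))   # stand-in for real alerting
        return "degraded"         # fall back instead of crashing the service


def test_assert_does_not_take_down_service() -> None:
    alerts: list = []
    # Deliberately violate the invariant.
    assert handle([1, 2, 3, 4], alerts) == "degraded"
    assert alerts == ["feature list larger than expected"]
    # The service must still handle the next, valid request.
    assert handle([1, 2], alerts) == "scored:2"
```

One caveat for this particular language: Python strips `assert` statements when run with `-O`, so in production code you would likely raise an explicit exception instead. The shape of the test stays the same.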