
We are now seeing which companies do not consider the third-party risk of single points of failure in systems they do not control as part of their infrastructure, and what their contingency plans are.

So far, it turns out there isn't one, other than contacting the CEO of Cloudflare instead of switching on a temporary mitigation measure to keep downtime minimal.

Therefore, many engineers at affected companies would have failed their own systems design interviews.





Alternative infrastructure costs money, and it's hard to get approval from leadership in many cases. I think many know what the ideal solution looks like, but anything linked to budgets is often out of the engineer's hands.

In some cases it is also a valid business decision. If you have 2 hours of downtime every 5 years, it may not have a significant revenue impact. Most customers think it's too much bother to switch to a competitor anyway, and even if it were simple, the competition might not be better. Nobody gets fired for buying IBM.

The decision was probably made by someone who has since moved on to a different company, so they can blame that person. It's only when downtime significantly impacts your future ARR (and bonus) that leadership cares (assuming that someone can even prove that they actually lose customers).
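To put a number on the "2 hours of downtime every 5 years" figure above, here is a rough back-of-the-envelope sketch. It assumes a 365-day year and, for the revenue estimate, that revenue is spread evenly across every hour of the year. The function names and the 10M annual revenue figure are made up for illustration, not taken from any real company.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(downtime_hours: float, period_years: float) -> float:
    """Fraction of the period during which the service was up."""
    total_hours = period_years * HOURS_PER_YEAR
    return 1.0 - downtime_hours / total_hours


def yearly_revenue_at_risk(annual_revenue: float,
                           downtime_hours: float,
                           period_years: float) -> float:
    """Average revenue lost per year, assuming revenue accrues evenly per hour.

    This is a simplification: it ignores peak hours, SLA penalties and churn.
    """
    downtime_fraction = downtime_hours / (period_years * HOURS_PER_YEAR)
    return annual_revenue * downtime_fraction


if __name__ == "__main__":
    a = availability(downtime_hours=2, period_years=5)
    print(f"availability: {a:.4%}")  # availability: 99.9954%

    # Hypothetical company with 10M in annual revenue.
    loss = yearly_revenue_at_risk(10_000_000, downtime_hours=2, period_years=5)
    print(f"average revenue at risk per year: {loss:,.0f}")
```

Under those assumptions the outage costs far less per year than a second provider would, which is the business case being described. The estimate stops holding once an outage also costs you customers or contractual penalties.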


On the other thread there were comments claiming it's unknowable what IaaS a given SaaS is using, but SaaS vendors need to disclose these things one way or another, e.g. in DPAs. Here, for example, is Render's list of subprocessors: https://render.com/security

It's actually fairly easy to know which third-party services a SaaS depends on and to map these risks. It's normal due diligence for most companies to do so before contracting a SaaS.


Sometimes it's not worth it. Your plan is just to accept you'll be off for a day or two, while you switch to a competitor.

If there's a fitting competitor worth switching to.

Plus most people don't get blamed when AWS (or to a lesser extent Cloudflare) goes down, since everyone knows more than half the world is down, so there's not an urgent motivation to develop multi-vendor capability.


You can't say that when it is a time-critical service, such as those run by hospitals, banks, other financial institutions, or air-traffic control.

Only a fool would build an architecture for critical air-traffic systems with Cloudflare as a SPoF.

My point still stands.

Having no backup or contingency plan for when a third-party system goes down on a time-critical service means you are willing to risk another disaster around the corner.

In those industries, waiting a "day or two" is not only unacceptable, it isn't even an option.



