
When I was at AWS, we called our incident postmortems "root cause analysis", but everyone understood that most incidents are multicausal, and the actual analyses always ended up being fishbone diagrams.

There are probably some teams that don't do this and really do treat RCA as a search for a single root cause, but I think a lot of "getting mad at RCA" is bikeshedding over the terminology and has nothing to do with the actual practice.

Right, I'm not a semantic zealot on this point, but the post we're commenting on really does suggest that the Cloudflare incident had a root cause in basic database management failures, which is the substantive issue the root-cause-haters have with the term.

The layered-swiss-cheese model of understanding incidents tends to map to the real world better than the alternatives.

> to find a sole root cause

"Six billion years ago the dust around the young Sun coalesced into planets"

"Workaround: If we wait long enough, the earth will eventually be consumed by the sun."

https://xkcd.com/1822/

These days we tend to spend more time thinking about the "5 whys" (which often turn into more than 5) than the root cause itself. It's much more productive and useful.