
Cue the armchair infrastructure engineers.

The reality is that there are only a handful of people in the world who can operate systems at this sheer scale and complexity, and I have mad respect for those in that camp.



Some of us are in that camp, and we are looking at this outage and pointing out that they continuously fail to accurately update their status dashboard, in this and prior outages. Yes, doing what AWS does is hard, and yes, outages /will/ happen. It is no knock on them that this outage occurred; what is a knock is that they haven't communicated honestly while the outage was ongoing.


They address that in the post, and between Twitter, HN and other places, nobody was seriously questioning whether something was actually broken. Contacts at AWS were also all very clear that yes, something was going on and being investigated. This narrative that AWS was pretending nothing was wrong just wasn’t true, based on what we saw.


I'm going to leave it at this: the dashboards at AWS aren't automated.

Say what you will, but I can automate a status dashboard in a couple days--yes, even at AWS scale.

No reason the dashboard should be green for hours while their engineers and support are aware things aren't working.


Uh, no. You can’t. If you could, then you would already have been hired and you would have already solved this problem.

What you can do at what you think is AWS scale has no bearing on what you could actually do at real AWS scale.


Right, I guess I would be hired without acknowledging my recruiter or applying. I guess that makes sense.

Related: Are you familiar with log aggregation/streaming?


Isn't this the equivalent of "You're complaining about your meal in a restaurant? I'd like to see you do better"?

The point of eating at a restaurant is that I can't/don't want to cook. Likewise, I use AWS because I want them to do the hard work and I'm willing to pay for it.

How does that abrogate my right to complain if it goes badly (regardless of whether I could/couldn't do it myself)?


I think the distinction is this: you can say "I pay good money for you to do it properly, and how dare you go down on me," but you become an "armchair infrastructure engineer" when you try to explain how you would have avoided the outage, because you don't have the whole picture (especially when all you have to go on is a very carefully worded, PR-approved blog post).


This outage report reads like a violation of the SRE 101 checklist for networking management though.



