Hacker News

> I don’t understand why anyone should want this.

That's ok, but then you should bow out of the conversation, which is between people that do understand why anyone should want this.

Predictable behavior is a must-have in some industries, less so in others. At the level of infrastructure that is deemed critical by some - and I'm curious what JGC's position on this is - the ability to avoid this kind of outage carries a lot of value. The fact that you do not see that CF has achieved life-critical reach is one that tells me that most of this effort is probably going to waste, but I trust that John does see it my way and realizes that if there are ways to avoid these kinds of issues they should be researched, because service uptime is very important to companies like Cloudflare.

Boeing managed to kill a bunch of people with shitty business practices, not with shitty software, the software did what it was built to do. It is the whole process around that software as well as the type certification process and regulatory oversight that failed dramatically.





> That's ok, but then you should bow out of the conversation, which is between people that do understand why anyone should want this.

I was not making a statement that I am ignorant. I was saying I believe the proposal to model general software engineering after avionics is misguided and inviting you to clarify your position.

It is certainly valid to ask what CloudFlare or anyone else for that matter could learn from avionics engineering or from NASA or from civil engineering focused on large scale projects or anywhere else that good engineering practices might come from. However, there is a persistent undercurrent in discussions around software reliability and general software engineering that ignores the fact that there are major trade-offs made for different engineering efforts.

“Oh, look how reliable avionics are. We should just copy that.”

Cool, except I would bet avionics cost 100 times as much to build per line of code as anything CloudFlare has ever shipped. The design constraints are just fundamentally different. Avionics are built for a specific purpose in an effectively unchanging environment. If Cloudflare built their offerings in the same way, they would never ship new features, the quality of their request filtering would plummet as adversaries adjusted faster than CloudFlare could react, and realistically they would be overtaken by a competitor within a few years at most. They aren’t building avionics, so they shouldn’t engineer as if they are. Their engineering practices should reflect the reality of the environment in which they are building a product.

This is no different than people who ask, “Why don’t we build software the way we build bridges?” Because we’re not building bridges. Most bridges look exactly like some other bridge that was built 10 miles away. That’s nothing like building new software. That’s far more like deploying a new instance of existing software with slightly different config. And this is not to say that there is nothing for software engineers to learn from bridge building, but reductive “just do it like them” thinking is not useful.

> Boeing managed to kill a bunch of people with shitty business practices, not with shitty software, the software did what it was built to do.

The software was poorly designed. No doubt it was implemented to the spec. Does that change the fact that the sum total of the engineering yielded a deadly result? There is no papering over the fact that “building to avionics standards” led directly to the deaths of 346 people in this case.


> I was not making a statement that I am ignorant.

ok.

> I was saying I believe the proposal to model general software engineering after avionics is misguided and inviting you to clarify your position.

But we are not talking about 'general software engineering', we are talking about Cloudflare specifically and that makes a massive difference.

> It is certainly valid to ask what CloudFlare or anyone else for that matter could learn from avionics engineering or from NASA or from civil engineering focused on large scale projects or anywhere else that good engineering practices might come from. However, there is a persistent undercurrent in discussions around software reliability and general software engineering that ignores the fact that there are major trade-offs made for different engineering efforts.

I think we are all aware of those trade offs. We are focusing on a specific outage here that cost an absolute fortune and that used some very specific technical constructs and we are wondering if there would have been better alternatives either by using different constructs or by using different engineering principles.

> “Oh, look how reliable avionics are. We should just copy that.”

> Cool, except I would bet avionics cost 100 times as much to build per line of code as anything CloudFlare has ever shipped.

And there is a pretty good chance that, had they done that, they would have come out ahead.

> The design constraints are just fundamentally different.

Yes, but not so different that lessons learned cannot be transferred. The main reason why aviation is different is because it is a regulated industry and - at least in the past - regulators had teeth, and without their stamp of approval you are simply not taking off with passengers on board.

> Avionics are built for a specific purpose in an effectively unchanging environment.

That is very much not the case. The environment aircraft are subject to is - and increasingly so due to climate change - dynamic to a point that would probably surprise you.

What is not changing is this: the price for unexpected outcomes in that industry is that at some point global air travel will no longer be seen as safe and that once that happens one of the engines behind our economies will start failing. In that sense the differences with Cloudflare are in fact not that large.

> If Cloudflare built their offerings in the same way, they would never ship new features, the quality of their request filtering would plummet as adversaries adjusted faster than CloudFlare could react, and realistically they would be overtaken by a competitor within a few years at most. They aren’t building avionics, so they shouldn’t engineer as if they are. Their engineering practices should reflect the reality of the environment in which they are building a product.

I do not believe that you are correct here. They could, they can afford it and they have reached a scale at which the door is firmly closed against competitors, this is not a two bit start-up anymore.

> This is no different than people who ask, “Why don’t we build software the way we build bridges?” Because we’re not building bridges. Most bridges look exactly like some other bridge that was built 10 miles away. That’s nothing like building new software. That’s far more like deploying a new instance of existing software with slightly different config.

This too does not show deep insight into the kind of engineering that goes into any particular bridge. That they look the same to you is just the outside, the interface. But how a particular bridge is anchored and engineered can be a world of difference from another bridge in a different soil situation, even if they look identical. The big trick is that they all look like simple constructs, but they're not.

> The software was poorly designed. No doubt it was implemented to the spec. Does that change the fact that the sum total of the engineering yielded a deadly result? There is no papering over the fact that “building to avionics standards” led directly to the deaths of 346 people in this case.

That is not what happened, and it is not what the accident investigation concluded.

Boeing fucked up, not some software engineer taking a short-cut. This was a top down managed disaster with multiple attempts to cover up the root cause and a complete failure of regulatory oversight.


> I think we are all aware of those trade offs.

I'm not sure about that. This type of conversation tends toward "shit's easy syndrome" with complexities hand waved away and real trade offs given lip service consideration only. With respect to CloudFlare you specifically said "as soon as they become the cause of an outage they have invalidated their whole reason for existence". I don't know how to square black and white statements like that with an understanding of tradeoffs. A lot of companies would (and do) trade the potential for an outage against the ongoing value of CloudFlare's offerings.

> we are wondering if there would have been better alternatives either by using different constructs or by using different engineering principles.

I think what was actually said was "let's start off with holding them to the same standards as avionics software development". Not so much inquisitive as "shit's easy".

> And there is a pretty good chance that, had they done that, they would have come out ahead.

How did you reach that conclusion? CloudFlare has taken a stock hit recently. Even if we attribute that 100% to their outage, they are still up 92% over the last year.

For comparison's sake, CloudFlare was founded after the 737 Max started development. I seriously doubt CloudFlare would have achieved its current success by attempting to ape avionics engineering.

> That is very much not the case. The environment aircraft are subject to is - and increasingly so due to climate change - dynamic to a point that would probably surprise you.

Did you honestly think I was referring to the actual weather? A plane built in 1970 will (assuming it's been maintained) still fly today just fine. The design constraints today are essentially the same and there are no adversaries out there changing the weather in a way that Boeing needs to continuously account for.

This is wholly different from CloudFlare, who is actively fighting botnets and other adversaries who are continuously adapting and changing tactics. The closest analog for avionics would probably be nation states that can scramble GPS.

> In that sense the differences with Cloudflare are in fact not that large.

In the sense that both are important and both happen to involve software, sure. In most other ways the differences are in fact very large.

> I do not believe that you are correct here. They could, they can afford it and they have reached a scale at which the door is firmly closed against competitors, this is not a two bit start-up anymore.

You are ignoring the reality of the situation, and it surfaces in self-contradictory statements like this. They have closed the door firmly on competition so now they need to focus on avionics-like engineering? Why? If their moat is impassable they should just stop development and keep raking in money. The only reason that they even experienced this outage was because they are in continuous development.

The reality is that their moat is not that wide. If their adversaries or their competition outpace them, they could easily lose their customers to AWS or Azure or someone else.

> This too does not show deep insight into the kind of engineering that goes into any particular bridge. That they look the same to you is just the outside, the interface. But how a particular bridge is anchored and engineered can be a world of difference from another bridge in a different soil situation, even if they look identical. The big trick is that they all look like simple constructs, but they're not.

Forest for the trees... I did not claim that the bridges are actually the same. But how to build foundations, how to span supports, how thick concrete needs to be and how much rebar, these are well established. Yes, there are calculations and designs but civil engineers have done an excellent job of building a large corpus of practical information that allows them to build bridges with confidence. (And this is definitely something we could learn from them.) Rarely are bridges built mostly with custom components that have never been used before.

> Boeing fucked up, not some software engineer taking a short-cut. This was a top down managed disaster with multiple attempts to cover up the root cause and a complete failure of regulatory oversight.

You're trying to hand wave this away as if I am blaming some individual Boeing engineer, but I'm not.

Engineering isn't just coding. Engineering is the planning and the designing and the building and the testing and everything else that makes the product what it is. Boeing created a system to mask the flight characteristics of their new plane, except it didn't actually work. (And also yes they lied to regulators about it.) If it had actually worked, those two planes wouldn't have crashed. A product intended to make planes easier to fly is poorly engineered if it actually crashes planes.



