Software engineering research is a train wreck (buttondown.email/hillelwayne)
372 points by martincmartin on July 20, 2021 | 166 comments


I worked on engineering productivity research and measurement at Google for two years until about a month ago. (Opinions my own.) Compared to my former colleagues, I'm an infant in this area, so take this with a heaping of salt.

In general, I think the author's cynicism about productivity research is justified, but I think it could have been directed more productively. (NB: the following comments say nothing about areas of software engineering research outside of productivity.)

Commercial software engineering is a creative endeavor; it is not a science, nor is it a manufacturing process. It does not have natural, universal laws. What makes a team 'productive' varies massively based on constraints imposed by business model, product, customer expectations, leadership values, and of course the individuals of which it consists.

I do not believe in the possibility of a "General Theory of Productivity." I'm highly skeptical of attempts to quantify the precise relationship between error discovery stage and cost in a way that is generalizable, although I think it might be possible given a large group of engineers using a highly homogenous process, tools, and accounting. Google is pretty close to this (common dev infrastructure across tens of thousands of engineers), and even across Google this kind of generalization would be extremely difficult.

There is no universal physics of software engineering. As a result, academic research into productivity can be difficult to generalize (which is why, I think, you often see researchers twisting themselves into knots). Instead, my rec is to focus on a few key metrics that are aligned with your business' or team's goals (search for DORA for a good starting set) and to reflect often on what you feel makes your team work well and what doesn't.
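
For concreteness, here's a minimal sketch of what tracking a DORA-style starting set could look like, computed over hypothetical deployment and incident records (the field names and numbers are invented for illustration; real data would come from your CI/CD and incident tooling, and this isn't anything Google-specific):

    from datetime import datetime
    from statistics import median

    # Hypothetical records; real data would come from CI/CD and incident tooling.
    deploys = [
        {"at": datetime(2021, 7, 1), "lead_time_hours": 20, "caused_failure": False},
        {"at": datetime(2021, 7, 3), "lead_time_hours": 52, "caused_failure": True},
        {"at": datetime(2021, 7, 8), "lead_time_hours": 8,  "caused_failure": False},
    ]
    incidents = [{"opened": datetime(2021, 7, 3, 9), "resolved": datetime(2021, 7, 3, 13)}]

    window_days = 30
    deploy_frequency = len(deploys) / window_days                      # deploys per day
    lead_time = median(d["lead_time_hours"] for d in deploys)          # hours, commit to deploy
    change_failure_rate = sum(d["caused_failure"] for d in deploys) / len(deploys)
    mttr_hours = median((i["resolved"] - i["opened"]).total_seconds() / 3600 for i in incidents)

    print(deploy_frequency, lead_time, change_failure_rate, mttr_hours)

The point is less the absolute numbers than watching how they trend for your particular team.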


>> I do not believe in the possibility of a "General Theory of Productivity."

Important frame. I think any effort that doesn't start with this statement is probably headed for frustration.

I reckon this is true also for education, aspects of economics and many other fields. Besides sloppy and/or cynical work, I believe a lot of the "replication crisis" relates to this. Do we really expect a relationship between watching TV at dinnertime and marital sex life observed in 1986 Tokyo to replicate in 2021 Berlin? If it doesn't, does that mean the relationship doesn't exist, or teach us anything?

Scientific academia is sort of premised on the idea that we start from a blank slate and build up an understanding of a field based strictly on science. It doesn't matter that you already know gravity pulls things to the ground or that water boils when hot. It must be stated scientifically, tested and theorized.

Well... realistically, from this perspective, we know almost nothing about managing people, teaching, lots of things. Yet, we do manage to do them somehow. People all over the world created domesticated cultivars before Gregor Mendel. Science isn't the only framework for knowing things.

Writers working on their process don't generally go to "empirical research." They read Stephen King, or one of many other authors that speak about their process. They find one that feels compelling. Often, they describe things in terms like "earning the respect of the muse." Usually, it's a poorly supported framework made of a smattering of methodology, random bits of advice, unique terminology and some innovative semantics. Sound familiar?


I like what you've said here. I'm going to think more about it. Some quite important points. There are parts of dealing with people that can't be science based. I would find it hard to disagree with that. We do try to use science in this realm quite a lot though....

...I think this conversation around this item is a good example of how sometimes the surrounding conversation is more valuable than the original topic.


Well written!


cheers.


It seems like many posters in this thread try to classify software enineering as either creative or "mindless factory-work".

Whereas actual enineering disciplines carry the risk of removing the creative part.

I think this classification is wrong. There IS NO mindless factory-work.

Just as in other enineering disciplines, our work is not manufacturing. It's just that the actual manufacturing does not exist (or rather is done by compiler)

Software enineering can (just like other enineering) be:

  - more scientific
  - more pragmatic
  - helped by formal methods
without removing the creativity.

Just as "other" enineering (like software):

  - is highly creative
  - can be artisanal (if wanted, rarely in all projects)
I REALLY feel we can mature in SWE without being afraid of losing creativity.


Agreed; a while back I interviewed a number of ex-trad, now-software engineers and found out that

    1. Engineering is a lot more personal and creative than we think
    2. A large amount of software development is very similar to trad engineering
    3. Never walk over a bridge.


> It's just that the actual manufacturing does not exist (or rather is done by compiler)

Here’s the kicker though: The part that is done by compilers used to be the bulk of software engineering.

In his book The Art of Doing Science and Engineering, Hamming talks about how programmers rejected the idea of even just automated address assignment. They took great pride in manually managing absolute addressing. Only a sissy who doesn’t know real programming would ever use something so silly as symbolic addressing, to say nothing of assemblers or compilers. Ugh!

Now we don’t even think about that. Too boring, too solved, too uncreative.

There is a lot of engineering that we currently do, which is completely mechanical, mindless, and can be automated away.


It seems to go something like this every time:

* We create new technology.

* There's an explosion of creativity and applications of the technology.

* This leads to an explosion in the personalities involved.

* Some of these gravitate towards the tedium and expound on it constantly.

* Some person or group automates that away and increases the ability to be creative.

* Repeat from the top.

It's always fun to watch the wheel go around though. Never any shortage of commentary on HN about it!


At the very least, I imagine a true engineering version of software development would be fundamentally driven by Big-O. Everything justified and implemented in terms of calculable facts. Repeatability might emerge from this.

I think of it this way: what would a CRC manual[1] for software engineering comprise?

1. https://en.wikipedia.org/wiki/CRC_Handbook_of_Chemistry_and_...
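
One very rough way to make that kind of "calculable fact" concrete is to time an implementation at a couple of input sizes and fit the log-log slope to estimate its empirical growth order. A toy sketch (not a proposal for an actual handbook entry):

    import timeit
    from math import log

    def empirical_order(fn, sizes):
        """Estimate k in time ~ n^k from timings at the smallest and largest size."""
        times = [timeit.timeit(lambda: fn(list(range(n, 0, -1))), number=5) for n in sizes]
        return log(times[-1] / times[0]) / log(sizes[-1] / sizes[0])

    def insertion_sort(xs):
        """Deliberately quadratic baseline."""
        out = []
        for x in xs:
            i = 0
            while i < len(out) and out[i] < x:
                i += 1
            out.insert(i, x)
        return out

    print(empirical_order(sorted, [20_000, 80_000]))        # close to 1 on this input
    print(empirical_order(insertion_sort, [2_000, 8_000]))  # close to 2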


The problem is that this is only "the very least".


> It's just that the actual manufacturing does not exist (or rather is done by compiler)

In good conditions, yes, the drudgery is all done by the compiler. There's still a terrifying number of cases where something hasn't been automated and still qualifies as factory work.


I notice you consistently spell it "enineering", without the first "g" that everybody else uses. Is there a reason for this?


The poor ontology backing the term software engineer has always seemed to me like the primary culprit for this problem. I've often heard people doing any of the following for their day job referred to as software engineers:

- people who write basic static sites

- people who write wordpress sites

- people who write simple CRUD apps

- people who write hardware drivers

- people who write compilers

- people who write databases

- people who write google-scale distributed systems

- people who write graphics engines

- people who write physics simulation systems

- people who write video games

The complexity, depth of abstractions, and expected maintenance lifetimes of these tasks vary wildly, but because we all write source code we're all counted as doing the same job! It's sort of an obvious lunacy; different mistakes will punish each of these jobs differently than others.


Do you think other engineers think this way? Your bridge is only 20 yards over a creek, you're not a civil engineer?

I'm sure the complexity, depth, and expected maintenance of some bridges vary too.


In my field (mechanical), there are engineers, technicians, and drafters. Each of the above requires more education than the next, and is paid better than the next. Someone who only does drafting work will not describe their job as engineering, nor are companies likely to ask engineers to do drafting, because it doesn't make sense to pay someone an engineer's salary for that kind of work.

A basic static site seems like drafting. A CRUD app or WordPress site or similar falls somewhere between drafting and technician work.


In a way, software development is more like writing than it is engineering.

There is a continuum between people who author CRUD apps and people who author compilers, but you can go from one to the other simply by having industry experience and access to good books.


Building WordPress sites is more like painting and writing.

And building physics engines and distributed databases, and creating WireGuard, is software engineering.

I think there are hundreds of times more people doing the former than the latter.


Based on civil engineers I know, I think it's more likely they'd simply consider the engineer making 20-yard bridges more junior, but doing similar work. As I understand it, there aren't the same differences in kind that I'm describing here.


> Do you think other engineers think this way?

Yes, and that's correct. If your skills go as far as building a 1-meter tall wall around a patio you are not an engineer.


This also applies to other fields - medicine for instance.


It's a really good point. I usually use this analogy in the context of unit tests. Everyone understands that the review process for a nuclear power generation site shouldn't be the same as for a footbridge over a stream on a trail. However, this basic common sense is missing from all these clean code, TDD, unit testing, etc. conversations.

I've seen a startup where the front end is a very simple website (just some basic forms, charts and tables). But the team follows all the clean-code, TDD, unit testing and whatever latest buzzwords you can think of. It takes one front-end developer more than a day to sort a list alphabetically, which would probably take a decent developer less than 10 minutes. IMO, the whole clean-code seminars/books industrial complex is responsible for the amount of time, effort and money wasted following these clean-code trends without any real business value.


> I do not believe in the possibility of a "General Theory of Productivity." I'm highly skeptical of attempts to quantify the precise relationship between error discovery stage and cost in a way that is generalizable, although I think it might be possible given a large group of engineers using a highly homogenous process, tools, and accounting. Google is pretty close to this (common dev infrastructure across tens of thousands of engineers), and even across Google this kind of generalization would be extremely difficult.

I don't think you are incorrect, but I think a lot of the aspirants behind ESE just want to have a better sense of what works and what doesn't; I'd even welcome negative results! The current state of things is to read 100 opinionated people and their blog posts. And given enough time, you'll encounter someone who swears that after drinking their morning coffee and jumping on one foot for 1 min, they enter a VRChat standup with their team and hit max flow. There's just so little knowledge right now about what works and what doesn't that I'd welcome more clarity, especially negative results.

> As a result, academic research into productivity can be difficult to generalize

I think defects are what we should measure, not productivity, because of the subjectivity of measuring productivity. But even measuring defects is complicated. The best way I see to measure defects is to ask a Team Under Test to document the bugs they encounter along with resolution times, but this is not only expensive but also something I doubt most corporations will be willing to share outside of their walls. Perhaps open source projects can try to store this data, like curl's stats [1].

[1]: https://github.com/curl/stats
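
As a sketch of the kind of record-keeping I mean (the fields and numbers here are invented, not curl's actual format):

    from dataclasses import dataclass
    from statistics import median
    from collections import Counter

    @dataclass
    class Defect:
        stage_found: str        # "design", "review", "test", "production"
        root_cause: str         # "design", "logic", "integration", ...
        resolution_hours: float

    defects = [
        Defect("production", "design", 40.0),
        Defect("test", "logic", 3.0),
        Defect("review", "logic", 0.5),
    ]

    # Group resolution times by the stage at which each defect was found.
    by_stage = {}
    for d in defects:
        by_stage.setdefault(d.stage_found, []).append(d.resolution_hours)

    for stage, hours in by_stage.items():
        print(stage, "median resolution hours:", median(hours))
    print("root causes:", Counter(d.root_cause for d in defects))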


> Commercial software engineering is a creative endeavor; it is not a science, nor is it a manufacturing process

Exactly. And it's less like movie production and more like 4,000 people trying to collaborate to produce a million-page novel.

It does bear some resemblance to design and engineering, but with custom materials and components that have never been used before and need to be created specifically for the project.


Generally, analogies to software engineering are not helpful.

Certainly, software engineering may be a little like problem domain A, B and C but probably less so than we think. After all, how acute is our knowledge of these problem domains to begin with?


I mean, considering how much code, especially scripting and pipeline work, is required to produce most modern movies, these days most movies are programming projects in one way or another.


Hm. Every action movie is different, but similar. Every apartment complex is different, but similar. Every webshop is different, but similar.

Yes, a bad scene usually matters little if you have a lot of amazing ones too. And amazing actors, and editing, and ... Similarly, if you build a nice condo and one face of it looks bad from the street, but the internal spacing of the units is great, then it's still a success. And if your checkout page is shit, but you have amazing search, good prices, great quality products, then your shop is generally great.

Of course this makes it sound like engineering, or a specific trade (like that of plumbers, HVAC technicians, or electricians). It's never an end in itself. It's complex like a bridge or a dam, sure, but without looking at the big picture (traffic, environment, costs, etc.) it cannot really be evaluated. Even safety evaluation requires assumptions (100-year floods, wind loads, min/max temperature, max traffic load, max ship height under the bridge, max electricity load).

And there are patterns, architectures, frameworks. (Like building codes.) There are audits (pentests, like the collision tests for new cars, or synthetic load testing for new sites).

The big difference is that usually movies are done in a few years. Scope change rarely affects bridges. After the basic outline of the dam is checked for basic structural sanity, it's done. After a condo master plan is approved the changes are minimal (because there have been many lives lost due to deviations from the plans).


>> I'm highly skeptical of attempts to quantify the precise relationship between error discovery stage and cost in a way that is generalizable...

I would say universally that bugs found prior to shipping are lower cost (not just cost to fix) than those found after. I've heard from an auto industry friend that over-the-air update capabilities are becoming mandatory for more components. That sounds good because critical fixes can be issued without the cost of a recall (very expensive - if you're a tier 2 supplier you may be out of business). The down side is that the software teams are starting to think they don't actually need to be "done" by launch day, which leads to a bunch of harder to quantify costs.

Another example from a different industry: a bug that caused file import compatibility issues between two versions of the same software. Had that been caught before shipping it would have been no problem to fix. But instead the fix involved trapping an error and rereading the file with a different code path. Also, we didn't realize the change mattered, so the file header info was not updated, and we couldn't tell in advance if a file was "old" or "new" when reading it. Once files of both "versions" were in the wild with our customers, the simplest (most correct) fix was not possible without breaking compatibility.


> The down side is that the software teams are starting to think they don't actually need to be "done" by launch day, which leads to a bunch of harder to quantify costs.

See: the disastrous technical state of many modern video games at launch, seemingly especially those extra chunky "live service" titles that are meant to be around for years instead of just a few months like a normal AAA single player game. Examples:

- https://kotaku.com/how-biowares-anthem-went-wrong-1833731964

- https://www.forbes.com/sites/insertcoin/2018/11/27/bethesdas...

- https://www.ign.com/articles/marvels-avengers-keeps-fixing-t...

No one would have dreamed of shipping a title on the GameCube or PlayStation 2 that had these kinds of problems; whatever it was you put on that day-one disc was going to be the game forever.


> over-the-air update capabilities are becoming mandatory

Looking forward to the mass injection of malware into cars exploiting the usual bugs. How about ransomware to get your car started? Lots of fun!

> the cost of a recall

Mail me a USB stick with the update.


But think of how easy OTA updates will make apprehending criminals!

For instance, if a government needs to institute a lockdown to prevent spread of a novel virus, they can just disable all affected citizens' ability to drive anywhere inessential, and the original software doesn't have to support it


> I would say universally that bugs found prior to shipping are lower cost (not just cost to fix) than those found after.

There's a survivorship bias in this comparison, though. All software contains bugs - some are found in the architectural, coding, testing, review, and end-user stages. Bugs found after shipping are more likely to be significant to an actual user, because it's likely they were noticed in the course of someone using the software a particular way.

However, it's entirely possible that there are bugs that never need to be fixed. Perhaps these bugs involve scenarios that never crop up, or are part of a product direction that gets abandoned completely. (Example: Quibi probably didn't need to handle credit card expiry gracefully, that would have been wasted effort).

Those bugs that made it into prod, got prioritized for a fix, and were memorable are likely to be higher effort just by the nature of those filters.


> Commercial software engineering is a creative endeavor; it is not a science,

And do you think science is not a creative endeavor? I know that you are coming from the "art/science dichotomy" where the terms are used metaphorically, but to me it is still a useless distinction.

> nor is it a manufacturing process?

Why not? Software production IS a manufacturing process.


> Commercial software engineering is a creative endeavor; it is not a science, nor is it a manufacturing process.

Do you think software engineering will remain a creative endeavour in the future?

We've only been programming in high level languages for 60-70 years. All the while Moore's law has been in effect.


I think doing the engineering design of structures and machines (buildings, bridges, engines, etc.) is still a creative endeavor after many hundreds of years, and I think this is the closest analogue to what software engineers do. The difference is that once we have the blueprints drawn (ie. the code written), a computer can just execute them rather than requiring a lot more material and labor inputs. Program execution is the right analogue to manufacturing, not program creation.


I agree with this, but those things have also moved from essentially being artisanal to being more formalized and scientific. I don't think we're at that point in software engineering.


We're definitely earlier in the timeline with software engineering, but I personally think structural and mechanical engineering designs are a lot more "artisanal" than is the general conception. Certainly the primitives involved are very well understood scientifically (and this is where we're definitely a long way behind them), but the specific combinations of those primitives strike me as remaining very bespoke and creative, as does the process for creating the designs. All of this looks familiar to me, just further down the maturity timeline.


Oof. I have no idea. But since it'll be fun to guess:

Absent some kind of magical (from today's POV) AGI, I think I'd probably say yes, but I'm not sure that means software engineering will continue to look the way it does today, at least not universally.

My expectation is a pretty linear extrapolation of history. I think new tools and higher levels of abstraction will emerge that will make certain types of tasks unnecessary or a lot faster. We'll still need people working on compilers and embedded software, but a lot of really basic forms of software development (think Excel replacements) will be A LOT faster and easier to do.

Fundamentally, though, business problems are really fucking specific, and you need a really fucking specific language to express and solve those problems. The process of applying those languages to problems is a creative one. I don't think this dynamic will change for a really long time, if ever.


That's what tools like Visual Basic and Delphi were meant to do, and actually were pretty good at for their time. As far as I can tell, we still haven't nearly caught up to the browser/cloud equivalent of these tools.

Things need to sit still for a while longer I guess, but I think there's also some industry changes to blame. In the 90s there was big money in making tools and components for software engineers, while nowadays everyone just uses the best free thing they can find. Granted the free things are way better than they were in the 90s, but I think they're largely killing the market for better things that cost money.


Look at low code tools. There's money, market demand, and many tool vendors.


“Do you think software engineering will remain a creative endeavour in the future?”

I would think so. If something can be automated or done with a reproducible process, it will get automated or performed by lower-paid people. But I can easily see how software and automated processes will reach a point of “good enough” for most purposes sometime in the future, and there will be far fewer software engineers doing creative work. For example, I would expect a lot of repetitive front-end and back-end work to go away.


I would expect the opposite. As the mechanical stuff becomes cheaper and more automated, the creative work decreases in proportion but increases in absolute terms.


My two cents: is that really the problem? I don't think so.

I mean that the non-creative part is the problem. Why would anybody want to eliminate the creative part and leave the mindless factory-like part?

And there is a huge amount of non-creative, tedious work in programming. A lot of workarounds, minutiae, leaky abstractions, kludges and plain simple idiocy that doesn't get removed because "it's too much work".


Music has remained pretty much a creative thing through most exponential explosions of technology that affected it profoundly. What's the most crazy futuristic scenario where software engineering stops being a creative field?

We have programming languages for creating and evolving organic lifeforms? You are basically creating plants, pets, "superworkers" or "super soldiers", or artists? That's going to still be creative. Like raising a child is creative. Same thing for stuff that can have an ego driven consciousness like most forms of AGI.


I can prove that languages may differ in productivity (without regard to other variables like the ones you mention) with a simple "proof by extremes": no one would likely dispute the fact that it will be more productive overall to code in JavaScript than in Brainfuck (although perhaps not by much).


I was reading this thesis the other day[0], which is on precision machine design. It got me comparing precision machine design to software engineering.

A big part of why we're able to design extremely precise machines (the author worked on a lathe used for machining optical parts for the National Ignition Facility) is because we can characterize exactly where errors will come from (e.g. structure isn't rigid enough, temperature variation causes parts to expand by different amounts, parts not round or flat enough, etc.). Once we know what errors we need to control and their rough order of importance we can start improving the machine design to control them better.

In theory, something similar could be done in software engineering (formal methods are part of this, but not a full solution). Rather than an error budget, you'd have some sort of bug budget, where you tracked what sort of bugs were caused by what sort of constructs, and design your program in such a way to minimize their chance of being introduced. I've never heard of anyone except Dan Bernstein[1] actually doing anything approximating this. Probably because the perceived level of effort is too high.

I actually don't think it would take that much effort, but it would require quite a bit of organization to track where bugs are introduced and what their root causes are. This is probably why Bernstein, an individual, is able to do this, while no large team (that I'm aware of) has done anything similar.

Of course, just like your toaster doesn't need to use precision machine design techniques (an over-engineered toaster is a bad toaster), most software doesn't need the effort/expense of a rigorous design process either, but some would benefit from it.

[0]: https://dspace.mit.edu/handle/1721.1/9414

[1]: https://cr.yp.to/djb.html
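
To make the analogy concrete, a toy "bug budget" could mirror the error budget tables in the thesis: allocate an allowance per root-cause category and compare the tracker's actuals against it. The categories and numbers below are invented purely for illustration:

    from collections import Counter

    # Hypothetical allowance per root-cause category for a release, in bug counts.
    budget = {"off_by_one": 2, "null_handling": 1, "concurrency": 0, "config": 3}

    # What actually got introduced, as recorded in the bug tracker.
    observed = Counter(["off_by_one", "config", "concurrency", "config", "config", "config"])

    for category, allowed in budget.items():
        actual = observed.get(category, 0)
        status = "OK" if actual <= allowed else "OVER BUDGET"
        print(f"{category:14s} allowed={allowed} actual={actual} {status}")

The hard part, as with an error budget, is the bookkeeping discipline to attribute each bug to a category in the first place, not the arithmetic.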


> while no large team (that I'm aware of) has done anything similar.

In areas like avionics, medical devices, etc., there are formal production engineering methods applied to software as a matter of course. It can work very well, but it is definitely expensive compared to the industry average.


I have my suspicion that medical devices aren't as rigidly programmed as the general public would like. Ie programmed by a biomedical/electrical engineer who isn't really a software specialist. But I'd be keen to hear from someone in the trenches.


The industry changes slowly, and definitely regulators like the FDA were too hardware focused for a long time. Less so in the last 10-20 years.

They've always been pretty good about critical systems but less so about "secondary" ones, and there are old devices out there where it shows.

These days most of the industry has reasonable guidance and standards to follow. Characterizing "the industry" is hard; it includes cloud based data handling products and embedded devices with minimal software, etc. - a huge range.

In both these areas (and others, to be sure), there is a formal engineering process that looks quite similar - and you need to embrace it in your software development or there will be a lot of friction. The goals of it are pretty sensible, but the approaches may seem odd to some with more mainstream software backgrounds.


It's a different version of rigorous to what a normal software industry interpretation would be.

It's about controlled, highly auditable processes, very sharply focused on managing the medical (and medico-legal) risk at the expense of every other type of problem.

Generic software specialists I think actually aren't very good at it and take a while to acclimatise to it.


I just finished Donald Knuth's "Literate programming" book and in there is one of the most wonderful papers I've read: "The errors of TeX".

The version in the book goes up to 1991, but on CTAN you should be able to find "errorlog.tex" if you want to read the updated (I think the last one is from this year) paper.

Knuth has documented every bug and feature fix in TeX since its first version and categorised it into categories A-T.


The FoundationDB team took a different approach that I found pretty interesting. They created a fully deterministic (single-threaded) simulation of a distributed database, and then use that simulation to test their implementation of the database under different scenarios. They'd probably be interested in something like what you describe, as the bulk of their work seems to be rooting out and squashing bugs caused by factors out of their control (dropped connections, etc.)

https://www.youtube.com/watch?v=4fFDFbi3toc
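
This isn't FoundationDB's actual harness, but a toy illustration of the idea: drive all the "nondeterminism" from a single seeded RNG so that any failing scenario can be replayed exactly:

    import random

    def run_scenario(seed, messages=20, drop_rate=0.3):
        """Single-threaded, deterministic 'network': the same seed replays the same run."""
        rng = random.Random(seed)
        store, delivered = {}, []
        for i in range(messages):
            key, value = f"k{i}", i
            if rng.random() > drop_rate:      # simulated message loss, driven by the seed
                store[key] = value
                delivered.append(key)
        # Invariant under test: every write the simulation says was delivered is readable.
        assert all(k in store for k in delivered)
        return len(delivered)

    # Explore many scenarios; on a failure, the printed seed reproduces it exactly.
    for seed in range(1000):
        try:
            run_scenario(seed)
        except AssertionError:
            print("failing seed:", seed)
            raise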


> Rather than an error budget, you'd have some sort of bug budget, where you tracked what sort of bugs were caused by what sort of constructs, and design your program in such a way to minimize their chance of being introduced.

Aren't we more or less doing that at the language level? C had lots of memory bugs, so C++ tried to fix it with collections, iterators, and smart pointers. It was still hard, so Java came along with GC and null ref exceptions. The situation was less bad, but multithreading became popular and Java didn't help as much as we would like so the Rust people gave it a go.

Though I really do suggest trying it yourself. If you want to be a better programmer, developing a style that minimizes bugs you write is pretty much pure win.


This would make sense if we migrated existing software projects to different languages.


Software errors which are traceable usually get dealt with easily. The remaining majority are unknown and governed by a power law - e.g. a bug uncovers the need for a major redesign, but there is no time for rewrites so we have to roll along with it, complicating the design until the next big rewrite, if it ever happens. Power law phenomena are quantitatively untraceable. Averages mean nothing.

Ergo software engineering is largely untraceable and better treated as a discovery or research process. To make things efficient, focus on removing impediments and continual simplification, but don't expect to predict much in advance. Unless it is an instance of a previous task, in which case why isn't there a function/library/framework for it yet?


This sort of thinking precludes the possibility of ever developing better methods though. And this statistical argument was actually quite common in machine design until people started actually building machines that were more precise than many believed possible. The thesis I linked has examples of arguments analogous to what you're saying for why precision machine design is impossible.

Edit: I don't mean to be too harsh, you could be right, but software engineering is a young discipline. Mechanical engineering is much older and it was only recently that precision machine design became possible (and, just as importantly, teachable). I believe software engineering has the potential for a similar change in how it is done, at least for projects that warrant it.


It's also difficult because software, despite being a realm of abstract constructs, still exists in the physical world. Hardware corruption can introduce bugs despite the software being formally correct. It's fascinating learning about hardware manufacturing for satellites and needing to build in hardware redundancy and error checking for bits getting flipped by solar radiation. And of course there was the famous rowhammer attack.

So you need not just software expertise but also hardware expertise if you're going to get serious about writing error-proof code, which Dan Bernstein happens to have.


> "where you tracked what sort of bugs were caused by what sort of constructs, and design your program in such a way to minimize their chance of being introduced. I've never heard of anyone except Dan Bernstein actually doing anything approximating this"

Not sure if I completely understand, but here[1] is a talk by Jules May called "If considered harmful: How to eradicate 95% of all your bugs in one simple step". He describes how they analyzed what was causing a lot of bugs in their programs and found it was code that keeps state synchronised using `if` in far-distant places in the codebase: many places with "if (stateFlag) {}" which gradually fell out of sync with each other as people changed some but not all of them.

By redesigning the programs so that instead of call points dotted around the codebase all saying `if (condition) { function1() } else { function2() }` they all say `obj.doWork()`, and the condition-checking code only exists in one place inside the class definition, the talk title claims they removed 95% of the bugs in their codebase. Inside the talk he spends all the time explaining the idea and doesn't actually talk about the effect it had on their bug tracker database, bugs encountered, etc., which is a bit of a shame.

Of course, the comments are suitably sneery for a programming talk: "having double checking being your main source of bugs means that you really try your best to write bad code. The whole talk boils down to: double checking conditions is bad and we found it 20 years after everybody else." - even if you agree with that, the analysis of their bug tracker to find what was causing most bugs and redesigning how they code based on that, appears to be an example of what you describe.

[1] https://www.youtube.com/watch?v=z43bmaMwagI
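
A minimal before/after sketch of the refactoring the talk describes (stateFlag-style checks; the widget names and methods are just illustrative, not from the talk):

    # Before: the same flag check is repeated at every call site
    # and the copies gradually drift out of sync.
    def render(widget):
        if widget["is_premium"]:
            print("drawing premium skin")
        else:
            print("drawing basic skin")

    def export(widget):
        if widget["is_premium"]:
            print("exporting with premium watermark")
        else:
            print("exporting plain")

    # After: the condition is checked exactly once, where the object is made;
    # call sites just say widget.render() / widget.export().
    class BasicWidget:
        def render(self):  print("drawing basic skin")
        def export(self):  print("exporting plain")

    class PremiumWidget:
        def render(self):  print("drawing premium skin")
        def export(self):  print("exporting with premium watermark")

    def make_widget(is_premium: bool):
        return PremiumWidget() if is_premium else BasicWidget()

    widget = make_widget(is_premium=True)
    widget.render()
    widget.export()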


I also happen to have delved into machining tools, and I work as a SWE full time.

To me, this comparison between machine building and software is not workable.

The difference is that machines do simple things and don't change, while software does complicated things and is always changing.

You can easily understand which parts and/or ways of operating cause a machine to misbehave, because you have a full grasp of the parts and how they are assembled and work.

But you cannot achieve the same for software, as each time it will be a new problem, or a new feature not working as expected.

Actually, if you think a bit more, you might realize that we actually care very little about the microscopic features of machines. For example, no one cares about screws, as long as they are machined according to spec, and torqued correctly during installation. The chance that a random screw brings down a whole machine is close to zero.

But any function in a piece of software can bring down the whole thing...


> For example, no one cares about screws, as long as they are machined according to spec, and torqued correctly during installation.

This just isn't true. Off the top of my head:

- are they the right spec for the load

- are they the correct material and finish for environment

- do they have the right head (for manufacture, but also support, etc.)

- what vendor options do we have (the more unusual, the more this gets interesting) and what risk

- are the manufacturing SOPs correct for this

- are the manufacturing SOPs being followed

- etc.

And that's the simple things like screws. You may not care about any of this stuff until it fails; but if nobody cares, things will fail.

The real difference between stuff like this and software engineering is that in far more cases for hardware, an engineer can look up the correct solution and use it. You still have to care enough to check that, and check it against any change, etc.


> For example, no one cares about screws, as long as they are machined according to spec, and torqued correctly during installation. The chance that a random screw to bring down a whole machine is close to zero.

This sounds like people do care about screws. You identify two specific failure modes: they could be machined incorrectly, or they could be torqued incorrectly during installation. In either case, it is entirely possible that a single screw could cause the whole machine to fall apart (depending on which screw fails).

Software is very similar. The smaller the function, the fewer things that could possibly go wrong (similar to a screw) but there's always the possibility that something goes wrong. Like screws, some functions won't do much harm if they're wrong, but some are keystones that, if they fail, will bring the whole thing down.


Depending on your application, you learn to care very, very deeply about screws and bolts. Screw and bolt failures kill people, and they usually fail in conjunction with other design mistakes. Like software, it's not just one thing and you do need to sweat the details.


Yep. I think we have a tendency to assume that our field is somehow unique, because we know it so intimately. "Everyone else has it simpler."

The actual fact is that everyone else has it just as complicated, but we never see that complexity because it's not what we are experts in.


Have you heard about jesus nuts? [0]

[0] https://en.wikipedia.org/wiki/Jesus_nut


> as long as they are machined according to spec, and torqued correctly during installation.

Isn't it more correct to say that nobody cares about screws because we have worked out the problems with screws to the point that we can trust they are machined to spec and torqued correctly?


One of my favorite online discoveries is Fastenal labs, where they talk about properties of screws. Here's one that says your machine can break because you put a stainless steel screw through an aluminum plate: https://www.fastenal.com/en/70/corrosion


A random screw can have a huge difference. Just pull up McMaster-Carr and look up the hundreds of different types of fasteners that all have different properties that make them suitable in some cases or others. Someone has to choose the screw, and it's not necessarily a straightforward choice.


> It got me comparing precision machine design to software engineering.

I think the problem is that failures of physical systems are often caused by random faults, which can be tested for and their probabilities determined, e.g. how long before a bolt breaks. This is as opposed to software, which most often fails due to systematic errors and cannot be tested completely.


In precision machine design the failures are typically not statistical (like in fatigue failures) but due to errors of design. E.g. the tool spindle can maintain the desired tolerance when running from a cold start but after several minutes thermal effects cause it to be out of tolerance.


Yeah ok I did not know that, but compared to software it’s still practical to test the design for all known inputs.


I went into this software engineering productivity research rabbit hole a decade ago and came away similarly depressed with the lack of rigor in our field.

What I’ve come to believe:

- Software engineering is not an engineering discipline, it is a craft. It may or may not one day become an engineering discipline, I find this impossible to tell.

- The difference between a feature and a bug is largely one of semantics, not of substance. A defect is a mismatch between expected behavior and actual behavior. A bug is when the expectation was that of the programmer that wrote the code. A feature request is when the expectation was that of a user. There is no real difference for the user between a missing feature and a broken feature. Hence why we so often get side-tracked into fruitless “bug or feature” discussions and why users don’t care when we say it is by design.

- Errors will happen. People are flawed and produce errors at a more or less set average rate for each person (unless they’re tired or enjoying a ballmer peak). Errors in every step of the process, from design to deployment, are inevitable. We can however catch these inevitable errors, with automated error detection tools, or by having a fresh set of eyes look at the work. The consequence is that making high quality software necessitates taking a look at the whole process and introducing error detection steps in every part of it, whether they be automated like linters and unit tests, or human like design and code review or mentoring and pair programming.

- Errors in design are more costly because they cause entire features to need to be written or rewritten, which is the most costly kind of defect to resolve. We do not need research to know this to be true. We solve this by having short feedback loops with real users, which is the only real form of agile development.


> Software engineering is not an engineering discipline, it is a craft

Software engineering is *still not treated* as an engineering discipline, it is *still treated as* a craft


"Here’s the only technique I’ve found that works, which I call scrobbling even though that means something totally different:

"1. Search seed terms you know, like “cost of bugs”, in an appropriate journal (here’s one).

"2. Find papers that look kinda relevant, skim their abstracts and conclusions.

"3. Make a list of all the papers that either cite or are cited by these papers and repeat.

"4. Find more useful terms and repeat."

Congratulations! You're a grad student now!

P.S. When I graduated, I had a filing cabinet with two drawers full of papers that felt important enough to save printed copies, from just this process. A couple of months ago, I carted them out to the burn pile, threw them on the pile of leaves and brush, and held a Viking funeral. And I didn't even feel bad. :-)


I got into software specifically because I didn't want to go to physics grad school, and now I read more papers than I ever did in physics. Man plans, God laughs.


The problem stated is that instead of being able to find widely cited or good papers, any amount of research involves trudging through hundreds of bad, off-topic, outdated, unrelated, or otherwise useless papers, just to get an idea of what is being talked about in the field. And even when you get there, what you find may turn out to have no practical relevance in the end.

Just saying "yeah thats how things work in academia" Doesn't help anything; this article is pointing out a very real problem: the state of things now is a mess.


My impression on reading this list was, uh yeah, there's no avoiding that. Except if you know an expert. Then just ask an expert.

Search engines for papers suck, bad. At the fringes of human knowledge, you need an actual brain to process cutting-edge research. Either your own non-expert brain, or an expert brain to help guide you.

That process is the same in every field. Older fields have textbooks with knowledge distilled from experts. That's a good place to start to get the state of the art > 20 years ago.


> Search engines for papers suck, bad.

This screams the need for an intelligent search engine, with summarizing features that point out relevant parts of the paper, and (imho) a UI and algorithms that reach across disciplines and encourage discovery. So, for instance (I'm making this up), someone looking for a way to understand interferometry data might stumble on a useful regularization technique from image processing.

I imagine academic libraries would pay really good money for that. Of course, it would help if a mostly complete body of scientific literature was available to crawl--like SciHub.


Wouldn't some graph traversal be enough? You find an important paper and then just search for papers which cite it, then sort by how many cite those (similar to the PageRank algorithm) and filter them by year and keyword?

There aren't that many different citation styles, so I'm pretty sure it would be possible.
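
A toy version of that walk over a made-up citation graph; real citation data would have to come from something like Semantic Scholar or Crossref, which isn't shown here:

    from collections import deque

    # paper -> papers it cites (invented graph)
    cites = {
        "seed": ["A", "B"],
        "A":    ["C"],
        "B":    ["C", "D"],
        "C":    [],
        "D":    ["C"],
        "E":    ["A"],   # not cited by the seed, but it cites A
    }

    # Invert the graph so we can also walk "cited by".
    cited_by = {p: [] for p in cites}
    for p, refs in cites.items():
        for r in refs:
            cited_by[r].append(p)

    def related(seed, hops=2):
        """Breadth-first walk over both citation directions from a seed paper."""
        seen, frontier = {seed}, deque([(seed, 0)])
        while frontier:
            paper, depth = frontier.popleft()
            if depth == hops:
                continue
            for nxt in cites[paper] + cited_by[paper]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        return seen

    # Crude ranking: most-cited candidates first.
    candidates = related("seed")
    print(sorted(candidates, key=lambda p: len(cited_by[p]), reverse=True))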


It is usually possible to do this (a digital journal can include links to cited/citing papers), but citations happen for a variety of reasons (background information, a single statistic, boosting a colleague's work...). Every citation applies to a different sentence or paragraph in an article. To understand the content and whether it is relevant, we still need to digest the text.

There are also less-cited papers and journals that can be just as relevant--every article starts with zero citations, and the vast majority of good work out there isn't exciting enough to be accepted for Nature.

Further, the process is rather incestuous: Previous person solved X problem with Y approach because that's the first thing someone came up with and it's just how it's done in our field, so let's continue doing this inefficient thing and citing that paper, maybe working on a better way to do things. Meanwhile in some other field, a mathematician or whatever came up with a far better solution to a similar problem long ago, but nobody in this field ever knew that, so it goes unnoticed.

I do want to see what is highly cited because it's probably interesting, but I also want to see things that don't get that kind of attention but are applicable to my work, and things that I wouldn't know are applicable to my work.

Another example: Currently, authors enter relevant keywords when they submit a paper. Maybe that works when someone searches the right combination of words, or maybe it doesn't because the search engines suck. Or because in one part of the world the topic has a completely different vocabulary, and we miss a whole library's worth of useful papers.

I speak from a background in STEM. I can't vouch for other disciplines, but I imagine they have similar pain points. Heck, I don't even know what others in STEM think, other than "that's just how research works."

I don't think it's an easy problem, or we wouldn't still be sifting through mounds of crap to find a few relevant, reproducible works worth reading and citing. I think it would involve figuring out the overarching themes and important methods in a paper and sorting them by their importance, and a little bit of fuzziness to say "hey, this isn't exactly what you're looking for, but it sure seems useful." This could even allow un-cited works a second chance.

I think I'm describing two separate goals, and the fuzzy part could wait until the relevance problem is addressed. I don't think the problems are particularly easy, or Google Scholar would have solved the problem and monetized the solution already. But it seems like they're solvable problems.


Semantic Scholar exists, and does a decent job mostly, in my opinion.


Heh. I entered a topic of interest and found my own minimally cited work at the top of the list.

I approve of this tool! It's a shame I hadn't heard of it before.


If you're really in a hurry, filter out all those not published at top venues (e.g. ICSE or FSE as top tier for SW engineering, or ASE/ISSTA the tier below), or at the very least cited by several papers at top venues. In the vast majority of cases everything else is likely to be noise, incremental, or preliminary work. Of course consider the above venues as (perhaps) necessary but not necessarily sufficient condition for quality. You have to realise that academia is hyper competitive. The best work tends to get published at top venues because they are very difficult to get published at, and offer hiring and promotion committees an easy filter. Use that in your favour to filter out much of the dross and/or poorly executed ideas.


Yeah I was going to say, this sounds like a normal way of getting around the literature. We were taught to start mostly with review/summary papers from high-quality places, and work our way out from there.


Now I know why my professor explained breadth first search for an entire week!!


> I’ve checked a few other papers and think I’m tentatively confident in this line of reasoning: certain bugs take more time to fix (and cause more damage) than others, and said bugs tend to be issues in the design. I haven’t vetted these papers to see if they don’t make major mistakes, though. It’s more time-consuming to do that when you’re trying to synthesize a stance based on lots of indirect pieces of evidence. You’re using a lot more papers but doing a lot less with each one, so the cost gets higher.

Empiricism and quantitative metrics have indispensable value; that much should be clear to everybody, I hope. But too often people forget (or have active contempt for) the value of qualitative metrics which can only be judged subjectively. Such considerations are naturally harder to deal with than cold hard data, so it doesn't surprise me that people want to avoid them. But when you blind yourself to the qualitative and subjective, you do yourself a huge disservice. Just ask McNamara; he thought he could win the Vietnam War with quantitative metrics and utterly neglected difficult-to-quantify metrics like public sentiment, both in Vietnam and America. I see echoes of this in our industry today; we love to talk about empirical measures like the number of bugs, but subjective metrics, like the severity of those bugs, receive less attention.

Many university programs are set up to address this, by making engineering students earn credits in the humanities as well. But I fear the value of this is often inadequately explained. Contempt for the humanities and scientism go hand-in-hand, and are a worrying trend particularly in the tech industry.


> But too often people forget [...] the value of qualitative metrics which can only be judged subjectively.

When I read the rant, I remembered a story from my university days, and your sentence prompted me to write it down.

10 years ago I had to write a meta paper about Test Driven Development (TDD). I was researching this topic and found studies and other meta papers.

Some of the studies were from the likes of Microsoft where they explained they wrote two different drivers with comparable lines of code - one with TDD and one without - and tracked how many bugs they found after version 1.0 and how fast the driver projects were delivered. So in these non-trivial multi-month multi-person projects they claimed TDD was extremely useful to reduce bugs.

Other papers had 20 students learn how to program with TDD and another 20 students without, and tried to find differences there. They couldn't find too many differences between the groups doing something like a 2-hour project. Since the group who had just learned TDD was a little bit slower, they concluded TDD had neutral or negative value.

When I looked into the meta papers comparing these, I found some that were judging both of these papers as being of equivalent value, which to this day I'm really wondering about.


Testing different methods of development in terms of speed, cost and quality is really hard. The most convincing approach to me would be a single blind experiment to hire two software development teams and have them build to the same set of requirements in two different ways. But then it is hard to know whether you are really comparing the method of software development or the quality of the software teams. So two software teams isn't enough to get a statistically valid inference. You can see that, given software development rates, this could become a very expensive experiment.

Last point: I think that even writing a specification down to the level that it could be implemented using formal methods might be the biggest game changer. Agile stories rarely come even close to covering all of the potential edge cases. If we had a process that required product owners to literally think through all possible failure modes (which is what systems of formal methods do) and write out how to handle them, then the cost of writing specifications would go way up. Per economics, I think we would end up with simpler specifications, which might be its own benefit.


It’s not that hard, is it? There was recently a study on the value of testing and best practices linked here on HN, that I of course can’t find now, where the researchers looked at thousands of projects. Overall there was no scientific proof that testing and best practices lead to better results than just making spaghetti without a recipe.

Having worked in a public enterprise organisation that has been buying a lot of different software for some decades, it sort of fits with our completely anecdotal data. We still prefer suppliers that have all the nice buzzwords, but if you look at our projects there is just no correlation between their methods and how the software project goes through its lifecycle. And this is with everything from the old COBOL systems to modern microservice-this-and-that cloud solutions.

For the past five years I’ve had a neat little side job as an external censor for CS students, and it’s been interesting to follow how their software design methodology changes rather rapidly, without any real scientific reason as to why. Mostly it seems like there is an entire field of “education” dedicated to getting people to do software design and project management in the way that sells the most licenses to Atlassian or whatever else, or simply the most books. It’s really very comparable to the self-help industry, where you’ll have answers for everything.

Sure, it’s mostly anecdotal, but preaching that test-driven development or going Agile SOLID wild turkey is the holy grail is exactly the same as preaching some diet where you get to eat as much as you want as long as it isn’t carbs. Sure you can lose weight, but it’s not like it’s the only way to lose weight, and next year it’ll be about going on a juice cleanse or something, and then something different after that.


A study design like that is called an epidemiological study and is far behind the gold standard of a randomized controlled trial, the reason being that the teams that choose to do testing or not are not randomly assigned to experimental and control groups. There are ways outside of study design to control for confounders when you can't randomly assign experimental and control groups, such as in this instance only looking at teams that directly tried similar projects with tests and without, but it is rare to see anyone do that.

Otherwise, you hit a rather obvious issue. Testing and following best practices are not the only policies impacting project quality, and in particular they exist in large part to help less experienced or hastily assembled teams. If you're comparing their output to the output of several core maintainers who have been working on the same project for 20 years, in the absence of other information, you expect the latter group to produce better quality work, and the fact that they actually do even if they aren't following industry best practices doesn't tell you those practices aren't useful to the former group or even that the latter group couldn't have produced an even better product if they'd followed them.

Be aware I'm not at all trying to advocate for either approach, just the issues with various flavors of scientific evidence that vary tremendously in how valuable they are depending on study design. I'm just saying we can't know with any level of scientific validity because the studies themselves are near worthless. Software management is in the state today that major league baseball was in 30 years ago, no statistically valid evidence and a whole lot of gut eye test from grizzled veterans. But unlike with baseball, nobody is keeping rich troves of every imaginable counting stat that can be counted going back a century on all of the developers, so a pure data science approach to making management more scientific like the moneyball guys accomplished in pro sports is not likely to work, since it would necessarily be data science without the data.


> Over all there was no scientific proof that testing and best practices lead to better results than just making spaghetti without a recipe.

Which best practices? I can’t see how a team without certain practices could be that effective. E.g., version control, having good backups, a good communication culture, code review, config management to prevent “it works on my machine” problems, and many other things.

For code review specifically, I read a paper years ago claiming a dramatic decrease in defect counts at companies that practice it.


"A specification that can be implemented using formal methods". That is just source code. If the specification can completely define arbitrary programs it is necessarily Turing complete on its own, and as such prone to the same type of bugs as any other program.


Purely from an industrial perspective, interest in formal methods tends to split two ways:

1. Verifying really nasty algorithms, the kind you see in cryptography and embedded systems and stuff where the bugs are triggered by horrific race conditions or incredibly specific malicious inputs that even experts won't think of testing

2. High level specifications of requirements and abstract machines and stuff, where the spec is like 100 lines and the implementation is 10k and you'd prefer to catch some design bugs before you're ten sprints into coding

A lot of bugs in (1) end up being memory-related, which is why you're seeing languages with borrow checking as part of the semantics (Rust). A lot of hype these days is in (2) because it's a lot cheaper and easier to learn, at the cost of having a lower power ceiling.


What is Rust most commonly used for? If somebody learned to program with it, what sorts of projects would they work on?


I think you are speaking to one of the core tensions in formal methods. The difference between a specification and an implementation can get blurry. Where formal methods get interesting is statically proving properties about the specification. Take a simple example of a sorting algorithm. The two most commonly proved properties of such algorithms are that they 1) return a permutation of the input list (no items removed, added, or duplicated) and 2) produce output that follows some ordering.

One way to look at things is to say the permutation and ordering property checkers are the specification and the actual sorting algorithm is the implementation.
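
To make those two properties concrete, here is a minimal Python sketch that treats them as executable checks over randomized inputs (names like check_sort are illustrative; randomized testing is of course far weaker than the static proofs the tools aim for):

    from collections import Counter
    import random

    def is_sorted(xs):
        # Ordering property: each element is <= its successor.
        return all(a <= b for a, b in zip(xs, xs[1:]))

    def is_permutation(xs, ys):
        # Permutation property: same elements with the same multiplicities.
        return Counter(xs) == Counter(ys)

    def check_sort(sort_fn, trials=1000):
        # Randomized spot-check of the two properties; a sanity test, not a proof.
        for _ in range(trials):
            xs = [random.randint(-100, 100) for _ in range(random.randint(0, 50))]
            out = sort_fn(list(xs))
            assert is_sorted(out), f"ordering violated for {xs}"
            assert is_permutation(xs, out), f"permutation violated for {xs}"

    check_sort(sorted)  # Python's built-in passes; a buggy "sort" would trip an assert.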

To your point about specifications being Turing complete, some tools put restrictions on specifications so that termination is guaranteed. Coq, for instance, requires that recursive functions be "decreasing" in their inputs, i.e. that each recursive call is passed a smaller argument than its parent.


Sorting is one of the more favorable tasks for being specified this way; for much code there is no simpler way to verify the output than running the same or equivalent code again.

If your specification language is not Turing complete then there is simply stuff you cannot specify. Of course, just because it isn't Turing complete doesn't mean it isn't perfectly adequate for writing bugs.


As soon as you need to interface with arbitrary external components, you see the value in good specifications. If JPEG-2000 was just a reference implementation and not a spec, that would work fine if only one team ever had to develop a JPEG library, and every application that read or wrote JPEG files used exactly that one library.

Since that isn't the case, having something sit at a higher level of abstraction than the actual source code is quite valuable. Additionally, it allows domain experts like image scientists and physicists, who are experts in how to compress and decompress data with minimal quality loss but may not be experts in any particular programming language, to still contribute to the spec.


I understand your last point, but my question is:

In order for domain experts to contribute to a formal specification, this specification must be, well, formal, and also serve as a very precise shared language among all domain experts and implementors (i.e. the people who are going to read the spec and build something out of it). Once you go down this road, the specification language becomes as complex as any programming language -- or maybe even more! -- and must be learned by all involved, just like any given programming language. Some people will find it easier to learn, some will struggle or find it bizarrely unfamiliar -- again, just like any given programming language. Any sufficiently expressive specification language will also be subject to the kinds of bugs and complexity that affect programming languages.

So my question is: isn't learning a shared formal specification language more or less as difficult as learning an unfamiliar programming language?


Is there a machine-readable spec for JPEGs? Did it find interesting bugs and oversights?


> The most convincing approach to me would be a single blind experiment

http://www.plat-forms.org/


Requirements analysis is an integral part of the development process. Giving teams fully specified requirements at the beginning of the experiment wouldn't be realistic.


Depends. You could then run experiments on team performance in the requirements-to-code phase, separate from the generating-requirements phase. That has its place. And then you could experiment with teams trying to convert informal requirements to formal requirements. That might let you learn some things about the parts that you couldn't learn if you dealt with the whole.


Those are not separate phases; they take place simultaneously. No one does waterfall development anymore.


Waterfall development is still pervasive in a lot of industries and companies.


Definitely still out there.


My two cents: Software is a form of literacy not engineering. One good bridge looks and acts almost exactly like every other bridge in the world. Good software that does a given job can be unrecognisable compared to other good software that does exactly the same job.

Software is more like a novel than a bridge.

Maybe, maybe we can apply engineering terms to machine code or assembly. But the abstract levels we all work at - not a hope.

I mean this is a good thing. It changes how we think about software - engineering has changed the world, and it's in the hands of a few professionals. Imagine software in the hands of ... everyone. It's like Eric Idle gathering mud in the Dark Ages, imagining what a literate world that reads the New York Times will be like.


> My two cents: Software is a form of literacy not engineering. One good bridge looks and acts almost exactly like every other bridge in the world. Good software that does a given job can be unrecognisable compared to other good software that does exactly the same job.

1. This is false. One of the reasons making bridges is so expensive is because each bridge is a precious special snowflake with lots of challenges specific to that bridge project and no other bridges.

2. Bridge building is a very small part of civil engineering, and civil engineering is a very small part of traditional engineering. Maybe software is more like process engineering, or chipset design, or subsystem integration, or workflow optimization.


1. I agree bridges have special challenges, but ... well, for 2,000 years there was a keystone and an arch. Yes, the soil and the land were challenging, but I find it hard to argue that every stone bridge was a special snowflake. Snap a supporting cable on any suspension bridge and they all have the same failure mode. I think we are stretching analogies - is "build a CRM for the sales department" simple in the sense that there is a keystone data design common across all CRMs, and all the rest - the shifting sands of the river - is just the shifting alliances in the boardroom?

I just try to keep this simple. Coding uses symbolic representation to describe a model - this is what language is. Compilers do / can do things that simply don't exist conceptually in Engineering.

Coders manipulate symbols. It really is a language thing.

2. Yes. I was just using bridges as an example of engineering, just as CRUD websites are an example of software engineering.


> I just try to keep this simple. Coding uses symbolic representation to describe a model - this is what language is. Compilers do / can do things that simply don't exist conceptually in Engineering.

> Coders manipulate symbols. It really is a language thing.

What is conceptually different here between software engineering and civil or mechanical engineering?

Engineers use symbolic representations (math, diagrams, computer models, etc.) to describe a model of what will eventually be manufactured or built in the real world. They're typically not the people who actually build the thing, they're the ones who describe it in sufficient detail that someone else can build it. This is as much a language problem as it is in software. The designs need to be readable by the entire team of engineers as well as the people who will make the design real.

The primary difference I see is that in software engineering, the "someone else" who will build the thing is a computer program called a compiler rather than a factory worker or a construction crew.


I tend to agree with you as well. I've also come to the conclusion that conceiving of software as some new 'kind' of non-linear writing is more productive than understanding it through the traditional engineering discipline.

But it is both and it is neither. Software is a new thing that we have yet to fully understand.


But the risk is that bad software or a bad bridge can pose as good software or a good bridge until something catastrophic happens.

Software might be more like a novel if you are consuming the code/design, but to write a good novel one must be far more than just literate.


Wow, thanks for the link to sci-hub.st - this is awesome! I can finally access Elsevier's "walled" content again.



Oh hey, I made that, but it was taken down by what was then called Zeit. Looks like that didn't persist through their Vercel rename.

Note that it just fetches the current URL from Wikidata.


> The average developer thinks empirical software engineering is a waste of time. How can you possibly study something as complex as software engineering?! You’ve got different languages and projects and teams and experience levels and problem domains and constraints and timelines and everything else. Why should they believe your giant incoherent mess of sadness over their personal experience or their favorite speaker’s logical arguments?

This argument could be used for literally any science but, for some reason, it seems to find receptive ears mostly in the software industry.


I think it's common in a lot of fields. It's definitely prevalent in medicine, for example: people are skeptical of any research on doctor behavior.


I think the issue is that management is not done scientifically. You can see this with something as basic as estimating a project. Higher-ups just ask “how many hours, and when can you get it done?” They don’t want you to go through past similar work, review its timeliness and hours, divide the project into smaller tasks, build a schedule, review productivity trends and resource loadings, or use any scientific way of arriving at the number. What they really want is a seat-of-the-pants WAG, one that isn’t too big to scare the client, that they can use to get going, with the understanding that shit happens along the way.


I generally agree with everything, but thought I would just use this as a springboard into a related topic:

> What’s the difference between a “bug” and “defect”? Well, one’s a bug and the other’s a defect, duh!

This kind of issue is common but I'm not sure how to avoid it. Any group of people > 1 will start to use their own lingo, which often is made up of similar words from "outside" but have different connotations. This is true in science, medicine, software engineering, law, everywhere.

(I mean, why would I search for "bug"? Like, I'm searching for problems with computer code, not insects!)

This phenomenon unfortunately leads to misunderstandings with the general public, which leads to mistrust. Part of it is on scientists (and lawyers...) to be clearer in their communication, but I think it is also on the public to recognize that when reading scientific literature they are not the intended audience and are therefore missing a ton of context that is not explicitly stated.

Also:

>I’m sure this is slightly easier if you’re deeply embedded in academia

Also depends on the field. Chemistry has SciFinder, which although very expensive for institutions, is very good. It is fairly specific to chemistry though (and some overlapping fields).

His process for finding node papers and grinding through citations is pretty much how most scientists do it, though. And conferences.


> ...“bugs found in requirements are 100x cheaper than bugs found in implementations.” ... There’s one tiny problem with the IBM Systems Sciences Institute study: it doesn’t exist

It's funny to think how much of an effect this chart had on software engineering as a whole. I remember learning it at university and until today I thought it had some basis in science.


It "feels right" and is therefore "truthy."


Do you guys doubt that it is right? What kind of information would you require to show you that it is true?

I'm one of the lucky ones who tends to work on safety critical "complex cyber-physical systems". Is that maybe the difference?

There is the case where a developer runs unit tests before a commit. Some of the tests are unhappy. The developer fixes them and commits. Maybe a few hours spent?

Then there is the case where something goes funky in the real-world, all-up integrated system at the test range. Even in the best case, tens of people waste a day. If we are unlucky, it is a hard one, and a small army of the most senior developers with the best operators and hardware people are hunting the bug for weeks. I fear to even sum up the wasted cumulative work days.

I lived through multiple of these at different companies on different projects. (Both the first kind and the second kind.)

Is the question if this is true? Or maybe the question is if this is true for your area of the industry too?

Obviously I won't be able to provide sources for the ones I worked on. I bet no company really wants to release hard data on these things. So let's look at a well-publicised problem caught in production: the Boeing 737 MAX MCAS issue. 346 deaths, 1 year, 8 months and 5 days of grounding, and estimated direct costs of US$20 billion.

Obviously this is all super anecdotal. I just want to understand what part of the question/problem you have doubts about.


I think that’s the point - the chart confirms your bias and everyone else’s, so we accept the results at face value. Nobody really knows if those bar heights were ever measured, but of the millions of people who have seen that chart, few questioned it.


I never had to quote those charts, though. Instead the conversation went like this: “Hello boss, do you remember that fubar two months ago?” I even mention the concrete ticket number, and when the number alone makes my boss’s face twitch, then I know I chose the right fubar. “We just merged a CI change. If that whole category of problems happens again, we will know within the hour, instead of spending two weeks debugging.”

It’s the same wisdom, but communicated in a more direct way.


Imagine a requirements document for Hacker News - such documents traditionally have a tremendous amount of change control around them, with tons of review for every change. Remember, people like you believe it is 100x easier to catch a bug in requirements, and 100x gives you lots of budget to spend to make sure you catch bugs in requirements.

So, there's a bug in the requirements document. Specifically, the 'logout' link has a spelling error. It says 'lugout'. If someone were to catch this bug in the requirements doc, it would trigger a bunch of costly review, as we established earlier.

However, in the implementation phase, once the entire site is up and running, it would take maybe 5 minutes to fix.

Trivial example, but there it is.


Is this an imaginary situation or a real world experience of yours? I don't deny that IT is a wide and varied place, everything is possible I guess.

In my experience in situations where changing a typo in a document was hard, changing the production code was even harder. But happy to hear your story where this wasn't the case.

> Remember, people like you believe it is 100x easier to catch a bug in requirements

What? I never said that. It is easier to fix, if the problem is caught. That says nothing about how easy or hard it is to catch it.

> Trivial example, but there it is.

The original question was "Are Late-Stage Bugs More Expensive?" To which my answer was "Yes, but I can't show you my proof because I work on proprietary projects" (which I understand is an answer which leaves everyone unsatisfied.)

Your counterargument is "But let's imagine it is not!" ... which is what the phrase "begging the question" originally referred to. Imagining that the answer to a question is not what my experience tells me is not a convincing argument. But maybe I misunderstood what you wrote. If I did, I'm sorry about that.


The chart very likely has the correct direction, and the numbers are probably completely wrong. From another point of view, the literal phase division there is nonsensical, but the idea it communicates isn't.

But that's not relevant. What's relevant is that this is not how you do research. You can't just assume that. Yeah, there are plenty of examples of high-cost production failures. There are plenty of examples of low-cost production failures too, and plenty of examples of projects that failed because people spent all their time fixing the same few issues. If you want to state it as a fact, you have to count all of those and see how they compare.


This tallies with my experience. It isn't early bugs that are expensive to fix. It's early design mistakes. For example choosing the wrong language, framework or architecture.

Dropbox using Python is a good example. Or Python's GIL.


The GIL is a good example because it removed optionality from Python by having its semantics leak out. Using a poorly performing tool is a different problem entirely.

Most of the really horrible errors in SW design involve choices where the semantics above will be visible and encoded as a subtle (or not so subtle) dependency that cannot later be fixed without an infeasible level of effort. I deal with these regularly and they are the true original sin in most sizable software projects.


They aren’t really exclusive at all. Early design mistakes can also be really expensive, but a really scary bug that nukes all your data could also suck.


They mean the cost of fixing the defect, not the severity of bugs.


Software engineering research is immature and, as such, has few (no?) strongly-supported claims that the entire field relies on. I agree this is a frustrating state of the field, but the way this is written makes the author appear to be unfamiliar with the state of things.

For example:

>Did I mention that all three of those papers use different definitions of “defect”?

This kind of thing is really common in academia. I do not know software engineering research in detail, but there are older and more basic definitional fights in the social sciences. Defining a defect, as I'm sure the author of the blog would agree, is not a trivial decision.

In general, this piece contains (perhaps hyperbolic?) dismissals of people trying to do work. It doesn't seem to respect what I imagine is earnest effort. Taken all together, it seems like the author either thinks the entire field is charlatans (which I doubt, as they use the field's conclusions at the end) or is making light of people working on problems they themselves do not have answers to in order to have a more lively writing style. It was fun to read and left a bad taste in my mouth.


As mentioned in TFA, the author has both written and spoken about software engineering research and related topics in some depth before. Though fiery, I think it does its job as a more informal cautionary tale on accepting research in the field without a critical gaze.

Analyses like [1] and [2] (from the same author) are extraordinarily valuable because they help us wade through the firehose of published studies to find what (if any) conclusions may be gleaned. Without such a filter, we are fated to a) repeat work by slogging through the torrent ourselves, or b) taking stuff at face value. Certainly there is no shortage of the latter happening on forums like this, and I wonder how many misinformed decisions are made as a result.

[1] https://danluu.com/empirical-pl/ [2] https://www.hillelwayne.com/post/are-we-really-engineers/


I found the complaints about how difficult it is to read papers, do literature research, evaluate papers and the field, etc., somewhat amusing. That's actually what a large portion of a PhD education and becoming an academic is about. Yes, it is not easy; if it were, a PhD wouldn't take between 3 and 6 years.

Generally, the author makes some good points and there is definitely often a disconnect between academia and industry. I think it is important to remember that science is much more like a directed random walk toward an unknown goal. If industry wants specific answers to specific questions they could (should) finance the studies that provide them the answers. However, my impression is that when industry finances significant studies it is much more toward validating/confirming their already established practices, products... (the topic of these "bought" studies is another can of worms in the topics of academia and industry interactions)


Greg Wilson is more positive. Though he does agree that nobody cares.

https://third-bit.com/2021/07/17/software-engineerings-great...


I'm skeptical of Wilson's reporting. One of his quoted studies looked interesting (slide 20, Fucci 2016, "An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach"), so I looked it up.

Wilson describes that study as involving 39 professionals on real projects.

But when I looked at the actual study [1], it involved 21 graduate students working on two toy problems, the Bowling Game kata and "Mars Rover API." This is a disappointing misrepresentation of the study.

A charitable interpretation is that he was referring to a different study, and accidentally put the wrong reference on that slide. Or maybe the actual talk explains the difference. I'm not sure—I didn't look deeper.

[1] http://people.brunel.ac.uk/~csstmms/FucciEtAl_ESEM2016.pdf


That (or more likely https://www.youtube.com/watch?v=HrVtA-ue-x0 since that's where the info is) would make a good HN submission in its own right. Probably best to wait a few days to let the hivemind caches clear (followup/copycat posts aren't great: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...).

If you submit it, let us know at hn@ycombinator.com so we can put it in the second-chance pool (https://news.ycombinator.com/pool, explained at https://news.ycombinator.com/item?id=26998308), so it will get a random placement on HN's front page.


This is a cool system!


> But first you gotta find the primary sources.

Yeah, and the real primary sources are the companies building and running software. They don't have the time or interest for long-term research. Likely they aren't even concerned with their own technical situation 12 months in advance. They just need to "deliver", they think.

So unless you can find some clever way to gather all their data (Sentry, AppSignal, ...?), you will never know what's really happening in most places. Your research will be artificial without that real world data.


A lot of software in production is designed like a bunch of railroads that are unnecessarily twisted and dangerously run into each other for no apparent reason. So train wrecks are to be expected.


Somewhat related: I enjoyed reading the eBook "The Leprechauns of Software Engineering" by Laurent Bossavit [1]. The author tried to track down the source research behind various pieces of received wisdom. I remember that the claimed cost of fixing bugs at a later stage was one of them.

[1]: https://leanpub.com/leprechauns


The Bug vs Defect bit made me laugh. Back at my first job, we partnered with the BSI to do research that went into BS5750 / ISO 9000.

The head of the software side at the BSI commented that, for old-timers in QA, software was odd, as they "did not know where to hang the defect tags".


The only things I've found so far to be true:

1) Small patches with quick feedback

2) Automate the manual stuff; even if you're not doing it often, you'll probably be glad to have that script ready and tested when you have to do it again in a year.


> The average developer thinks empirical software engineering is a waste of time.

Bold claim. Any data at all to back that up?

My guess based on experience would be most developers are unfamiliar with the term and haven’t made a judgement one way or the other.


Software engineering can only exist with an incredible amount of discipline.

We routinely start new implementations in Excel and perform formal normalization checks over models before writing any code.

You can get away without having a strong type system as long as your tables and relations are clean. This stuff is so powerful.

Figuring out how to funnel your domain instances into a SQL db and writing all your complex logic in terms of basic queries is how you can build an ecosystem in which determinism and formal proofs of correctness are possible.
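
As a rough sketch of that style, using Python's built-in sqlite3 (the toy "orders" schema and all names here are invented for illustration, not taken from any real system):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                               price_cents INTEGER NOT NULL);
        CREATE TABLE order_line (
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            product_id  INTEGER NOT NULL REFERENCES product(id),
            quantity    INTEGER NOT NULL CHECK (quantity > 0)
        );
    """)

    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "widget", 250), (2, "gadget", 1000)])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                     [(1, 1, 4), (1, 2, 1), (2, 1, 2)])

    # "Complex logic as basic queries": per-customer spend, no application-level loops.
    for name, total_cents in conn.execute("""
        SELECT c.name, SUM(p.price_cents * ol.quantity)
        FROM order_line ol
        JOIN customer c ON c.id = ol.customer_id
        JOIN product  p ON p.id = ol.product_id
        GROUP BY c.name
        ORDER BY 2 DESC
    """):
        print(name, total_cents)

The appeal of this style is that the declarative constraints (keys, CHECKs, normalization) carry a lot of the correctness burden before any imperative code runs.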

For me, software engineering is about correctness for the business end. There is certainly still a layer of craftsmanship below the engineering realm which enables all the magic. This is where things like performance and reliability live. Interacting with the outside world is an icky thing, and the specific implementation always depends on the use case.


Software engineering is so weird -- what you've just described sounds awful to me. I believe almost the exact opposite of everything you've said. Yet I have no doubt* you've seen good results from your approach, and I get good results from mine. It's surprising how personal software engineering really is.

* I do actually have doubts, because 99.9% of software is utter garbage. But I'm trying to be more positive and the point stands whether you're in the 99.9% or the 0.1%.


Willingness to tolerate formal methods does not mean you always have to endure pain and suffering.

On the contrary, you may find that you are shipping software that is so good you begin to see the value.

For a lot of shops, the problem domain probably isn't complex enough to justify intense formality in the process (i.e. engineering). Once you are dealing with monsters like supply chain or factory automation, you can't play games anymore.


> On the contrary, you may find that you are shipping software that is so good you begin to see the value.

You may also find that when you finally do ship, everyone had started using your competitor, or that your customers needed something slightly different, or that the cost of waiting for you was so high it would have been better having something early.

I have great respect for formally verified software, I have even touched on it a bit during my PhD, and there are definitely places where it makes sense. The same goes for formal methods in general. But there are other software tasks where it is not just 'not needed' but actually the wrong choice. It depends a lot on the opportunity cost of delivering later, on how well defined the tasks are, and on how professional the customer is. And the tasks where less formal methods are the right way are not "playing games"; they are solving the problem under a different set of constraints, which is also software engineering.


I wasn't recoiling at the fact that you use formal methods, actually. I know it's too late for this comment to be seen, but let me actually point out where my "opposites" are by marking up your comments with my version in [brackets]. Admittedly I'm filling in gaps with my own assumptions about what you mean, and you might agree with me more than I think.

> Software engineering can only exist with an incredible amount of discipline. [exploration, creativity, iteration, and discussion are more important than discipline, otherwise you'll be very disciplined in building the wrong thing. This applies at every point of the software lifecycle, not just the beginning!]

> We routinely start new implementations in excel [start as you mean to continue. If you prefer Excel over your programming toolset, you should focus on improving your programming toolset. Excel might help with quick mockups and proofs-of-concept, but should not be the basis of your formal design. Your code should!] and perform formal normalization checks over models before writing any code [you should strive to make your code itself the formal document. Code isn't the output of the design, it's the actual design document itself. Instead of separately "proving" things about your code, write your code as you would write the proof].

> You can get away without having a strong type system [why are you trying to take the "formal" part out of your code and into some other document? Apply your rigor and discipline to the code itself, not some other design document! Strong type systems help you do this.] as long as your tables and relations are clean. [the database is "just" a persistence layer and you're locking yourself into a brittle design if its schema permeates the rest of your codebase. This is probably why you include so much upfront design in your process: you're making way too many early decisions at the database level.] This stuff is so powerful.

> Figuring out how to funnel your domain instances into a SQL db [everyone seems to assume you need to build your application entirely around a relational database. Build your application around what you want to be able to do, and keep your DB as a storage layer!] and writing all your complex logic in terms of basic queries [write your complex logic in code as close as you can to how you would write a requirements document. Simple queries are great but they're the output of a good middle-tier, not a primary representation.] is how you can build an ecosystem in which determinism [huh? "Determinism" is an unexpected word there. You shouldn't struggle with determinism if you use immutability everywhere you can, except maybe in advanced UI use cases] and formal proofs of correctness are possible [again: you should strive to write your code as close to a "formal proof" as you possibly can, instead of treating the proof as a separate step].

> For me, software engineering is about correctness for the business end. There is certainly still a layer of craftsmanship below the engineering realm which enables all the magic. This is where things like performance and reliability live. Interacting with the outside world is an icky thing, and the specific implementation always depends on the use case. [all agreed here]

Look: iterative/Agile methods might appeal to "cowboy coders" who eschew discipline and formal methods, and that gives them a reputation for being associated with "move fast and break things". But they are not inherently less formal than traditional/waterfall "big up front design" methods. The big difference is not a question of "how disciplined should you be?", but rather "where does the discipline go?" Traditional/waterfall methods say "the discipline goes at the start of the project, in big design documents that are separate from the code. Then, later, you code". Iterative/Agile methods say "no, the discipline is maintained throughout the entire project lifecycle and is integrated into how we plan and write the code and manage the ongoing evolution of the project."



Sci-hub is great, but it should be your last resort. University librarians have explained this to me; it has to do with how Sci-hub affects the access counts of obscure journals.

Unpaywall provides an excellent browser extension. https://unpaywall.org/

If you have time, you can request a copy of a paper directly from the corresponding author. We love for people to be interested in our work, and we would rather you put that $40 access fee towards something that benefits society.


Amazing extension. Thank you.


It sounds like the author doesn't hate science (which is a nonsensical thing to hate), but academia. Those are two different things.


"View Source" may be helpful if you're just seeing a blank page.


It's completely beside the point, but the capitalization of the I in « **Ing » in the title makes me unreasonably grumpy.


this problem where published science is often wrong and you don't need to actually be an engineer that history will not record the name of is completely new, and also, airplanes are impossible and were actually invented by The Wright Bros, because they had the Wright stuff, right?


I don't know what "FM" and other initialisms are. Please, when writing an article, expand them in parentheses the first time they are mentioned.


It’s expanded the first time it’s used (in the sentence preceding the one with the abbreviation).


I didn't catch that either. Sucks you're getting downvoted for it when it isn't as obvious as others seem to think.


I'll put a quick link in


FM = Formal Methods


If you search for "science" in this article, you will have great difficulty in figuring out what the title means, unless you are smarter than me, which is entirely possible.


There's another strategy that works better than searching for a keyword, which is to read the article.

It's about the difficulty/expense of systematically and scientifically studying the software development process (e.g. finding out what data actually exists to support the claim that "a bug is cheaper to squash when it is caught during the specification process").

The title expresses frustration not at science itself, but at how needlessly difficult this process is, since both the secondary and primary academic literature surrounding it are of extremely low quality.


But specifically with respect to software engineering, it's really hard to do solid, repeatable experiments. It's not like physics, it's like sociology or psychology - which also have trouble doing precise, repeatable science.

Why is it like sociology or psychology? Because it's about people, not about things. Software engineering is not just about languages and programming techniques; it's about how people interact with those languages and techniques. The people have far more variability than the languages and techniques do. Cutting through that to be able to accurately say something about any language or technique is... really hard.


I think at the present time, software can be more like science than sociology.

Both suffer from abstraction.

I think psychology, when it stays closer to biology, is similar to programming research that runs concrete experiments, like experiments on the programming language C.


Despite programmers' insistence on being scientists and engineers, it's not true to the definition.

Maybe safety critical C or assembly is engineering.

Maybe testing code for speed can be considered science.

By the time any useful program is finished, there's significantly more tradition and art than Science and engineering.


Sure, but I think the underlying instinct behind "evidence based software engineering" is precisely about turning those "maybes" into measurements.

In principle, I'm in favor of this endeavor, and the article at hand expresses frustration about apparently arbitrary barriers towards accomplishing these measurements.

I don't know about you, but I'm generally always in favor of "let's measure more things and build more models from that data, to make sure we're actually improving"


Try figuring it out through the context. There are paragraphs upon paragraphs of it in the article.



