
> Not that I understand much of what they say, but it appears there are a lot of correctness bugs in pytorch that are flying under the radar, probably having a measurable impact on the results of model quality.

Do you have any links to public discussion of this? If it were true, it could mean a lot of research is invalidated, which would obviously make huge news.

It also feels like something that would be relatively easy to build reproducible test cases for (something like the rough sketch at the end of this comment), so it should be straightforward to prove or disprove.

And finally, if something is easy to validate and would make huge news, I'd expect someone to have already tried to prove it, and if it were true, to have published something a long time ago.
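
To be concrete, here's a rough sketch of the kind of reproducible check I have in mind: fix the inputs, run an op through PyTorch, and compare against a NumPy reference. The specific op, sizes, and tolerances are just placeholder choices on my part, not a claim about where any actual bug lives.

    import numpy as np
    import torch

    # Fixed inputs so anyone can rerun the exact same case
    rng = np.random.default_rng(0)
    a = rng.random((256, 256), dtype=np.float32)
    b = rng.random((256, 256), dtype=np.float32)

    # Reference result computed outside PyTorch
    expected = a @ b

    # Same computation through the PyTorch op under test
    actual = (torch.from_numpy(a) @ torch.from_numpy(b)).numpy()

    # The tolerances are a judgment call for float32; a genuine
    # correctness bug should blow well past any reasonable bound.
    max_err = np.abs(expected - actual).max()
    assert np.allclose(expected, actual, rtol=1e-4, atol=1e-5), max_err

If lots of results really did depend on flaky kernels, checks like this across ops, dtypes, and devices should surface it quickly.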



Could this really invalidate research? Managing to produce a model that works (assuming you check all of the myriad modeling correctness checkboxes) is sufficient on its own. A bug in the modeling process itself, so long as it doesn't affect the assumptions made about the model inputs, introduce data leakage, or otherwise fundamentally undermine the model produced, has no bearing on the outcome, which is that you got a model that evidently did make accurate predictions.


> Could this really invalidate research? Managing to produce a model that works (assuming you check all of the myriad modeling correctness checkboxes) is sufficient on its own.

In the academic sense, a model that happens to work isn't research; the product of research should be a technique or insight that generalizes.

"Standard technique X doesn't work in domain Y, so we developed modified technique X' that does better" is the fundamental storyline of many machine learning papers, and that could be 'invalidated' if the poor performance of X was caused by a hidden correctness bug avoided by X'.


> a lot of research could be invalidated, so obviously would make huge news.

A lot of research is unreproducible crap. That’s not news to anyone. Plus, bugs usually make results worse, not better.


There are many more ways to degrade model performance than to enhance it, so I would expect the vast majority of bugs to lead to artificially reduced accuracy, not artificially increased accuracy.

So if PyTorch is full of numerical flaws, that would likely mean many models with mediocre/borderline performance were discarded (never published) because they just failed to meet the threshold where the authors felt it was worth their time to package them up for a mid-tier conference. A finding that many would-be mediocre papers are actually slightly less mediocre than believed would be an utterly unremarkable conclusion, and I believe that's why we haven't seen a bombshell analysis of PyTorch flaws and reproducibility at NeurIPS.

A software error in, say, a stats routine or a data preprocessing routine would be a different story, because the degrees of freedom are fewer, leaving a greater probability that an error hits a path which makes a result look artificially better rather than artificially worse.


Check their Twitter; I saw something about this either yesterday or earlier today, IIRC.



