Is "proof of vulnerability" a marketing term, or do you actually claim that XBOW has a 0% false positive rate? (i.e. "all" reports come with a PoV, and this PoV "proves" there is a vulnerability?)
This doesn't say anything about how many false positives they actually have. Yes, you can write other programs (that might even invoke another LLM!) to "check" the findings. That's a very obvious and reasonable thing to do. But all "vulnerability scanners", AI or not, must take steps to avoid FPs -- that doesn't tell us how well they actually work.
The glaring omission here is a discussion of how many bugs the XBOW team had to manually review in order to make ~1k "valid" submissions. They state:
> It was a unique privilege to wake up each morning and review creative new exploits.
How much of every morning was spent reviewing exploits? And what % of them turned out to be real bugs? These are the critical questions that (a) are unanswered by this post, and (b) determine the success of any product in this space, imo.
> And it’s not just a pace thing, there’s a threshold of clarity that divides learned nothing from got at least one new idea.
But these days, ideas are quite cheap: in my experience, most researchers have more ideas than students to work on them. Many papers can fit their "core idea" in a tweet or two, and in many cases someone has already tweeted the idea in one form or another. Some ideas are better than others, but there's a lot of "reasonable" ideas out there.
Any of these ideas can be a paper, but what makes it science can't just be the fact that it was communicated clearly. It wouldn't be science unless you perform experiments (that accurately implement the "idea") and faithfully report the results. (Reviewers may add an additional constraint: that the results must look "good".)
So what does science have to do with reviewers' fixation on clarity and presentation? I claim: absolutely nothing. You can pretty much say whatever you want as long as it sounds reasonable and is communicated clearly (and of course the results look good). Even if the over-worked PhD student screws up the evaluation script a bit and the results are in their favor (oops!), the reviewers are not going to notice so long as the ideas are presented clearly.
Clear communication is important, but science cannot just be communicating ideas.
Clarity is and should be absolutely crucial, though.
As an academic I need to be up to date in my discipline, which means skimming hundreds of titles, dozens of abstracts and papers, and thoroughly reading several papers a week, in the context of a job that needs many other things done.
Papers that require 5x the time to read because they're unnecessarily unclear, forcing me to jump around deciphering what the authors mean, waste my time and many others' (as do those with misleading titles or abstracts), and probably won't be read unless absolutely needed. They are better caught at the peer review stage. And lack of clarity can also often cause lack of reproducibility when some minor but necessary detail is left ambiguous.
Clarity is relative. You can be super clear, but if it goes against what the reviewer thinks they know, it will be perceived as unclear. You can also point to references that clear up any remaining doubt about how something is meant, but of course the reviewer will never check out these references.
In the end, getting a paper accepted is a purely social game, and has little to do with how clearly your science is described, especially for truly novel research.
1) The whole opening segment is the literature review.
2) If you are coming up with a novel concept, then you would be explaining how it shows up in relation to known fact.
Then you would be providing evidence and experiment.
The entire structure is designed to give the author as many affordances as possible to make their case.
Viewing acceptance as a social game is the cynical view that ignores that academia still works. It’s academia itself which recognizes these issues and is trying to rectify the situation.
As you have seen in the post (did you read it, dear reviewer?), references on the first page were unhelpful, so 1) already comes with a caveat.
And so on.
I think the social game view is at this point entirely justified, and there is nothing cynical about it. And no, academia does not still work.
A given "structure" is also ridiculous, and part of the problem. Once you care more about the form than the content, form is what prevails.
The truth is: To understand a paper properly, you need to deal with it properly, not just the 5 minutes it takes to skim the first pages and make up your opinion there already. Fifteen pages is short enough, and if you cannot commit to properly review this, for a week or so of dedicated study, just don't review it. We would all be better off for it.
> The truth is: To understand a paper properly, you need to deal with it properly, not just the 5 minutes it takes to skim the first pages and make up your opinion there already. Fifteen pages is short enough, and if you cannot commit to properly review this, for a week or so of dedicated study, just don't review it. We would all be better off for it.
Reviewing dynamics make this hard. There is little to no reward for reviewers, and it is much easier to write a long and bad paper than it is to review it carefully (and LLMs have upset this balance even further). To suggest that every submitted paper should occupy several weeks of expert attention is to fundamentally misunderstand how many crappy papers are getting submitted.
> But these days, ideas are quite cheap: in my experience, most researchers have more ideas than students to work on them.
By “idea” researchers usually mean “an idea for a high-impact project that I’m capable of executing”. It’s not just about having ideas, but about having ideas that will actually make an impact on your field. Those again come in two flavors: “obvious ideas” that are the logical next step in a chain of incremental improvements, but that no one has yet had the time or capability to implement; and “surprising ideas” that can really turn a research field upside down if they work, but are inherently high-risk/high-reward.
Speaking as a physicist, I find the truly “surprising ideas” to be quite rare but important. I get them from time to time, but it can take years between them. But the “obvious” ideas, sure: the more students I have, the more of them I’d work on.
> Any of these ideas can be a paper, but what makes it science can't just be the fact that it was communicated clearly. It wouldn't be science unless you perform experiments (that accurately implement the "idea") and faithfully report the results. (Reviewers may add an additional constraint: that the results must look "good".)
I kinda agree with this. With the caveat that I’d consider e.g. solving theoretical problems to also count under “experiment” in this specific sentence, since science is arguably not just about gathering data but also developing a coherent understanding of it. Which is why theoretical and numerical physics count as “science”.
On the other hand, I think textbooks and review papers are crucial for science as a social process. We often have to try to consolidate the knowledge gathered from different research directions before we can move forward. That part is about clear communication more than new research.
It's not too difficult to state any idea, even a surprising one. But often, papers with surprising ideas (or maybe the right thing to say is surprising results?) turn out to be wrong!
I think it's still the case that there are lots of ideas that (if they worked!) would be surprising. Anyone can state outlandish ideas in a paper -- imo the contribution is proving (e.g. with sound "experiments", interpreted broadly) that they actually work. Unfortunately, I think clarity of writing matters more to reviewers than the soundness of your experiments. I think in CS this could very well change if the reviewers willed it (i.e. require artifact submission with the paper, and allow papers to be rejected for faults in the artifact).
I think in some ways science has been co-opted by careerists who try to minmax output to accelerate their careers. Being idea-obsessed is part of this. It’s much easier to get a paper published that’s on the hype train than one that challenges some idea. Publications justify grant money, grant money justifies more people and more power, more power justifies promotions. And if you talk with early-career scientists, they will all say they are only doing it until they get a permanent position; then they will become more curious. Maybe they do, maybe they don’t; I have many older colleagues who are quite curious compared to their younger counterparts. But I believe rewarding ambition at the expense of curiosity is somewhat anti-intellectual. It’s sad, because I think science should reorganise: the current structure of departments into disciplines may be dated, and restructuring could help alleviate this a lot, since interdisciplinary work may leverage curiosity over ambition, as curiosity will be rewarded with high-impact work. But who knows. I can armchair my way into anything.
The point of a paper isn't "I had this idea" nor is it "I have this evidence". It is "I had this idea, and it turned out to work! (btw here's the evidence I found that convinced me it works)."
The value lies in getting true ideas in front of your eyeballs. So communicating the idea clearly is crucial to making the value available.
I agree with you. But what part of the blog post, or the peer review process in general, do you think ensures that only true ideas get in front of eyeballs?
I can write anything I want in the paper, but at the end of the day my experiments could do something slightly (or completely) different. Where are reviewers going to catch this?
As the Article said, the first page gets a paper accepted. The remaining pages serve to not get the paper rejected. That includes actually backing up the claims on the first page.
Wrong things definitely still make it through, both mistakes and fraud. But it is a pretty strong filter.
I agree that peer review can be a strong filter, but it's a filter for claims and evidence that sound true. CS papers can and do hide important details in the code (details which, I argue, would get a paper rejected if they were stated in the paper).
Regardless of the strength of the filter, if the filter's inputs are just "the paper", but the claims depend on the details in another artifact (i.e. the code), how can we argue that peer review filters for the truth?
This person is a clown, probably with a paid agenda, and they should be disallowed from saying such dumb things where smart people with useful skills might read it.
I have a theory that this focus on ideas vs solutions also divides individual researchers, in what drives them. Agreed that academia celebrates and rewards ideas, not solutions. And maybe that’s ok and how it should be; solutions can be done in industry? But the signal-to-noise ratio of ideas feels too low at this point.
Generating the ideas "planets move at a constant per-planet velocity", "planets move at a specific speed as a power law function of distance from the sun and we fit the parameters great", "each planet sweeps equal areas in equal time" is cheap, but evaluating which idea is good is expensive, and the whole value of that evaluation is captured in the final idea.
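For concreteness, here is a minimal sketch of the second idea in that list, fit to rough textbook values for the six classical planets (my own illustration, not something from the thread; the numbers and variable names are assumptions):

    # Sketch: fit "speed is a power law of distance from the sun" to rough
    # textbook values (semi-major axis in AU, orbital period in years).
    import numpy as np

    a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])    # Mercury..Saturn
    T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])

    v = 2 * np.pi * a / T                     # mean orbital speed, AU per year
    slope, intercept = np.polyfit(np.log(a), np.log(v), 1)
    print(f"speed ~ distance^{slope:.2f}")    # comes out near -0.5

Stating (or even fitting) any one of those candidate laws is the cheap part; the expensive part was the observational work and the judgment that singled out the law worth keeping.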
This whole take is embarrassingly ignorant and no one with the credentials has the time to check you. We need people to do real thinking and they need to ignore you.
I don't understand why you took the time to leave (three?) personal attacks, rather than just provide your perspective? I'm willing to acknowledge that my opinion has limitations (I think it mainly applies to CS-adjacent fields with empirical performance evaluations, and where experiments can easily be independently verified).
It really upsets me when I see this kind of hollow rhetoric, which is specifically designed to confuse, and to steer people away from learning and instead make them sceptical of the most honest pursuers of truth.
None of this attack was personal, that is wrong. All of it was deserved. Complaining about academia without any specifics is one of the dumbest, most well-funded and most prominent memes in the pseudo-intellectual podcast sphere today.
Peer review is not designed for science. Many papers are not rejected because of an issue with the science -- in fact, reviewers seldom have the time to actually check the science! As a CS-centric example: you'll almost never find a reviewer who reads a single line of code (if code is submitted with the paper at all). There is artifact review, but this is never tied to the acceptance of the paper. Reviewers focus on ideas, presentation, and the presented results. (And the current system is a good filter for this! Most accepted papers are well-written and the results always look good on paper.) However, reviewers never take the time to actually verify that the experiment code matches the ideas described in the paper, and that the results reproduce. Ask any CS/engineering PhD student how many papers (in top venues) they've seen with a critical implementation flaw that invalidates the results -- and you might begin to understand the problem.
At least in CS, the system can be fixed, but those in power are unable and unwilling to fix it. Authors don't want to be held accountable ("if we submit the code with the paper -- someone might find a critical bug and reject the paper!"), and reviewers are both unqualified (i.e. haven't written a line of code in 25 years) and unwilling to take on more responsibility ("I don't have the time to make sure their experiment code is fair!"). So we are left with an obviously broken system where junior PhD students review artifacts for "reproducibility" and this evaluation has no bearing whatsoever on whether a paper gets accepted. It's too easy to cook up positive results in almost any field (intentionally, or unintentionally), and we have a system with little accountability.
It's not "the best we have", it's "the best those in power will allow". Those in power do not want consequences for publishing bad research, and also don't want the reviewing load required to keep bad research out.
This is much too negative. Peer review indeed misses issues with papers, but by and large catches the most glaring faults.
I don’t believe for one moment that the vast majority of papers in reputable conferences are wrong, if only for the simple reason that putting out incorrect research gives an easy layup for competing groups to write a follow-up paper that exposes the flaw.
It’s also a fallacy to state that papers aren’t reproducible without code. Yes code is important, but in most cases the core contribution of the research paper is not the code, but some set of ideas that together describe a novel way to approach the tackled problem.
I spent a chunk of my career working on productionizing code from ML/AI papers, and a huge share of them are outright not reproducible.
Mostly they lack critical information (chosen constants missing from equations, outright missing information on input preparation, or omitted chunks of "common knowledge algorithms"). Those that don't still report measurements that didn't match the reimplemented algorithms, or only reached their claimed quality on the author's handpicked, massaged dataset.
That’s where the difference lies: is the contribution a truly new approach to modelling an existing problem, or a newly framed problem? No set of a bit different results or missing exact hyperparameter settings really invalidates the value of the aforementioned research. If the math works, and it is a nice new point of view, it’s good. It may not even help anyone with practical applications right now, but may inspire ideas further down the line that do make the work practicable, too.
In contrast, if the main value of a paper is a claim that they increase performance/accuracy in some task by x%, then its value can be completely dependent on whether it actually is reproducible.
Sounds like you are complaining about the latter type of work?
> No set of a bit different results or missing exact hyperparameter settings really invalidates the value of the aforementioned research.
If this is the case, the paper should not include a performance evaluation at all. If the paper needs a performance evaluation to prove its worth, we have every right to question the way that evaluation was conducted.
I don't think there's much value in theoretical approaches that lack important derivation data either, so no need to try to split the papers like this. Academic CS publishing is flooded with bad-quality papers in any case.
I spent 3 months implementing a paper once. Finally, I got to the point where I understood the paper probably better than the author. It was an extremely complicated paper (homomorphic encryption). At this point, I realized that it doesn't work. There was nothing about it that would ever work, and it wasn't for lack of understanding. I emailed the author asking to clarify some specific things in the paper, they never responded.
In theory, the paper could work, but it would be incredibly weak (the key turned out to be either 1 or 0 -- a single bit).
Anecdotally it is not. Most papers in CS I have read have been bad and impossible to reproduce. Maybe I have been unlucky but my experience is sadly the same.
I did not dispute that peer review acts as a filter. But reviewers are not reviewing the science, they are reviewing the paper. Authors are taking advantage of this distinction.
> if only for the simple reason that putting out incorrect research gives an easy layup for competing groups to write a follow-up paper that exposes the flaw.
You can’t make a career out of exposing flaws in existing research. Finding a flaw and showing that a paper from last year had cooked results gets you nowhere. There’s nowhere to publish “but actually, this technique doesn’t seem to work” research. There’s no way for me to prove that the ideas will NEVER work — only that their implementation doesn’t work as well as they claimed. Authors who claim that the value is in the ideas should stick to Twitter, where they can freely dump all of their ideas without any regard for whether they will work or not.
And if you come up with another way of solving the problem that actually works, it’s much harder to convince reviewers that the problem is interesting (because the broken paper already “solved” it!)
> in most cases the core contribution of the research paper is not the code, but some set of ideas that together describe a novel way to approach the tackled problem
And this novel approach is really only useful if it outperforms existing techniques. “We won’t share the code but our technique works really well we promise” is obviously not science. There is a flood of papers with plausible techniques that look reasonable on paper and have good results, but those results do not reproduce. It’s not really possible to prove the technique “wrong”, but the burden should be on the authors to provide proof that their technique works and on reviewers to verify it.
It’s absurd to me that mathematics proofs are usually checked during peer review, but in other fields we just take everyone at their word.
They aren’t necessarily wrong but most are nearly completely useless due to some heavily downplayed or completely omitted flaw that surfaces when you try to implement the idea in actual systems.
There is technically academic novelty so it’s not “wrong”. It’s just not valuable for the field or science in general.
I don't think anyone is saying it's not reproducible without code, just that it's much more difficult for absolutely no reason. If I can run the code of an ML paper, I can quickly check if the examples were cherry-picked, swap in my own test or training set... The new technique or idea is still the main contribution, but I can test it immediately, apply it to new problems, optimise the performance to enable new use-cases...
It's like a chemistry paper for a new material (think the recent semiconductor thing) not including the amounts used and the way the glassware was set up. You can probably get it to work in a few attempts, but then the result doesn't have the same properties as described, so now you're not sure if your process was wrong or if their results were.
More code should be released, but code is dependent on the people and environment that run it. When I release buggy code I will almost always have to spend time supporting others in how to run it. That is not what you want to do for a proof of concept meant to prove an idea.
I am not published, but I have implemented a number of papers as code, and it works fine (hashing, protocols and search mostly). I have also used code dumps to test something directly. I think I spend less time on code dumps, and if I fail I give up more easily. That is the danger: you start blaming the tools instead of how well you have understood the ideas.
I agree with you that more code should be released. It is not a solution for good science, though.
Sharing the code may also share the incorrect implementation biases.
It's a bit like saying that to help reproduce the experiment, the experimental tools used to reach the conclusion should be shared too. But reproducing the experiment does not mean "having a different finger clicking on exactly the same button", it means "redoing the experiment from scratch, ideally with a _different experimental setup_ so that it mitigates the unknown systematic biases of the original setup".
I'm not saying that sharing code is always bad; you give examples of how it can be useful. But sharing code has pros and cons, and I'm surprised how often people don't understand that.
If they don't publish the experimental setup, another person could use the exact same setup anyway without knowing. Better to publish the details so people can actually think of independent ways to verify the result.
But they will not make the same mistakes. If you ask two people to build a piece of software, they can use the same logic and build the same algorithm, but what are the chances they will make exactly the same bugs?
Also, your argument seems to be "_maybe_ they will use the exact same setup". So it already looks better than the solution where you provide the code and they _will for sure_ use the exact same setup.
And "publish the details" corresponds to explain the logic, not share the exact implementation.
Also, I'm not saying that sharing the code is bad, but I'm saying that sharing the code is not the perfect solution, and people who think not sharing the code is very bad usually don't understand the dangers of sharing the code.
Nobody said sharing the code "is the perfect solution". Just that sharing the code is way better and should be commonplace, if not required. Your argument that not doing so will force other teams to re-write the code seems unrealistic to me. If anyone wants to check the implementation they can always disregard the shared code, but having it allows other, less time-intensive checks to still happen: like checking for cherry-picked data, as GP suggested, looking through the code for possible pitfalls, etc. Besides, your argument could be extended to any specific data the paper presents: why publish numbers so people can get lazy and just trust them? Just publish the conclusion and let other teams figure out ways to prove/disprove it! - which is (more than) a bit ridiculous, wouldn't you say?
And I disagree with that, and think that you are overestimating the gain brought by sharing the code and underestimating the possible problems that sharing the code brings.
At CERN, there are two general-purpose experiments, CMS and ATLAS. The policy is that people from one experiment are not allowed to discuss ongoing work with people from the other. Notice that it is officially forbidden, not "if some want to discuss, go ahead, others may choose not to discuss". Why? Because sharing these details ruins the fact that the two experiments are independent. If you hear from your CMS friend that they have observed a peak at 125 GeV, you are biased. Even if you are a nice guy and try to forget about it, it is too late, you are unconsciously biased: you will be drawn to check the 125 GeV region and possibly read a fluctuation as a peak that you would not have noticed otherwise.
So, no, saying "I give the code but if you want you may not look at it" is not enough, you will still de-blind the community. As soon as some people will look at the code, they will be biased: if they will try to reproduce from scratch, they will come up with an implementation that is different from the one they would have come up with without having looked at the code.
Nothing too catastrophic either. Don't get me wrong, I think that sharing the code is great, in some cases. But this picture in which sharing the code is very important is just a misunderstanding of how science is done.
As for the other "specific data", yes, some data is better not to share too if it is not needed to reproduce the experiment and can be source of bias. The same could be said about everything else in the scientist process: why sharing the code is so important, and not sharing all the notes of each and every meetings? I think that often the person who don't understand that is a software developer, and they don't understand that the code that the scientist creates is not the science, it's not the publication, it's just the tool, the same way a pen and a piece of paper was. Software developers are paid to produce code, so code is for them the end goal. Scientists are paid to do research, and code is not the end goal.
But, as I've said, sharing the code can be useful. It can help other teams working on the same subject reach the same level faster, or notice errors in the code. But in both cases, the consequence is that these other teams are not producing independent work, and this is the price to pay. (And of course, there are layers of dependence: some publications tend to share too much, others not, but it does not mean some are very bad and others very good. Not being independent is not the end of the world. The problem is when someone considers that sharing the code is "the good thing to do" without understanding that.)
What you're deliberately ignoring is that omitting important information is material to a lot of papers, because the methodology was massaged into desired results to create publishable content.
It's really strange seeing how many (academic) people will talk themselves into bizarre explanations for a simple phenomenon of widespread results hacking to generate required impact numbers. Occam's razor and all that.
If it is massaged into desired results, then it will be invalidated by the facts quite easily. Conversely, obfuscating things is also easy if you just provide the whole package and say "see, you click on the button and you get the same result, you have proven that it is correct". Not providing code means that people will redo their own implementation and come back to you when they see they don't get the same results.
So, no, no need to invent that academics are all part of this strange, crazy, evil group. Academics are debating and being skeptical of their colleagues' results all the time, which already contradicts your idea that the majority are motivated by fraud.
Occam's razor is simply that there are some good reasons why code is not shared, ranging from laziness, to lack of expertise in code design, to the fact that code sharing is just not that important (or sometimes plainly bad) for reproducibility; no need to invent that the main reason is fraud.
Ok, that's a bit naive now. The whole "replication crisis" is exactly the term for bad papers not being invalidated "easily". [1]
Because - if you'd been in academia - you'd find out that replicating papers isn't something that will allow you to keep your funding, your job and your path to the next title.
And I'm not sure why you jumped to "crazy evil group" - no one is evil, everyone is following their incentives and trying to keep their jobs and secure funding. The incentives are perverse. This willing blindness toward perverse incentives (which appears both in US academia and in the corporate world) is a repeated source of confusion for me - is the idea that people aren't always perfectly honest when protecting their jobs, career success and reputation really so foreign to you?
That's my point: people here link the replication crisis to "not sharing the code", which is ridiculous. If you just click a button to run the code written by the other team, you haven't replicated anything. If you review the code, you have replicated "a little bit", but it is still not as good as if you had recreated the algorithm from scratch independently.
It's very strange to pretend that sharing the code will help the replication crisis, when the replication crisis is about INDEPENDENT REPLICATION, where the experiment is redone in an independent way, sometimes even with a totally perpendicular setup. The closer the setup, the weaker the replication.
It feels like watching the finger that points at the moon: not understanding that replication does not mean "re-running the experiment and reaching the same numbers".
> no one is evil, everyone is following their incentives and trying to keep their jobs and secure funding
Sharing the code has nothing to do with the incentives. I will not lose my funding if I share the code. What you are adding on top of that is that the scientist is dishonest and does not share because they have cheated in order to get the funding. But this is the part that does not make sense: unless they are already established enough to have the aura to be believed without proof, they will lose their funding, because the funding comes from a peer committee that will notice that the facts don't match the conclusions.
I'm sure there are people who downplay fraud in the scientific domain. But pretending that fraud is a good strategy for someone's career, and that this is why people commit fraud so massively that sharing the code is rare, is just ignorance of the reality.
I'm sure some people commit fraud and don't want to share their code. But how do you explain why so many scientists don't share their code? Is it because the whole community is so riddled with cheaters? Including cheaters who happen to present conclusions that keep being proven correct when reproduced? Because yes, there are experiments that have been reproduced and confirmed even though the code, at the time, was not shared. How do you explain that, if the main reason not to share the code is to hide cheating?
I've spent plenty of time in my career doing exactly the type of replication you're talking about, and easily the majority of CS papers weren't replicable from the methodology written down in the paper, on a dataset that wasn't optimized and preselected by the paper's author.
I didn't care about sharing code (it's not common), but about independent implementation of ML and AI algorithms for the purpose of independent comparison. So I'm not sure why you're getting so hung up on the code part: the majority of papers were describing trash science even in their text, in an effort to get published and show results.
I'm sorry that the area you are working in is rotten and does not meet the minimum scientific standard. But please, do not reach conclusions that are blatantly incorrect about areas you don't know.
The problem is not really "academia", it is that, in your area, the academic community is particularly poor. The problem is not really the "replication crisis", it is that, in your area, even before we reach the concept of replication crisis, the work is not even reaching the basic scientific standard.
Oh, I guess it is Occam's razor after all: "It's really strange seeing how many (academic) people will talk themselves into bizarre explanations for a simple phenomenon of widespread results hacking to generate required impact numbers". Occam's razor explanation: so many (academic) people will not talk about the malpractice because so many (academic) people work in an area where such malpractice is exceptional.
But what’s the point of the peer review process if it’s not sifting out poor academic work?
It reads as if your point is talking in circles. "Don't blame academia when academia doesn't police itself" is not a strong stance when they are portrayed as doing exactly that. Or, maybe more generously, you have a different definition of academia and its role.
I think sharing code can help because it’s part of the method. It wouldn’t be reasonable to omit aspects of a paper’s methodology under the guise that replicators should devise their own independent method. Explicitly sharing methods is the whole point of publication, and sharing it is necessary for evaluating its soundness, generalizability, and limitations. izacus is right: a big part of the replication crisis is that there aren’t nearly as many incentives to replicate work, and omitting parts of the method makes this worse, not better.
Maybe for the audience here, it is useful to consider that peer review is a bit like scrum. It's a good idea, but that does not mean that everyone who says they do scrum does it properly. And when, in some situations, it does not work, that does not mean scrum is useless or incorrect.
And, like "scrum", "academia" is just the sum of the actors, including the paper authors. It's even more obvious that peer review is done by other paper authors: you cannot really be a paper author and blame "academia" for not doing a good peer review, because you are one of the person in charge of the peer review yourself.
As for "sharing code is part of the method", it is where I strongly disagree. Reproducibility and complete description allowing reproducibility is part of the method, but keeping enough details blinded (a balance that can be subjective) is also part of the method. So, someone can argue that sharing code is in contradiction with some part of the method. I think one of the misunderstanding is that people cannot understand that "sharing methods" does not require "sharing code".
Again, the "replication crisis" can be amplified by sharing code: people don't replicate the experiment, they just re-run it and then pretend it was replicated. Replicating the experiment means re-proving the results in an independent way, sometimes even with an orthogonal setup (that's why CMS and ATLAS at CERN are using on purpose different technologies and that they are not allowed to share their code). Using the same code is strongly biased.
It seems you are conflating concepts, maybe because you take it personally, which you shouldn't. The process can be broken, but that doesn't mean the academic is bad, just that they are part of a broken process. Likewise, if scrum is a broken process, it will lead to bad results. If it isn't "done properly" then we seem to be saying the same thing: the process isn't working. As I and others have said, there are some misaligned incentives which can lead to a broken process. Just because it sometimes works doesn't mean it's a good process, any more than being correct twice a day makes a broken clock a good clock. It varies by discipline, but there seem to be quite a few domains where there are actually more bad publications than good. That signals a bad process.
As others have talked about here, sometimes it becomes impossible to replicate the results. Is it because of some error in the replication process, the data, the practitioner, or is the original a sham? It's hard to deduce when there's a lot you can't chase down.
I also think you are applying an overly superficial rationalization as to why sharing code would amplify the replication issue. This is only true if people mindlessly re-run the code. The point of sharing it is so the code can be interrogated to see if there are quality issues. Your same argument could be made for sharing data; if people just blindly accepted the data, the replication issue would amplify. Yet we know that sharing the data is what led to uncovering some of the biggest issues in replication, and I don't see many people defending hiding data as a contradiction in the publication process. I suspect it's for the reasons others have already alluded to in this thread.
I'm not sure what you are saying. The peer review process works relatively well in the large majority of scientific fields. There are problems but they are pretty anecdotal and are far from counterbalancing the advantages. The previous commenter was blaming the peer review process for "bad incentives that lead to bad science", but that is an incorrect analysis. The bad science in their field is mainly due to the fact that private interests and people with a poor scientific culture get into that field more easily.
Also, let's not mix up "peer review" or "code sharing" and "bad publication" or "replication crisis".
I know people outside of science don't realise that, but publishing is only a very small element amongst the full science process. Scientists are talking together, exchanging all the time, at conferences, at workshops, ... This idea that a bad publication is fooling the domain experts does not correspond to reality. I can easily find a research paper mill and publish my made-up paper, but this would be 100% ignored by domain experts. Maybe one or two will have a look at the article, just in case, but it is totally wild to think that domain experts just randomly give a lot of credit to random unknown people rather than working with the groups of peers that they know well enough to know they are reliable. So, the percentage of "bad paper" is not a good metric: the percentage of bad papers is not at all representative of the percentage of bad papers that made it to the domain experts.
You seem to not understand the "replication crisis". Failures to replicate do not happen because the replicators are bad or the initial authors are cheating. There are a lot of causes: from the fact that science happens at the technological edge and that edge is trickier and trickier to reach, to the number of publications having increased a lot, to more and more economic interests trying to bias the system, to the stupid "publish or perish" plus "publish only the good results" culture that everyone in the academic sector agrees is stupid but that exists because of non-academic people. If you publish a scientifically interesting result that says "we have explored this way but found nothing", you get a lot of pressure from non-academic people who are stupid enough to say that you have wasted money.
You seem to say "I saw a broken clock once, so it means that all clocks are broken, and if you pretend it is not the case, it is just because a broken clock is still correct twice a day".
> This is only true if people mindlessly re-run the code. The point of sharing it is so the code can be interrogated to see if there are quality issues.
"Mindlessly re-running the code" is one extreme. "reviewing the code perfectly" is another one. Then there are all the scenario in the middle from "reviewing almost perfectly" to "reviewing superficially but having a false feeling of security". Something very interesting to mention is that in good practices, code review is part of software development, and yet, it does not mean that software have 0 bugs. Sure, it helps, and sharing the code will help too (I've said that already), but the question is "does it help more than the problem it may create". That's my point in this discussion: too many people here just don't understand that sharing the code create biases.
> Yet we know that sharing the data is what led to uncovering some of the biggest issues in replication,
What? What are your examples of a "replication crisis" where the problem was "uncovered" by sharing the data? Are you mixing up "replication crisis" and "fraud"? Even for "fraud", sharing the data is not really the solution; people who are caught are just being reckless, and they could have easily faked their data in more subtle ways. On top of that, rerunning on the same data does not help if the conclusion is incorrect because of a statistical fluctuation in the data (at a 95% confidence level, 5% of the papers can be wrong while they have 0 bugs: the data is indeed telling them that the most sensible conclusion is the one they have reached, and yet that conclusion is incorrect). On the other hand, rerunning on independent data ALWAYS exposes a fraudster.
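To make the confidence-level point concrete, here is a minimal sketch (the simulation parameters are made up; this is just an illustration of the statistics, not anything from the thread): thousands of bug-free analyses of pure noise, of which roughly 5% still clear a 95% confidence threshold.

    # Sketch: honest data, correct code, no fraud -- and still ~5% of
    # "no effect" experiments look significant at the 95% level.
    import numpy as np

    rng = np.random.default_rng(0)
    n_experiments, n_samples = 10_000, 100
    data = rng.normal(0.0, 1.0, size=(n_experiments, n_samples))  # true effect = 0

    t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n_samples))
    print(f"false positives: {(np.abs(t) > 1.96).mean():.1%}")    # about 5%

Rerunning exactly this code on exactly this data will "confirm" every one of those false positives; only independent data can knock them down.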
> and I don’t see many people defending hiding data as a contradiction in the publication process.
What do you mean? At CERN, sharing the data of your newly published paper with another collaboration is strictly forbidden. Only specific samples are allowed to be shared, after a lengthy approval procedure. But the point is that a paper should provide enough information that you don't need the data to discover if the methodology is sound or not.
I'm saying the peer review process is largely broken, both in the quality and quantity of publications. You have taken a somewhat condescending tone a couple times now to indicate you think you are talking to an audience unfamiliar with the peer review process, but you should know that the HN crowd goes far beyond professional coders. I am well aware of the peer review process, and publish and referee papers regularly.
>There are problems but they are pretty anecdotal
This makes me think you may not be familiar with the actual work in this area. It varies, but some domains show the majority (as many as 2/3rds) of studies have replication issues. The replication rates are lowest in complex systems, with 11% in biomedical being the lowest I'm aware of. Other domains have better rates, but not trivial and not anecdotal. Brian Nosek was one of the first that I'm aware of to systematically study this, but there are others. Data Colada focuses on this problem, and even they only talk about the studies that are generally (previously) highly regarded/cited. They don't even bother to raise alarms about the less consequential work they find problems with. So, no, this is not about me extrapolating from seeing "a broken clock once."
>it does not mean that software has 0 bugs
Anyone who regularly works with code knows this. But I think you're misunderstanding the intent of the code. It's not just for the referees, but for the people trying to replicate it for their own purposes. As numerous people in this thread have said, replicating can be very hard. Good professors will often assign well-regarded papers to students to show them the results are often impossible to reproduce. Sharing code helps troubleshoot.
>So, the percentage of "bad paper" is not a good metric: the percentage of bad papers is not at all representative of the percentage of bad papers that made it to the domain experts.
This is an unnecessary moving of the goalposts. The thrust of the discussion is about the peer-review and publication process. Remember the title is "one of my papers got declined today". And now you seemingly admit that the publication process is broken, but it doesn't matter because experts won't be fooled. Except we have examples of Nobel laureates making mistakes with data (Daniel Kahneman), of high-caliber researchers sharing their own anecdotes (Tao and Grant), and of fraudulent publications impacting millions of dollars of subsequent work (Alzheimer's). My claim is that a good process should catch both low-quality research and outright fraud. Your position is like an assembly line saying they don't have a problem when 70% of their widgets have to be thrown out, because people at the end of the line can spot the bad widgets (even when they can't).
>What are your examples of a "replication crisis" where the problem was "uncovered" by sharing the data?
Early examples would be dermatology studies for melanoma where simple good practices, like balanced datasets, were not followed. Or criminal justice studies that amplified racial biases, or where the authors didn't realize the temporal data was sorted by criminal severity. And yes, the most egregious examples are fraud, like the Dan Ariely case. That wasn't found until people went to the data source directly, rather than to the researchers. But there are countless examples of p-hacking that could be found by sharing data. If your counter is that these are examples of people cheating recklessly and they could have been more careful, that doesn't make your case that the peer-review process works. It just means it's even worse.
>sharing the data of your newly published paper with another collaboration is strictly forbidden
Yup, and I'm aware of other domains that hide behind the confidentiality of their data as a way to obfuscate bad practices. But, in general, people assume sharing data is a good thing, just like sharing code should be.
>But the point is that a paper should provide enough information that you don't need the data to discover if the methodology is sound or not.
Again (this has been said before) the point in sharing is to aid in troubleshooting. Since we already said replication is hard, people need an ability to understand why the results differed. Is it because the replicator made a mistake? Shenanigans in the data? A bug in the original code? P-hacking? Is the method actually broken? Or is the method not as generalizable as the original authors led the reader to believe? Many of those are impossible to rule out unless the authors share their code and data.
You bring up CERN so consistently that I tend to believe you are looking at this problem through a straw and missing the larger context of the rest of the scientific world. Yours reads as the perspective of someone inside a bubble.
I will not answer everything, because what's the point.
Yes, sharing the code can be one way to find bugs, I've said that already. Yes, sharing the code can help bootstrap another team, I've said that already.
What people don't realize is that reproducing the algorithm from scratch is also very very efficient. First, it's arguably a very good way to find bugs: if the other team does not get the exact same numbers as you, you can pinpoint exactly where you have diverged. When you find the reason, in the large majority of cases the bug had slipped past several code reviewers. Reading code and asking "does it make sense?" is not an easy way to find bugs, because bugs usually sit in places where the original author's code looked good when read.
And secondly, there is a contradiction in saying "people will study the code intensively" and "people will go faster because they don't have to write the code".
> Remember the title is "one of my papers got declined today"
Have you even read what Tao says? He explains that he himself has rejected papers and has probably generated similar, apparently paradoxical situations. His point is NOT that there is a problem with paper publication, it is that paper rejection is not such a big deal.
For the rest, you keep mixing up "peer review", "code sharing", "replication crisis", ... and because of that, your logic just makes no sense.
I say "bad paper that turns out to have errors (involuntary or not) are anecdotal" and you answer "11% of the biomedical publication have replication problem". Then when I ask you to give example where the replication crisis was avoided by sharing the data, you talk about bad papers that turns out to have errors (involuntary or not).
And, yes, I used CERN as an example because 1) I know it well, 2) if what you say is correct, how on earth is CERN not bursting into flames right now? You are pretending that sharing code or sharing data is a good idea and part of good practice. If that is true, how do you explain that CERN forbids it and is still able to generate really good papers? According to you, CERN would even be an exception where the replication crisis, bad papers and peer-review problems are almost nonexistent (and therefore I got the wrong idea). But if that is the case, how do you explain that, despite not doing what you pretend will help avoid those, CERN does BETTER?!
But by the way, at uni, I became very good friends with a lot of people, some of them scientists in other disciplines. We regularly have this kind of discussion because it is interesting to compare our different worlds. The funny part is that I did not come to the view that sharing the code or the data is not such a big deal on my own (it still can be good, but it's not "the good practice"); I realised it because another person, a chemist, mentioned it.
>What people don't realize is that reproducing the algorithm from scratch is also very very efficient.
This is where we differ. Especially if the author shares neither the data nor the code, because you can never truly be sure whether it's a software bug, a data anomaly, a bad method, or outright fraud. So you can end up burning tremendous amounts of time investigating all those avenues. That statement (as well as others about how trivial replication is) makes me think you don't actually try to replicate anything yourself.
>there is a contradiction in saying "people will study the code intensively" and "people will go faster because they don't have to write the code".
I never said "people will go faster" because they don't have to write the code. Maybe you're confusing me with another poster. You were the one who said sharing code is worthless because people can "click on the button and you get the same result". My point, and maybe this is where we differ, is that for the ultimate goal is not to create the exact same results. The goal I'm after is to apply the methodology to something else useful. That's why we share the work. When it doesn't seem to work, I want to go back to the original work to figure out why. The way you talk about the publication process tells me you don't do very much of this. Maybe that's because of your work at CERN is limited in that regard, but when I read interesting research I want to apply it to different data that are relevant to the problems I'm trying to solve. This is the norm outside of those who aren't studying the replication crisis directly.
>I say "bad paper that turns out to have errors (involuntary or not) are anecdotal"
My answer was not conflating peer-review and code sharing and replication (although I do think they are related). My answer was to give you researchers who work in this area because their work shows it is far from anecdotal. My guess is you didn't bother to look it up because you've already made up your mind and can't be bothered.
>I ask you to give examples where the replication crisis was avoided by sharing the data, you talk about bad papers that turn out to have errors
Because it's a bad question. A study that is replicated using the same data is "avoiding the replication crisis". Did you really want me to list studies that have been replicated? Go on Kaggle or Figshare or GenBank if you want examples of datasets that have been used (and replicated), like CORD-19 or NIH dbGaP or the World Values Survey, or any number of other datasets. You can find plenty of published studies that use that data and try to replicate them yourself.
>how on earth is CERN not bursting into flames
The referenced authors talk about how physics is generally the most replicable. This is largely because they have the most controlled experimental setups. Other domains that do much worse in terms of replicability are hampered by messier systems, ethical considerations, etc. that limit the scientific process. In the larger scheme of things, physics is more of an anomaly and not a good basis to extrapolate to the state of affairs for science as a whole. I tend to think you being in a bubble there has caused you to over-extrapolate and have too strong of a conclusion. (You should also review the HN guidelines that urge commenters to avoid using caps for emphasis)
>"sharing the code...but it's not "the good practice""
I'm not sure if you think sharing a single unsourced quip is convincing but, your anecdotal discussion aside, lots of people disagree with you and your chemist friend. Enough so that it's become a more and more common practice (and even requirement in some journals) to share data and code. Maybe that's changed since your time at uni, and probably for the better.
> Especially if the author shares neither the data nor the code
What are you talking about? In this example, why do you invent that they are not sharing the data? That's the whole point.
> A study that is replicated using the same data is "avoiding the replication crisis"
BULLSHIT. You can build confidence by redoing the experiment with the same data, but it is just ONE PART and it is NOT ENOUGH. If there is a statistical fluctuation in the data, both studies will conclude something false.
I have of course reproduced a lot of algorithms myself, without having the code. It's not complicated, the paper explains what you need to do (and please, if your problem is that the paper does not explain, then the problem is not about sharing the code, it's about the paper explaining badly).
And again, my argument is not "nobody shares data" (did you know that some studies also share code? Did you know that I have occasionally shared code? Because, as I've said before, it can be useful), but that "some don't share data and yet are still doing very well, both on performance, on fraud detection and on replication".
For the rest, you are just saying "my anecdotal observations are better than yours".
But meanwhile, even Terence Tao does not say what you pretend he says, so I'm sure you believe people agree with you, but it does not mean they do.
Please review and adhere to the HN guidelines before replying again.
>why do you invent that they are not sharing the data?
Because you advocated that very point. You: "some data is better not to share too" The point in sharing is that I want to interrogate your data/code to see if it's biased, misrepresented, or prone to error when it doesn't seem to work for the specialized problem I am trying to apply it to. When you don't share it and your result doesn't replicate, I'm left wondering "Is it because they have something unique in their dataset that doesn't generalize to my problem?"
>BULLSHIT.
Please review and adhere to the HN guidelines before replying again.
>It's not complicated
You can make this general claim about all papers based on your individual experience? I've already explained why your personal experience is probably not generalizable across all domains.
>you are just saying "my anecdotal observations are better than yours".
No, I'm saying the systematically studied, published, and replicated studies trump your anecdotal claims. I've given you some example authors, if you have an issue with their methods, delineate the problems explicitly rather than sharing weak anecdotes.
> Because you advocated that very point. You: "some data is better not to share too"
SOME data. SOME. You've concluded, incorrectly, that I was pretending that sharing data is never useful, which is not at all what I've said.
> You can make this general claim about all papers based on your individual experience?
What? Do you even understand basic logic? I'm saying that I've observed SOME papers where sharing the code did not help. I'm not saying sharing the code never helps (I've said that already). I'm just saying that people usually don't understand the real cause of the problem, and invent that sharing the code will help, while in fact doing other things (for example being more precise in the explanation) will solve the problem without having to pay for the unblinding that sharing the code generates.
Sure, one reason I say that is my own experience, even if my observations are not at all limited to one field, as I've exchanged on the subject with many scientists. But another reason is that when I discuss the subject, the people who overestimate the gain of sharing the code really have difficulty understanding its disadvantages.
You yourself seem to not understand what we need for a good replication. Replication is supposed to demonstrate the result independently, so that we build up confidence in the conclusions. Rerunning with the same data or the same code is not enough, because it does not prove that the conclusions will remain valid if we try with other data or another implementation. Only when you understand that do you understand that sharing the code has a price to pay.
By the way, it also explains why CERN is doing something that, according to you, has absolutely no reason to exist except for cheating. Of course, if that were the case, intellectually honest scientists would all ask CERN to cancel these policies. They don't, because there are real reasons why scientists may prefer in some cases to forbid sharing code and data (not just "I don't do it myself because I'm lazy", but "I don't do it because it's a specific rule; they explicitly say it's a bad thing to do").
And, sure, maybe it is not like that everywhere. But it does not matter. It's a counter-example that demonstrates that your hypothesis does not work. If your hypothesis were true, what CERN does would not be possible; it would obviously be a bad move and would be attacked.
> I've given you some example authors, if you have an issue with their methods, delineate the problems explicitly rather than sharing weak anecdotes.
These studies do not conclude that sharing the code is a good solution. None of these studies are in contradiction with what I say.
Of course, coming from someone who reads "some data is better not to share too" and concludes that it means "data is better never shared", or who did not understand Tao's point, I'm sure you are convinced they say that. They just don't.
> It's not "the best we have", it's "the best those in power will allow". Those in power do not want consequences for publishing bad research, and also don't want the reviewing load required to keep bad research out.
This is a very conspiratorial view of things. The simple and true answer is your last suggestion: doing a more thorough review takes more time than anyone has available.
Reviewers work for free. Applying the level of scrutiny you're requesting would require far more work than reviewers currently do, and maybe even something approaching the amount of work required to write the paper in the first place. The more work it takes to review an article, the less willing reviewers are to volunteer their time, and the harder it is for editors to find reviewers. The current level of scrutiny that papers get at the peer-review stage is a result of how much time reviewers can realistically volunteer.
Peer review is a very low standard. It's only an initial filter to remove the garbage and to bring papers up to some basic quality standard. The real test of a paper is whether it is cited and built upon by other scientists after publication. Many papers are published and then forgotten, or found to be flawed and not used any more.
If journals were operating on a shoestring budget, I might be able to understand why academics are expected to do peer review for free. As it is, it makes no sense whatsoever. Elsevier pulls down huge amounts of money and still manages to command free labor.
I guess the sensible response is "what bias does being paid by Elsevier add that working for free for Elsevier doesn't add?"
The external bias is clear to me (maybe a paper undermines something you're about to publish, for example) but I honestly can't see much additional bias in adding cash to a relationship that already exists.
>The real test of a paper is whether it is cited and built upon by other scientists after publication. Many papers are published and then forgotten, or found to be flawed and not used any more.
This does seem true, but this forgets the downstream effects of publishing flawed papers.
Future research in this area is stymied by reviewers who insist that the flawed research already solved the problem and/or undermines the novelty of somewhat similar solutions that actually work.
Reviewers will reject your work and insist that you include the flawed research in your own evaluations, even if you’ve already pointed out the flaws. Then, when you show that the flawed paper underperforms every other system, reviewers will reject your results and ask you why they differ from the flawed paper (no matter how clearly you explain the flaws) :/
Published papers are viewed as canon by reviewers, even if they don’t work at all. It’s very difficult to change this perception.
If you get such a simple-minded reviewer, you can push back in your response, or you can even contact the editor directly.
Reviewers are not all-powerful, and they don't all share the same outlook. After all, reviewers are just scientists who have published articles in the past. If you are publishing papers, you're also reviewing papers. When you review papers, will you assume that everything that has ever passed peer review is true? Obviously not.
I wasn't familiar with this requirement, but how close is close? I don't think the new semiconductor plants in Texas & Arizona are that close to a city center.
And how expensive is too expensive? At least compared to cities of similar size & stature, Chicago itself is not that expensive and can get much cheaper as you move slightly away from it (in the right direction).
Even large, damaging tornadoes have quite localized impacts (max of maybe a mile in path width) -- and you don't generally do much more than stay up to building code in order to prepare for one. In contrast, earthquakes devastate entire areas and require substantial changes to building construction in order to protect against them.
Tornado protection is mostly about avoiding damage in the periphery of a tornado. A building of that size can't realistically be protected from a direct hit of a major tornado. Proper engineering can protect a building from basically all earthquakes. Whether the contents inside are secured properly is a different matter and that's where most damage occurs.
> you don't generally do much more than stay up to building code in order to prepare for one.
Interesting. I'm unfamiliar with building code provisions that are designed to mitigate the effects of tornadoes, which are arguably the most destructive force on Earth apart from an erupting volcano or a nuclear attack, on a semiconductor fab, which is arguably among the most sensitive and easily-disrupted facilities ever built. Any good sources for further reading?