> Most of my life post-university I've realized that most questions have complex answers; it is never as simple as you expect.
I find the complication comes from poor definitions, poor understanding of those definitions, and pedantic arguments. It's less about the facts of reality being complicated and more about our ability to communicate them to each other.
There's a great analogy to this in chess as well.
~1200 - omg chess is so amazing and hard. this is great.
~1500 - i'm really starting to get it! i can beat most people i know easily. i love studying this complex game!
~1800 - this game really isn't that hard. i can beat most people at the club without trying. really I think the only thing separating me from Kasparov is just a lot of opening prep and study
~2300 - omg this game is so friggin hard. 2600s are on an entirely different plane, let alone a Kasparov or a Carlsen.
Magnus Carlsen - "Wow, I really have no understanding of chess." - Said without irony after playing some game and going over it with a computer on stream. A fairly frequent happening.
IMO both perspectives have their place. Sometimes what's missing is the information, sometimes what's lacking is the ability to communicate it and/or the willingness to understand it. So in different circumstances either viewpoint may be appropriate.
What's missing more often than not, across fields of study as well as levels of education, is the overall commitment to conceptual integrity. From this we observe people's habitual inability or unwillingness to be definite about what their words mean - and their consequent fear of abstraction.
If one is in the habit of using one's concepts as bludgeons, one will find many ways and many reasons to bludgeon others with them - especially when the other person turns out to be using concepts as something more akin to clockwork.
Simple counterexample: chess. The rules are simple enough we regularly teach them to young children. There's basically no randomness involved. And yet, the rules taken together form a game complex enough that no human alive can fully comprehend their consequences.
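For a sense of scale, here's a minimal sketch (assuming the third-party python-chess package, imported as chess) that counts how many distinct move sequences exist after just a few half-moves from the starting position:

    import chess

    def count_nodes(board: chess.Board, depth: int) -> int:
        # Count the move sequences that are `depth` half-moves long (a "perft" count).
        if depth == 0:
            return 1
        total = 0
        for move in board.legal_moves:
            board.push(move)
            total += count_nodes(board, depth - 1)
            board.pop()
        return total

    board = chess.Board()
    for depth in range(1, 5):
        print(depth, count_nodes(board, depth))  # 20, 400, 8902, 197281

Four half-moves in and we're already near 200,000 lines to consider; a full game is far beyond anything a human can exhaustively reason through, which is the whole point.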
This is actually insightful: we usually don't know the question we are trying to answer. The idea that you can "just" find the right question is naive.
Sure, you can put it this way, with the caveat that reality at large isn't strongly definable.
You can sort of see this with good engineering: half of it is strongly defining a system simple enough to be reasoned about and built up, the other half is making damn sure that the rest of reality can't intrude, violate your assumptions and ruin it all.
> the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference)
Maybe I'm overreacting, but this feels like an insanely biased response. They found the one potentially innocuous reason and latched onto that as a way to hand-wave the entire problem away.
Science already had a reproducibility problem, and it now has a hallucination problem. Considering the massive influence the private sector has on both the work and the institutions themselves, the future of open science is looking bleak.
I really think it is. The primary function of these publications is to validate science. When we find invalid citations, it shows they're not doing their job. When they get called on that, they cite the volume of work their publication puts out and point to the only potentially non-disqualifying explanation.
Seems like CYA, seems like hand wave. Seems like excuses.
Even if some of those innocuous mistakes happen, we'll all be better off if we accept people making those mistakes as acceptable casualties in an unforgiving campaign against academic fraudsters.
It's like arguing against strict liability for drunk driving because maybe somebody accidentally let their grape juice sit too long and they didn't know it was fermented... I can conceive of such a thing, but that doesn't mean we should go easy on drunk driving.
Isn't disqualifying X months of potentially great research over a malformed but real reference harsh? I don't think they'd be okay with references that are actually made up.
> When your entire job is confirming that science is valid, I expect a little more humility when it turns out you've missed a critical aspect.
I wouldn't call a malformed reference a critical issue; it happens. That's why we have peer review. I would contend that drawing superficially valid conclusions from studies through the use of AI is a much more burning problem, one that speaks more to the integrity of the author.
> It will serve as a reminder not to cut any corners.
Or yet another reason to ditch academic work for industry. I doubt the rise of scientific AI tools like AlphaXiv [1], whether you consider them beneficial or detrimental, can be avoided - which calls for a level of pragmatism.
Even the fact that citations are not automatically verified by the journal is crazy. The whole academia-and-publishing enterprise is an empire built on inefficiency, hubris, and politics (but I'm repeating myself).
Science relies on trust... a lot. So things which show dishonesty are penalised greatly. If we were to remove trust, then peer reviewing a paper might take months of work or even years.
And that timeline only grows with the complexity of the field in question. I think this is inherently a function of the complexity of the study, and rather than harshly penalizing such shortcomings we should develop tools that address them and improve productivity. AI can speed up the verification of requirements like proper citations, both on the author's and reviewer's side.
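A lot of this is mechanically checkable even without AI. A minimal sketch, assuming every citation carries a DOI and the third-party requests library is available, using Crossref's public REST API; the fuzzy-match threshold and the example DOI are just illustrative:

    import difflib
    import requests

    def check_citation(doi: str, claimed_title: str) -> str:
        # Look the DOI up in Crossref; a 404 means it doesn't resolve to any registered work.
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        if resp.status_code == 404:
            return "DOI not found"
        resp.raise_for_status()
        registered_title = resp.json()["message"]["title"][0]
        # Compare the title given in the bibliography against the registered title.
        score = difflib.SequenceMatcher(
            None, claimed_title.lower(), registered_title.lower()
        ).ratio()
        return "looks fine" if score > 0.8 else f"title mismatch, registered as: {registered_title}"

    # e.g. check_citation("10.1038/nature14539", "Deep learning")

Run over a reference list, something like this flags both fabricated DOIs and real DOIs attached to the wrong paper; the genuinely messy cases (author-year strings with no identifier) are where smarter tooling could actually help.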
Math does that. Peer review cycles are measured in years there. This does not stop fashionable subfields from publishing sloppy papers, and occasionally even irrecoverably false ones.
I don’t read the NeurIPS statement as malicious per se, but I do think it’s incomplete.
They’re right that a citation error doesn’t automatically invalidate the technical content of a paper, and that there are relatively benign ways these mistakes get introduced. But focusing on intent or severity sidesteps the fact that citations, claims, and provenance are still treated as narrative artifacts rather than things we systematically verify.
Once that’s the case, the question isn’t whether any single paper is “invalid” but whether the workflow itself is robust under current incentives and tooling.
A student group at Duke has been trying to think about this with Liberata, i.e. what publishing looks like when verification, attribution, and reproducibility are first class rather than best effort.
They have a short explainer here that lays out the idea, if the context is useful: https://liberata.info/
I found at least one example[0] of authors claiming the reason for the hallucination was exactly this. That said, I do think for this kind of use, authors should go to the effort of verifying the correctness of the output. I also tend to agree with others who have commented that while a hallucinated citation or two may not be particularly egregious, it does raise concerns about what other errors may have been missed.
> There is another fallacy in play where people pushing these debates want you to think that there is only one single cause of CVD or health issues: Either sugar, carbs, fat, or something else. The game they play is to point the finger at one thing and imply that it gets the other thing off the hook. Don’t fall for this game.
Okay but right now we're talking about science getting corrupted by money. Which did happen in this instance, so that companies could hide the damage that sugar does to people.
Sugar does damage and scientists were paid to downplay that fact. It is not the first time. This is concerning when we talk about principles and public trust.
> The simple evidence for this is that everyone who has invested the same resources in AI has produced roughly the same result. OpenAI, Anthropic, Google, Meta, Deepseek, etc. There's no evidence of a technological moat or a competitive advantage in any of these companies.
I think this analysis is too surface-level. We are seeing Google Gemini pull away in terms of image generation, and their access to billions of organic user images gives them a huge moat. And in terms of training data, Google also has a huge advantage there.
The moat is the training data, capital investment, and simply having a better AI that others cannot recreate.
These kinds of partnerships also throw in free inference with MFN clauses, which makes for a mutual moat.
A moat doesn't have to be a feature, and equity stakes have been fairly successful moats (e.g. much of AWS's ML services being powered by Anthropic models due to Amazon's equity stake in Anthropic).
A moat is a permanent feature protecting a castle against attack. That’s the metaphor. If it’s not their own device intrinsically protecting them then it’s not a moat in my book.
> That is not how we use the term "moat" in this context, because competitors eventually converge on offerings within 1-2 years.
Then I guess we need a new term because that's not how I interpret the term moat either. To me, ChatGPT chat history is a moat. It allows them to differentiate their product and competitors cannot copy it. If someone switches to a new AI service they will have to build their chat history from scratch.
By comparison a business deal that can be transferred to a new partner the second it expires is much more temporary.