Hacker News | currymj's comments

I recommend actually clicking through and reading some of these papers.

Most of those I spot-checked do not give an impression of high quality. It's not just AI writing assistance: many seem to have AI-generated "ideas", often plausible nonsense. The reviewers often catch the errors, and sometimes even the fake citations.

can I prove malfeasance beyond a reasonable doubt? no. but I personally feel quite confident many of the papers I checked are primarily AI-generated.

I feel really bad for any authors who submitted legitimate work but made an innocent mistake in their .bib and ended up on the same list as the rest of this stuff.


To me such an interpretation suggests there are likely papers that were not so easy to spot: cases where the AI happened upon more plausible nonsense, then generated believable but still fabricated data to bolster it, at a level that is much harder to catch.

This isn't comforting at all.


the papers themselves are publicly available online too. Most of the ones I spot-checked give an extremely strong impression of AI generation.

not just some hallucinated citations, and not just the writing. in many cases the actual purported research "ideas" seem to be plausible nonsense.

To get a feel for it, you can take some of the topics they write about and ask your favorite LLM to generate a paper. Maybe even throw "Deep Research" mode at it. Perhaps tell it to put it in ICLR latex format. It will look a lot like these.


bigram-trigram language models (with some smoothing tricks to allow for out-of-training-set generalization) were state of the art for many years. Ch. 3 of Jurafsky's textbook (which is modern and goes all the way to LLMs, embeddings etc.) is good on this topic.

https://web.stanford.edu/~jurafsky/slp3/ed3book_aug25.pdf
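As a toy illustration (my own sketch, not from the textbook), a bigram model with add-one (Laplace) smoothing can be put together in a few lines:

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries.
corpus = [["<s>", "i", "like", "cheese", "</s>"],
          ["<s>", "i", "like", "tea", "</s>"],
          ["<s>", "you", "like", "tea", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing,
    so unseen bigrams still get a nonzero probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

print(bigram_prob("i", "like"))      # seen bigram: relatively high
print(bigram_prob("cheese", "tea"))  # unseen bigram: small but nonzero
```

Real systems of that era used much better smoothing (backoff, Kneser-Ney); the textbook chapter covers those.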

I don't know the history but I would guess there have been times (like the 90s) when the best neural language models were worse than the best trigram language models.


Steam Deck is Arch-based, that's most likely why.


SteamOS is counted separately.


Yes, but a lot of Linux gamers use an Arch-based distribution such as CachyOS, since SteamOS is Arch-based too. They get updates faster because of the rolling releases.


i would like to understand what people get, or think they get, out of putting a completely AI-generated survey paper on arXiv.

Even if AI writes the paper for you, it's still kind of a pain in the ass to go through the submission process, get the LaTeX to compile on their servers, etc., there is a small cost to you. Why do this?


Gaming the h-index has been a thing for a long time in circles where people take note of such things. There are academics who attach their name to every paper that goes through their department (even if they contributed nothing), there are those who employ a mountain of grad students to speed-run publishing junk papers... and now with LLMs, one can do it even faster!


Presumably a sense of accomplishment to brandish with family and less informed employers.


Yup, 100% going on a LinkedIn profile


Published papers are part of the EB-1 visa rubric, so there's huge value in getting your content into these indexes:

"One specific criterion is the 'authorship of scholarly articles in professional or major trade publications or other major media'. The quality and reputation of the publication outlet (e.g., impact factor of a journal, editorial review process) are important factors in the evaluation."


Is arXiv a major trade publication?

I've never seen arXiv papers counted towards your publications anywhere that the number of your publications is used as a metric. Is USCIS different?


if you run into anyone who is serious about the "fake Nobel" complaint, just ask them why they have such a high opinion of taking money from an arms dealer, and such a low opinion of taking money from a public institution in one of the Nordic welfare states.


for many types of scientific computing, there's a case to be made that it is the best language available. often this type of computing happens in scientific/engineering organizations rather than in most software companies. this is its best niche, an important one, but not visible to people in SWE jobs making most software.

it can be used for deep learning but you probably shouldn't, currently, except as a small piece of a large problem where you want Julia for other reasons (e.g. scientific machine learning). They do keep improving this and it will probably be great eventually.

i don't know what the experience is like using it for traditional data science tasks. the plotting libraries are actually pretty nicely designed and no longer have horrible compilation delays.

people who like type systems tend to dislike Julia's type system.

they still have the problem of important packages being maintained by PhD students who graduate and disappear.

as a language it promises a lot and mostly delivers, but those compromises where it can't deliver can be really frustrating. this also produces a social dynamic of disillusioned former true believers.


I work in the medical device industry and most people on my team have engineering degrees and extensive experience with Matlab. Pretty much all of them would flip their table if they had to write numerical/scientific code in Rust, even though it arguably has a more robust type system.


> people who like type systems tend to dislike Julia's type system.

This is true. As far as I understand it, there is no type-theoretic basis for Julia's design (type theory seems to have little to say about subtyping type lattices). Relatedly, another comment mentioned that Julia needs sum types.


It is the same type theory that has powered Common Lisp and Dylan.


We're not using "type theory" the same way, I think. I'm thinking in terms of

    - simply typed lambda calculus
    - System F
    - dependent type theory (MLTT)
    - linear types
    - row types
    - and so on
But it's subtle to talk about. It's not like there is a single type theory that underlies TypeScript or Rust, either. These practical languages have partial (and somewhat post-hoc) formalizations of their systems.


For starters,

"On the use of LISP in implementing denotational semantics"

https://dl.acm.org/doi/10.1145/319838.319866

Type theory in CS isn't synonymous with whatever Haskell happens to do.


If you read his last two novels he was clearly well-informed about mathematics.

However I have to assume that McCarthy didn't actually master all the material in the math books mentioned here, I think the reporter may be a little too credulous about that. I suspect he had the very common experience of buying a yellow book and being defeated in the first couple chapters.


That might be an understatement of his capabilities, though obviously he wasn't a professional mathematician. It's a joy to read some of the eulogies professors at the Santa Fe Institute gave to him:

https://www.santafe.edu/news-center/news/memoriam-cormac-mcc...

> He had an encyclopedic knowledge of the world and a memory to match. Topics ranged from salvage diving — something we discussed a few days ago — to far more academic fare often focused on mathematics and physics.

> Cormac and I engaged on a wide range of topics. Some recurring themes included social mobility, machine intelligence, the intersection of genius and madness, and cars and trucks.

> Cormac also often remarked that a lively conversation with friends is about as good as sex. He’d talk for hours about physics, math, novels, philosophy, human nature, bawdy humor, corny humor, architecture (including detailed advice on my own house), gambling, history, and any question that lacked a quick and obvious answer.

Etc.


Thanks very much for posting this link. The eulogies were indeed a pleasure to read. Choked up a few times.


> His books, many of which are annotated with margin comments,

I'm not saying that he did, but this, along with being the right age to have read How to Read a Book by Mortimer J. Adler, strongly suggests that he used that book to grasp a lot more of his books than most people can.

That book gives you a very good strategy for reading books that would normally be beyond you. In the three years since I read it, I've managed to finish books that I couldn't get through even when I was doing my PhD and it was my full-time job to understand them.

The funny thing is that I only ran into that book when I was trying to figure out how to build knowledge graphs for complex documents using LLMs. Using multiple readings to create a summary of each chunk, then a graph of the connections between the chunks, then a glossary of all the terms, and finally a critique of each chunk gave better-than-SOTA results for the documents I was working on.
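For what it's worth, that multi-pass approach can be sketched roughly as below; this is a hypothetical illustration, `ask` is a stand-in for whatever LLM call you'd actually use, and all the names here are invented:

```python
# Hypothetical sketch of a multi-pass reading pipeline:
# summaries, pairwise connections, a glossary, then critiques.
# `ask` stands in for any LLM call; here it returns a canned string.
def ask(prompt: str) -> str:
    return f"[llm response to: {prompt[:40]}...]"

def analyze_document(chunks):
    # Pass 1: summarize each chunk individually.
    summaries = [ask(f"Summarize this chunk: {c}") for c in chunks]
    # Pass 2: relate each chunk to the chunks that came before it.
    edges = [(i, j, ask(f"How does chunk {i} relate to chunk {j}?"))
             for j in range(len(chunks)) for i in range(j)]
    # Pass 3: a glossary over the whole document.
    glossary = ask("Define all technical terms in: " + " ".join(chunks))
    # Pass 4: critique each chunk with the earlier passes as context.
    critiques = [ask(f"Critique this chunk: {c}") for c in chunks]
    return {"summaries": summaries, "graph": edges,
            "glossary": glossary, "critiques": critiques}

result = analyze_document(["chunk one text", "chunk two text"])
```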


Where can I read more about your research? Knowledge graphs interest me.


Drop me a line on my profiles email.

I'm playing around with using hyperlinks in pdfs to get around how much the www sucks for posting serious research with serious working code.

Caveat emptor: I'm first working on getting the basic groundwork out, like a pipeline that shows what you need to do to extract a scanned pdf in a quality that tesseract can actually get text out of.


> I'm first working on getting the basic groundwork out, like a pipeline that shows what you need to do to extract a scanned pdf in a quality that tesseract can actually get text out of.

Sounds like TBL's contribution doesn't suck so much after all.


I'm curious why you think that "strongly suggests" anything?


> However I have to assume that McCarthy didn't actually master all the material in the math books mentioned here

Why? And what exactly would mastery look like? Regardless, McCarthy didn't make his mark as a mathematician so his private ability to understand doesn't matter. Why take the opportunity to make a negative assumption and diminish the possibility that he had mastered an understanding in his private life? What does this accomplish? Seems like the only thing it could possibly do is to try and make you and me feel better about our own inadequacies, without proof.


I don't think it diminishes him.

"Stella Maris" is a great novel that could only be written by someone who was very knowledgeable about math. As far as art that engages deeply with math and science, I don't know of anything comparable. Most artists would focus only on the human drama of discovery, without being able to engage with the subject matter.

However, I would consider "mastery" of a math textbook to mean that you have worked through almost all the chapters, can do a reasonable chunk of the exercises, and could TA the course without too much trouble.

While I don't know for sure, I doubt McCarthy achieved that level of understanding for all the yellow books he owned. I think buying a math textbook on an interesting topic and then not making it very far is a very common and human experience.


Agree that “Stella Maris” is amazing for this deep engagement with math. Perhaps in a similar vein, I do think there are a couple of other books that do this. One is Anathem by Neal Stephenson, which is similar in that the foundations of math make an appearance. The other is “The Weil Conjectures” by Karen Olsson, which captures what it’s like to really do mathematics. Highly recommend both.


sign languages are completely different languages from spoken languages, with their own grammar etc.

subtitles can work but it's basically a second language. perhaps comparable to many countries where people speak a dialect that's very different from the "standard" written language.

this is why you sometimes have sign language interpreters at events, rather than just captions.

there's not really a widely accepted written form of sign language.


> subtitles can work but it's basically a second language

That argument applies equally to sign language: most countries have their own idiosyncratic sign language (ASL, LSE, etc.). Any televised event that has interpreters will be using the national language version.

The closest thing you're thinking of is IS (International Sign), but it's much more limited in terms of expression and not every deaf person knows it.

> there's not really a widely accepted written form of sign language.

Because it makes no sense to have it unless there was a regional deaf community that was fluent in sign language and also simultaneously illiterate.

https://www.reddit.com/r/NoStupidQuestions/comments/6t7k1w/h...


>this is why you sometimes have sign language interpreters at events, rather than just captions.

No, the reason is because a) it's in real time, and b) there's no screen to put the subtitles on. If it was possible to simply display subtitles on people's vision, that would be much more preferable, because writing is a form of communication more people are familiar with than sign language. For example, someone might not be deaf, but might still not be able to hear the audio, so a sign language interpreter would not help them at all, while closed captions would.


if you're maximizing accessibility you'd have both. often in broadcasts with closed captioning, there will also be a video of the sign language interpreter.


Sometimes the captions miss things or are really terribly written.


I also can't think of a reason why I would ever want to look at an AI generated video.

however as they hint at a little in the announcement, if video generation becomes good enough at simulating physics and environments realistically, that's very interesting for robotics.

