
There are quite a few relatively objective criteria in the real world: real estate holdings, money and material possessions, power to influence people and events, etc.

The complexity of achieving those might make the "Centaur Era" - when humans+computers are superior to either alone - last longer than the Centaur chess era, which spanned only 1-2 decades before engines like Stockfish made the human partner superfluous.

However, in well-defined domains, like medical diagnostics, it seems reasoning models alone are already superior to primary care physicians, according to at least 6 studies.

Ref: "When Doctors With A.I. Are Outperformed by A.I. Alone" by Dr. Eric Topol, https://substack.com/@erictopol/p-156304196



It makes sense. People said software engineers would be easy to replace with AI because our work runs on a computer and is easy to test, but the disconnect is that the primary strength of LLMs - drawing on huge bodies of information - is not the primary skill programmers are paid for. That strength helps with trivial CRUD work and boilerplate, but every programmer eventually has to truly reason about code, and LLMs fundamentally cannot do that (not even the "reasoning" models).

Medical diagnosis relies heavily on knowledge, pattern recognition, a bunch of heuristics, educated guesses, luck, etc. These are all things LLMs do very well. They don't need a high degree of accuracy, because humans are already doing this work with a pretty low degree of accuracy. They just have to be a little more accurate.


Being a walking encyclopedia is not what we pay doctors for either. We pay them to account for the half-truths and outright lies that people tell about their health, to say nothing of novel presentations that come about because of the genetic lottery. Just as an AI can assist but not replace a software engineer, an AI can assist but not replace a doctor.


Having worked briefly in the medical field in the 1990s, I saw a kind of "greedy matching" being pursued: once 1-2 well-known symptoms associated with a disease are recognized, the standard interventions are initiated.

A more "proper" approach would be to work with sets of hypotheses and to conduct tests to exclude alternative explanations gradually - which medics call "DD" (differential diagnosis). Sadly, this is often not systematically done, and instead people jump on the first diagnosis and try if the intervention "fixes" things.

So I agree there are huge gains to be expected from low-hanging fruit in the medical domain.


I think at this point it's an absurd take to say they aren't reasoning. I don't think you can get such high scores on competitive coding and the IMO without reasoning about code (and math).

AlphaZero also doesn't need training data as input--it's generated by self-play. The only information fed in is the game rules. Theoretically this should also be possible in research math. Less so in programming, because we care about less rigid things like style. But if you rigorously defined the objective, training data should not be necessary there either.
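
To make the "generated by self-play" point concrete, here is a toy sketch (everything here is invented for illustration; the real system also wraps the network in a Monte Carlo tree search rather than calling a bare policy):

    import random

    # Toy sketch, not AlphaZero itself: labelled training examples produced purely
    # by self-play, with the game rules as the only external input. Tic-tac-toe
    # stands in for Go, and a random policy stands in for the network + search.

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return "draw" if "." not in board else None  # None: game still in progress

    def legal_moves(board):
        return [i for i, cell in enumerate(board) if cell == "."]

    def random_policy(board, moves):
        # In AlphaZero this is where the network-guided tree search would go.
        return random.choice(moves)

    def self_play_game(policy):
        """Play one game against itself; return (position, player_to_move, result) examples."""
        board, player, history = ["."] * 9, "X", []
        while winner(board) is None:
            history.append(("".join(board), player))
            board[policy(board, legal_moves(board))] = player
            player = "O" if player == "X" else "X"
        result = winner(board)
        return [(position, to_move, result) for position, to_move in history]

    print(self_play_game(random_policy)[:3])  # fresh labelled data, no human games needed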


> AlphaZero also doesn't need training data as input--it's generated by self-play. The only information fed in is the game rules

This is wrong: it wasn't just fed the rules. It was also given a harness that tried out candidate moves and searched for strong ones via a tree search over possible continuations.

Without that harness it would not have reached superhuman performance. Such a harness is easy to build for Go but not for more complex things. You will find that the harder it is to build an effective harness for a topic, the harder that topic is for AI models to solve: it is relatively easy to build a good harness for well-defined problems like competitive programming, but much, much harder for general-purpose programming.
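
As a toy illustration of why such a harness is cheap to build for competitive programming (the task, file name, and test data here are all invented):

    import subprocess
    import sys

    # Hypothetical task: read N and then N integers, print their sum.
    # Each (stdin, expected stdout) pair acts as an automatic reward signal.
    TEST_CASES = [
        ("3\n1 2 3\n", "6\n"),
        ("2\n10 -4\n", "6\n"),
    ]

    def grade(candidate_path):
        """Run a candidate solution against the test cases and return its pass rate."""
        passed = 0
        for stdin, expected in TEST_CASES:
            try:
                result = subprocess.run(
                    [sys.executable, candidate_path],
                    input=stdin, capture_output=True, text=True, timeout=2,
                )
                if result.stdout == expected:
                    passed += 1
            except subprocess.TimeoutExpired:
                pass  # exceeding the time limit counts as a failed case
        return passed / len(TEST_CASES)

    # print(grade("candidate.py"))
    # The equivalent oracle for "general-purpose programming" (style, architecture,
    # maintainability) is exactly what is hard to write down.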


Are you talking about Monte Carlo tree search? I consider it part of the algorithm in AlphaZero's case. But agreed that RL is a lot harder in a real-life setting than in a board-game setting.


The harness is obtained from the game rules? The "harness" is part of the AlphaZero algorithm.


> the "harness" is part of the algorithm of alphzero

Then that is not a general algorithm, and results from it don't apply to other problems.


If you mean CoT, it's mostly fake: https://www.anthropic.com/research/reasoning-models-dont-say...

If you mean symbolic reasoning, well it's pretty obvious that they aren't doing it since they fail basic arithmetic.


> If you mean CoT, it's mostly fake

If that's your takeaway from that paper, you've arrived at the wrong conclusion. It's not that CoT is "fake"; it's that it doesn't give the full picture, and if you rely only on CoT to catch "undesirable" behavior, you'll miss a lot. There is more nuance than you allude to; from the paper itself:

> These results suggest that CoT monitoring is a promising way of noticing undesired behaviors during training and evaluations, but that it is not sufficient to rule them out.


Very few humans are as good as these models at arithmetic. And CoT is not "mostly fake"; that's not a correct interpretation of that research. It can be deceptive, but so can human justifications of actions.


Humans can learn the symbolic rules and then apply them correctly to any problem, bounded only by time, and modulo lapses of concentration. LLMs fundamentally do not work this way, which is a major shortcoming.

They can convincingly mimic human thought, but the illusion falls apart on closer inspection.


What? Do you mean like this??? https://www.reddit.com/r/OpenAI/comments/1mkrrbx/chatgpt_5_h...

Calculators have been better than humans at arithmetic for well over half a century. Calculators can reason?


It's an absurd take to actually believe they can reason. The cutting-edge "reasoning model," by the way:

https://bsky.app/profile/kjhealy.co/post/3lvtxbtexg226


Humans are, statistically speaking, static: we just find out more about them, but the humans themselves don't meaningfully change unless you look at much longer time scales. The state of the rest of the world is in constant flux and much harder to model.


I’m not sure I agree with this - it took humans about a month to go from "wow, this AI-generated art is amazing" to "zzzz, it's just AI art".


To be fair, it was more of a "wow, look what the computer did". The AI "art" was always bad. At first it was bad because it was visually incongruous. Then they improved the finger-counting kernel, and now it's bad because it's a shallow cultural average.

AI-produced visual art has only flooded the internet with "slop", the now commonly accepted term: something that meets the bare criteria but falls short of being actually enjoyable or worth anyone's time.


It sucks for art almost by definition, because art exists for its own reason and is in some way novel.

However, even artists need supporting materials and tooling that meet bare criteria. Some care what kind of wood their brush is made from, but I'd guess most do not.

I suspect it'll prove useless at the heart of almost every art form, but powerful at the periphery.


That's culture, not genetics.



