sigbottle's comments | Hacker News

GPT models definitely seem stronger when they "get it", and on the types of problems they "get"; Claude seems more holistic but not "as smart" as GPT at its spikes.

I'm a Feyerabend sympathizer, but even he wouldn't have gone this far.

He was against establishment dogma, not in favor of anti-intellectualism.


Well, you can technically scurry around this by saying, "Okay, there's a class of situations, and we just need to figure out the cases, because yes, we acknowledge that morality is tricky." Of course, take this to the limit and it starts to sound like pragmatism - the position "we're building a more and more accurate absolute model, we just need to get there" and the position "revising is always okay, we just need to get to a better one" blur together more and more.

IMO, the 20th century has proven that demarcation is very, very, very hard. You can take either interpretation - that we just need to "get to the right model at the end", or that "there is no right end, all we can do is try to do 'better', whatever that means".

And to be clear, I genuinely don't know what's right. Carnap had a very intricate philosophy that sometimes seemed like a sort of relativism, but it was more of a linguistic pluralism - I think it's clear he still believed in firm demarcations, essences, and capital T Truth, even if they moved over time. On the complete other side, you have someone like Feyerabend, who believed that we should be cunning and willing to adopt whatever models help us. Neither of these guys is an idiot, and they're explicitly not saying the same thing (a related paper can be found here: https://philarchive.org/archive/TSORTC), but honestly, they do sort of converge at a high level.

The main difference in interpretation is "we're getting to a complicated, complicated truth, but there is a capital T Truth" versus "we can clearly compare, contrast, and judge different alternatives, but to prioritize one as capital T Truth is a mistake; there isn't even a capital T Truth".

(Technically they're arguing along different axes, but I think 20th century philosophy of science and logical positivism are closely related.)

(disclaimer: am a layman in philosophy, so please correct me if I'm wrong)

I think it's very easy to look at relativism vs. absolute truth and come away with strawman arguments about both sides.

And to be clear, it's not even like drawing more and more intricate distinctions is good, either! Sometimes the best arguments from both sides are an appeal back to "simple" arguments.

I don't know. Philosophy is really interesting. Funnily enough, I only started reading about it more because I joined a lab full of physicists, mathematicians, and computer scientists. No one discusses "philosophy proper", as in following the historical philosophical tradition (no one here has read Kant), but a lot of the topics we talk about are very philosophy-adjacent, well beyond very simple arguments.


Does it?

For me, I've had that mentality for the longest time and I didn't get anything done because, well, "I'm just average".

For me, a little bit of arrogance ("there's no way I can't do X, let's go do it"), even if I end up "looking stupid" ("see, I told you it was that hard!"), was far more valuable to my development.


For me, I've realized I often cannot possibly learn something if I can't compare it to something prior first.

In this case, as another user mentioned, decoupling is a great use case. Instead of two processes/APIs talking to each other directly, having an intermediate "buffer" process/API can save you a lot of headache.
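A minimal sketch of that kind of buffering, with a plain in-process queue standing in for a real broker (the producer/consumer names and payloads are made up purely for illustration):

    import queue
    import threading
    import time

    # The "buffer" in the middle: producer and consumer only know about the
    # queue, not about each other, so either side can be slow, restarted, or
    # swapped out without the other caring.
    buf = queue.Queue(maxsize=100)

    def producer():
        for i in range(5):
            buf.put({"order_id": i})  # fire and forget
            time.sleep(0.1)

    def consumer():
        while True:
            msg = buf.get()           # blocks until work arrives
            print("processing", msg)
            buf.task_done()

    threading.Thread(target=consumer, daemon=True).start()
    producer()
    buf.join()  # wait until everything queued has been handled

In a real system the queue would be a separate process or service (a message broker), which is exactly what buys you the decoupling.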


To add to this,

The concept of connascence, rather than coupling, is what I find more useful for trade-off analysis.

Synchronous connascence means that you only have a single architectural quantum, in Neal Ford's terminology.

As Ford is less religious and more respectful of real-world trade-offs, I find his writings more useful for real-world problems.

I encourage people to check his books out and see if they're useful. It was always hard to mention connascence, as it has a reputation of being ivory-tower architect jargon, but in a distributed-systems world it's very pragmatic.
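As a toy sketch of my own (not from Ford's books) of what trading a stronger form of connascence for a weaker one looks like, with made-up fetch_v1/fetch_v2 functions:

    # Connascence of position: every caller has to know that the third
    # positional argument means "retries"; reordering parameters silently
    # breaks callers.
    def fetch_v1(url, timeout, retries):
        return (url, timeout, retries)

    fetch_v1("https://example.com", 5, 3)

    # Weaker connascence of name: callers only have to agree on parameter
    # names, which is easier to grep for and safer to evolve.
    def fetch_v2(url, *, timeout=5, retries=3):
        return (url, timeout, retries)

    fetch_v2("https://example.com", retries=3)

The same kind of "which form is weaker, and how far apart are the components that share it" reasoning is what makes it useful for trade-off analysis.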


There are two things to separate here.

One is the practical and societal consequences, playing out iteratively over the next few decades. Fine, this is an important discussion. If this is what you're discussing, I have no objection - automation taking over a significant portion of jobs, including software engineering, is a huge worry.

The other thing is this almost-schadenfreude about intelligence. The argument goes something like: if AGI is a superset of all our intellectual, physical, and mental capabilities, what is the point of humans? Not from an economic perspective, but literally from a "why do humans exist" perspective. It would be "rational" to defer all of your thinking to a hyperintelligent AGI. Obviously.

The latter sentiment I see a decent bit on Hacker News. You see it encoded in psychoanalytic comments like, "Humans have had the special privilege of being intelligent for so long that they can't fathom that something else is more intelligent than them."

For me, the only actionable conclusion I can see from a philosophy like this is to Lie Down and Rot. You are not allowed to use your thinking, because a rational superagent has simply thought about it more objectively and harder than you.

I don't know. That kind of thinking - whether I ran into it intuitively in my teens or later when learning about government and ethics (Rational Utopianism, etc.) - has always ticked me off. Incidentally, I've disliked every single person who unequivocally thought that way.

Of course, if you phrase it like this, you'll get called irrational and quickly get compared to not-so-nice things. I don't care. Compare me all you want to unsavory figures; this kind of psychoanalytic gaslighting is never conducive to "good human living".

I don't care if the rebuttal analogy is "well, you're a toddler throwing a tantrum while the AGI simply moves on". You can't let ideologies like that second one get to you.


It is actionable.

From the outside you may be doing the same thing, but the point is that your approach to life is the issue. Your mindset itself filters experience.

This is the issue with "empirics": there are often very real intangibles that deal with fundamentally subjective things.

From the outside, of course, you can point to all the 'concrete' things, but it's really the intangibles that matter.


Very interested in this! I'm mainly a ChatGPT user; for me, o3 was the first sign of true "intelligence" (not 'sentience' or anything like that, just actual, genuine usefulness). Are these models at that level yet? Or are they at o1 level? Still at GPT-4 level?


Not nearly o3 level. Much better than GPT-4, though! For instance, Qwen 3 30B-A3B 2507 Reasoning gets 46 vs. GPT-4's 21 and o3's 60-something on Artificial Analysis's benchmark aggregation score. Small local models, ~30B params and below, tend to benchmark far better than they actually work, too.


The only actual humans in the loop here are the startup founders and engineers. Pretty cut-and-dried case here.

Unless you want to blame the AI itself, from a legal perspective?


The whole point of the "Markov property" is that the next state depends only on the current state, not on the history before it.

And in classes, the very first trick you learn to skirt around history is to add Boolean variables to your "memory state". Your system now models "did it rain on each of the previous N days?" The issue, obviously, is that this is exponential if you're not careful. Maybe you can get clever by just making your state a "sliding window" of history, then it's linear in the number of days you remember. Maybe mix both. Maybe add even more information. Tradeoffs, tradeoffs.
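A tiny sketch of that state-expansion trick, with a made-up rain example and invented transition probabilities:

    import random

    # State = the last N days of "did it rain?" booleans. The chain is Markov
    # because tomorrow depends only on this tuple -- but the tuple smuggles in
    # N days of history, at the cost of 2**N possible states.
    N = 3

    def step(state):
        # invented rule: the more recent rainy days, the likelier rain is
        p_rain = 0.2 + 0.2 * sum(state)
        rained_today = random.random() < p_rain
        return state[1:] + (rained_today,)  # slide the window forward a day

    state = (False,) * N
    for _ in range(10):
        state = step(state)
        print(state)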

I don't think LLMs embody the Markov property at all, even if you can make everything trivially satisfy the Markov property by just "considering every single possible state" - of which there are (size of token set)^(context length) at minimum, because of the KV cache.


The KV cache doesn't affect it because it's just an optimization. LLMs are stateless and don't take any input other than a fixed block of text. They don't have memory, which is exactly the requirement for a Markov chain.


Have you ever actually worked with a basic Markov problem?

The Markov property states that the next state is drawn from transition probabilities that depend entirely on the previous state.

These states inhabit a state space. The way you encode "memory" if you need it - say you need to remember whether it rained on each of the last 3 days - is by expanding said state space. In that case you'd go from 1 state variable to 3, i.e. 2^3 states if you need the precise binary information for each day. Being "clever", maybe you assume that only the number of days it rained in the past 3 days matters, and you can get away with a 'linear' amount of memory.

Sure, an LLM is a "Markov chain" with a state space of size (# tokens)^(context length), at minimum. That's not a helpful abstraction and defeats the original purpose of the Markov observation. The entire point of the Markov observation is that you can represent a seemingly huge predictive model with just a couple of variables in a discrete state space, and ideally you're the clever programmer/researcher who can significantly collapse said space by being, well, clever.
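To make that concrete, a rough sketch (with a stubbed-out "model" and invented names - the point is the shape of the state, not the sampling):

    import random

    # If the "state" is the entire context window, generation is trivially
    # Markov: the next state depends only on the current one. But the state
    # space is on the order of vocab_size ** context_length, which is why the
    # observation buys you nothing in practice.
    VOCAB = ["the", "cat", "sat", "on", "mat", "."]
    CONTEXT_LEN = 8

    def next_token(context):
        # stand-in for the model: any function of the current context alone;
        # that dependence on nothing but the current state is the Markov part
        random.seed(hash(context))
        return random.choice(VOCAB)

    state = ("the", "cat")
    for _ in range(10):
        state = (state + (next_token(state),))[-CONTEXT_LEN:]
        print(" ".join(state))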

Are you deliberately missing the point or what?


> Sure, an LLM is a "Markov chain" with a state space of size (# tokens)^(context length), at minimum.

Okay, so we're agreed.

