Hacker News

> We are cheering for a product sold back to us at a 60% markup (input costs up to $2.00/M) that was built on our own private correspondence.

That feels like something between a hallucination and an intentional fallacy that popped up because you specifically said "intense discussion". The increase is 60% on input tokens from the old model, but it's not a markup, and especially not "sold back to us at X markup".

I've seen more and more of these kinds of hallucinations. As these models seem to be RL'd to not be sycophants, they're slowly inching in the opposite direction, where they tell small fibs or embellish in a way that seems meant to add more weight to their answers.

I wonder if it's a form of reward hacking: trading being maximally accurate for sounding confident might result in better rewards than being accurate and precise.



60% probably felt like a lot to Gemini. Still, I liked the doomerism and the point about Google using our data to train its models.

Nonetheless, Gemini 3 failed this test. It failed to start a discussion. Its points were shallow and too AI-esque.


I'm not debating whether 60% is a lot; it's a factually incorrect statement: markup refers to an increase over cost, not over the previous price.

Looking at it again, it's actually a completely nonsensical sentence that just happens to resemble a sensible statement in a way that would fool most people.
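For concreteness, here's the distinction in a few lines of Python. The $1.25/M old input price is inferred from the quoted 60% increase and the $2.00/M new price; the $0.50/M inference cost is purely hypothetical, since the real cost is exactly the unknown that makes "markup" the wrong word:

```python
old_price = 1.25  # $/M input tokens (inferred previous price)
new_price = 2.00  # $/M input tokens (from the quoted text)

# Price increase: change relative to the *previous price*.
price_increase = (new_price - old_price) / old_price
print(f"price increase: {price_increase:.0%}")  # 60%

# Markup: price relative to the seller's *cost*, which is unknown here.
# With a made-up inference cost, the markup comes out very different:
assumed_cost = 0.50  # $/M, purely illustrative
markup = (new_price - assumed_cost) / assumed_cost
print(f"markup over assumed cost: {markup:.0%}")  # 300%
```

So a 60% price increase is well-defined from public pricing, while the markup can't be computed at all without knowing Google's costs.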

RL is definitely bursting at the seams at this point.



