
The biggest issue for me has always been inherent US bias. The most obvious one was having to end every question with "answer in metric" - even after adding that to the system instructions it wouldn't be reliable and I'd have to redo questions, especially recipe-related ones. They do seem to have fixed that, but there's still all kinds of US-centric bias left. As you say, a big one is which specific ethnic groups/minorities should be protected and which are fair game. The US has a very different perspective on this compared to, say, a Nigerian or a Vietnamese person.
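To be concrete, "adding it to the system instructions" just means something like the sketch below. I'm using the OpenAI Python client purely as an illustration; the model name is a stand-in and any chat API with a system role works the same way:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in model name
        messages=[
            # The system message should constrain every answer to metric...
            {"role": "system",
             "content": "Always answer in metric units (g, ml, cm, °C)."},
            # ...but in my experience recipe questions would still come
            # back in cups and °F anyway.
            {"role": "user",
             "content": "What temperature should I roast a 2 kg chicken at?"},
        ],
    )
    print(response.choices[0].message.content)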

Presumably companies will still provide manuals.

It'll be a single sheet of paper with a QR code that redirects to a canned prompt hosted at whichever LLM server paid the most to the manufacturer for their content.

If that were adequate, wouldn't there be no need for supplementary material?

Results vary of course. I have some very wonderful synthesizer manuals.


All they had to do was write a clear and simple message saying that one of their staff was responsible, has been fired, and they'll take steps to avoid this in future.

Their actions so far just make me think they're panicking and found a scapegoat to blame it on, but they're not going to put any new checks in place so it'll just happen again.


It was against their policy to use AI in producing any part of the final article, and the writer was aware of that.

I feel bad for the guy, but I can't imagine much better safeguards than editors paying closer attention to sourcing, and hiring more reliable people.


> It was against their policy to use AI in producing any part of the final article, and the writer was aware of that.

More than that, as a reporter on AI he should have been fully aware that AI frequently bullshits and lies. He should have known it was not reliable and that its output needs to be carefully verified by a human if you care at all about the accuracy or quality of what it gives you. His excuse that this was done in a fever-induced state of madness feels weak when it was his whole job to know that AI was not an appropriate tool for the task.


>his whole job

Possibly akin to a roofer taking a shortcut up there, then taking a spill? You knew better but unfortunately let the fact that you could probably get away with it with zero impact decide for you.

IIRC the hallucinations were essentially kicked off by user error. Or, to put it more carefully: a journalist using the best available technologies should have been able to reduce the chance of an issue this big to near zero, even with language models in the loop and without human review.

(e.g. imagine Karpathy’s llm-council with extra harnessing/scripting, so even MORE expensive, but still. Or some RegEx!)
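To make the regex idea concrete, here's a hypothetical pre-publication check - every name, path, and threshold below is made up for illustration - that flags quoted passages in a draft that don't appear verbatim in any source transcript:

    import re
    from pathlib import Path

    def find_unverified_quotes(draft: str, transcript_dir: str) -> list[str]:
        """Return quoted passages from the draft found in no transcript."""
        # Pull out everything between double quotes (crude, but that's the point).
        quotes = [q.strip() for q in re.findall(r'"([^"]{20,})"', draft)]
        sources = " ".join(
            p.read_text() for p in Path(transcript_dir).glob("*.txt")
        )
        # Normalize whitespace so line wrapping doesn't cause false alarms.
        normalize = lambda s: re.sub(r"\s+", " ", s).lower()
        sources_norm = normalize(sources)
        return [q for q in quotes if normalize(q) not in sources_norm]

    # Anything returned here needs a human to chase down before publishing.
    for q in find_unverified_quotes(Path("draft.txt").read_text(), "transcripts/"):
        print("UNVERIFIED:", q)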


Alternatively… there was no AI error, the reporter made up the quotes, and lied when they were challenged.

The chance that the very first time AI was used it screwed up and was caught is pretty low.

It’s likely been used before but nobody got caught.


Are you familiar with the reporter's work & reputation?

Are we talking about the same guy who purportedly never even read an article with their name on it, one containing multiple insulting false quotes, and was summarily dismissed?

Mind you, I'm not alleging malice, but they chose to blame AI rather than brain fog, bad notes, accidental mistranscription, or any other human error that would make us look down on them more.

Right now, AI errors like this are excusable. Soon they won’t be.


You have to give them time to do the job properly as well. Companies will often pay lip service to standards, then squeeze their staff so much that those standards are impossible to attain.

Yes, those are exactly the kind of steps they would need to publicly commit to in order to retain trust. And yet, instead we get silence, no acceptance that some measure of responsibility falls on the editorial team here. So it's clear they just hope it'll blow over without them having to do anything, which is the opposite of what a trustworthy site would do.

The latest Xiaomi has this, they call it continuous optical zoom. It's the first time I've seen it on a phone camera.

> median delay

Does that mean that half of responses have a negative delay? As in, humans interrupt each others sentences precisely half of the time?


Yes, about 1/2 of human speech is interrupting others.

I assume 0 delay is the minimum here, and a median of 0 means over half of the data are exactly 0.

No, about 1/2 of human speech is interrupting others.

Oh, interesting. I assumed the data came from interruptions (that seemed obvious), but I'm surprised you had specific negative measurements. How do you decide the magnitude of the number? Just counting how long both parties are talking at once?

To be clear, it wasn't my research, I got it from studying some linguistics papers. But it was pretty straightforward. If I am talking, and then you interrupt, and 300ms later I stop talking, then the delay is -300ms.

Same the other way. If I stop talking and then 300ms later you start talking, then the delay is 300ms.

And if you start talking right when I stop, the delay is 0ms.

You can get the info by just listening to recorded conversations of two people and tagging them.
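If you wanted to compute it from tagged data, it's nothing fancy - a sketch, assuming you've already tagged (speaker, start, end) times in seconds for each turn (the tagging is the manual part):

    def turn_delays(turns):
        """turns: list of (speaker, start, end) tuples, sorted by start time.
        Delay at each speaker change: positive = gap after the previous
        speaker stopped, negative = interruption (overlap), 0 = exact handoff.
        """
        delays = []
        for prev, cur in zip(turns, turns[1:]):
            if prev[0] != cur[0]:  # only measure at speaker changes
                delays.append(cur[1] - prev[2])  # next start minus previous end
        return delays

    # Example: B interrupts A 300ms before A stops, then A replies 300ms
    # after B finishes.
    turns = [("A", 0.0, 2.0), ("B", 1.7, 3.5), ("A", 3.8, 5.0)]
    print(turn_delays(turns))  # [-0.3, 0.3] (up to float rounding)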


I assume there was a lot of variance? As in, some people interrupt others constantly and some do it rarely. Also probably a lot of adjustment to the situation - the relative status of the people, say, or whether someone is talking to a young child or a non-native speaker.

All that to say, I'd imagine people are adaptable enough to easily handle 100ms+ delay when they know they're talking to an AI.


I think that's slightly the wrong way to look at this multi-agent stuff.

I have between 3 and 6 hours per day where I can sit in front of a laptop and work directly with the code. The more of the actual technical planning/coding/testing/bug fixing loop I can get done in that time the better. If I can send out multiple agents to implement a plan I wrote yesterday, while another agent fixes lint and type errors, and a third (or fourth, or fifth) works with me on brainstorming or new plans, that's great! I'll go out for a walk in the park and think deeply during the rest of the day.

When people hear about all of these agents - working on three plans at once, really? - it sounds overwhelming. But realistically there's a lot of downtime on both sides. I ask the agent a question, it spends 5-10 minutes exploring. During that time I check on another agent, read some code that has been generated, or do some research of my own. Then I'll switch back to that terminal when I'm ready and ask a follow-up question, mark the plan as ready, or whatever.

The worst thing I did when I was first getting excited about how agents were good now, a whole two months ago, was set things up so I could run a terminal on my phone and start sessions there. That really did destroy my deep thinking time, and lasted for about 3 days before I deleted termux.


A lot of phones, including the latest flagships, don't support this, unfortunately.

I don't think it's hard to export. On the contrary, it's all already saved in your ~/.claude directory, so you could write a tool to convert the data there to markdown.
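A rough sketch of what that tool could look like, assuming the sessions are stored as JSONL files with role/content fields - the actual layout and keys under ~/.claude may differ between versions, so treat the paths and field names here as assumptions:

    import json
    from pathlib import Path

    def session_to_markdown(jsonl_path: Path) -> str:
        """Convert one JSONL session log into a simple markdown transcript."""
        lines = [f"# Session {jsonl_path.stem}", ""]
        for raw in jsonl_path.read_text().splitlines():
            try:
                entry = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than dying
            msg = entry.get("message", {})
            role, content = msg.get("role"), msg.get("content")
            # content can also be a structured list; this sketch only
            # handles plain strings.
            if role and isinstance(content, str):
                lines += [f"## {role}", "", content, ""]
        return "\n".join(lines)

    for path in (Path.home() / ".claude" / "projects").rglob("*.jsonl"):
        Path(path.stem + ".md").write_text(session_to_markdown(path))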

> the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't really add that much value.

This is the major blocker for me. However, there might be value in saving a summary - basically the same as what you would get from taking meeting notes and then summarizing the important points.


I'm generally very sensitive to input latency and there's no way Ghostty has 41ms. I've only been using it for a couple of months though, so I guess it's fixed now.

Edit: just saw your second link from 4 months ago, and yes, it's now avg 13ms, which feels about right to me. Not perfect but acceptable. So what's even the point of sharing the old benchmark?

