
> but getting corrected after tests are run, whereas my agents typically get stuck in these write-and-test loops

This may be a definition problem then. I don’t think “the agent did a dumb thing that it can’t reason out of” is a hallucination. To me a hallucination is a pretty specific failure mode: it invents something that doesn’t exist. Models still do that for me, but the build-test loop sets them right on that nearly perfectly. So I guess the model is still hallucinating, but the agent isn’t, so the output is unaffected. So I don’t care.

For the “agent is dumb” scenario, I aggressively delete and reprompt. This is something I’ve actually gotten much better at with time and experience, both so it doesn’t happen often and so I can course-correct quickly. I find it works nearly as well for teaching me about the problem domain as my own mistakes do, but it’s much faster to get to.

But if I were going to be pithy: aggressively deleting work output from an agent is part of their value proposition. They don’t get offended and they don’t need explanations why. Of course they don’t learn well either; that’s on you.



What I'm saying is that the model will get into one of these loops where it needs to be killed, and when I look at some of the intermediate states and the reasons for failure, they're because it hallucinated something, ran the tests, and got an error. Does that make sense?

Deleting and re-prompting is fine. I do that too. But even one cycle of that often means the whole prompting exercise takes me longer than if I just wrote the code myself.


I think maybe this is another disconnect. A lot of the advantage I get does not come from the agent doing things faster than me, though for most tasks it certainly can.

A lot of the advantage is that it can make forward progress when I can’t. I can check to see if an agent is stuck, and sometimes reprompt it, in the downtime between meetings or after lunch before I start whatever deep thinking session I need to do. That’s pure time recovered for me. I wouldn’t have finished _any_ work with that time previously.

I don’t need to optimize my time around babysitting the agent. I can do that in the margins. Watching the agents is low-context work. That adds the capability to generate working solutions during time that previously produced nothing.


I've done a few of these hands-off, go-to-a-meeting style interactions. It has worked a few times, but I tend to find that they overdo it or cause issues. Like, you ask them to fix an error and they add a try/catch, swallow the error, and call it a day. Or the PR is a 1,000-line change when it should have been two lines.

Either way, I'm happy that you are getting so much out of the tools. Perhaps I need to prompt harder, or the codebase I work on has just deviated too much from the stuff the LLMs like and simply isn't a good candidate. Either way, I appreciate talking to you!



