When I was in grad school my faculty advisor joked to me that to accurately estimate any medium to large software project, take your best estimate and multiply it by 3. If hardware is involved, multiply by 8.
Yes, he was telling me this tongue in cheek, but in my actual experience this has been eerily accurate.
When I first tried an LLM agent, I was hoping for an interactive, 2-way, pair collaboration. Instead, what I got was a pairing partner who wanted to do everything themselves. I couldn't even tweak the code they had written, because it would mess up their context.
I want a pairing partner where I can write a little, they write a little, I write a little, they write a little. You know, an actual collaboration.
Have you tried recently? This hasn't been my experience. I modify the code it's written, then ask it to reread the file. It generally responds "I see you changed file and [something.]" Or when it makes a change, I tell it I need to run some tests. I provide feedback, explain the problem, and it iterates. This is with Zed and Claude Sonnet.
I do notice though that if I edit what it wrote before accepting it, and then it sees it (either because I didn’t wait for it to finish or because I send it another message), it will overwrite my changes with what it had before my changes every single time, without fail.
My approach has generally been to accept, refactor and reprompt if I need to tweak things.
Of course this does artificially inflate the "accept rate" which the AI companies use to claim that it's writing good code, rather than being a "sigh, I'll fix this myself" moment.
I do this too and it drives me nuts. It's very obvious to me (and perhaps anyone without an incentive to maximize the accept rate) that the diff view really struggles. If you leave a large diff, Copilot and Cursor will both get confused and start duplicating chunks, or they'll fail to see the new (or the old) code; but if you accept it, it always works.
Aider solves this by turn-taking. Each modification is a commit. If you hate it, you can undo it (type /undo and it does the git reset --hard for you). If you can live with the code but want to start tweaking it, do so, then /commit (it writes the commit message for you by reading the diffs you made). Working in turns, by commits, Aider can see what you changed and keep up with you. I usually squash the commits at the end, because the wandering way of correcting the AI is not really useful history.
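For anyone unfamiliar with the tool, a rough sketch of the plain-git operations those Aider commands map onto (driven from Python here for illustration; the repo path, file names, and commit messages are all made up, and this is an approximation of the workflow, not Aider's actual implementation):

```python
import os
import subprocess
import tempfile

def git(*args):
    # Run git with throwaway identity settings so commits work anywhere.
    subprocess.run(
        ["git", "-c", "user.email=demo@example.com", "-c", "user.name=demo", *args],
        check=True, capture_output=True,
    )

repo = tempfile.mkdtemp()
os.chdir(repo)
git("init", "-q")
git("commit", "-q", "--allow-empty", "-m", "baseline")

# The AI edits a file; Aider commits the change automatically.
with open("app.py", "w") as f:
    f.write("ai version\n")
git("add", "app.py")
git("commit", "-q", "-m", "aider: ai change")

# /undo: throw the last commit away (Aider runs git reset --hard for you).
git("reset", "-q", "--hard", "HEAD~1")

# Or keep it, tweak by hand, then /commit your manual edits,
# so they become part of the history Aider reads on the next turn.
with open("app.py", "w") as f:
    f.write("my tweak\n")
git("add", "app.py")
git("commit", "-q", "-m", "manual tweak")
```

Because every turn is a commit, undoing, tweaking, and the final squash are all ordinary git operations rather than fights with a diff view.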
I've been thinking about this a lot recently - having AI automate product manager user research. My thread of thought goes something like this:
0. AI can scour the web for user comments/complaints about our product and automatically synthesize those into insights.
1. AI research can be integrated directly into our product, allowing the user to complain to it just-in-time, whereby the AI would ask for clarification, analyze the user needs, and autonomously create/update an idea ticket on behalf of the user.
2. An AI integrated into the product could actually change the product UI/UX on its own in some cases, perform ad-hoc user research, asking the user "would it be better if things were like this?" and also measuring objective usability metrics (e.g. task completion time), and then use that validated insight to automatically spawn a PR for an A/B experiment.
3. Wait a minute - if the AI can change the interface on its own - do we even need to have a single interface for everyone? Perhaps future software would only expose an API and a collection of customizable UI widgets (perhaps coupled with official example interfaces), which each user's "user agent AI" would then continuously adapt to that user's needs?
> 3. Wait a minute - if the AI can change the interface on its own - do we even need to have a single interface for everyone? Perhaps future software would only expose an API and a collection of customizable UI widgets (perhaps coupled with official example interfaces), which each user's "user agent AI" would then continuously adapt to that user's needs?
Nice, in theory. In practice it will be "Use our Premium Agent at 24.99$/month to get all the best features, or use the Basic Agent at 9.99$ that will be less effective, less customizable and inject ads".
Well, at the end of the day, capitalism is about competition, and I would hope for a future where that "User Agent AI" is a local model fully controlled by the user, and the competition is about which APIs you access through them - so maybe "24.99$/month to get all the best features", but (unless you relinquish control to MS or Google), users wouldn't be shown any ads unless they choose to receive them.
We're seeing something similar in VS Code and its zoo of forks - we're choosing which API/subscriptions to access (e.g. GitLens Pro, or Copilot, or Cursor/Windsurf/Trae etc.), but because the client itself is open source, there aren't any ads.
I try to be super careful: I type the prompt I want to execute into a text file, ask the agent to validate and improve it, and ask it to add an implementation plan. I even let another agent review the final plan.
But even then, it occasionally starts implementing halfway through a refinement.
Same. I use /ask in Aider so I can read what it's planning, ask follow-up questions, get it to change things, then after a few iterations I can type "Make it so" while sitting back to sip on my Earl Grey.
I had done something slightly different. I would ask the LLM to prepare a design doc, not code, and iterate on that doc before asking it to start coding. That seems to have worked a little better, as it's less likely to go rogue.
In all honesty - have you tried doing what you would do with a paired programmer - that is, talk to them about it? Communicate? I’ve never had trouble getting cursor or copilot to chat with me about solutions first before making changes, and usually they’ll notice if I make my own changes and say “oh, I see you already added XYZ, I’ll go ahead and move on to the next part.”
I do this all the time with Claude Code. I’ll accept its changes, make adjustments, then tell it what I did and point to the files or tell it to look at the diff.
Pair programming requires communicating both ways. A human would also lose context if you silently changed their stuff.
Hmm you can tweak fine these days without messing up context. But, I run in “ask mode” only, with opus in claude code and o3 max in cursor. I specifically avoid agent mode because, like in the post, I feel like I gain less over time.
I infrequently tab complete. I type out 80-90% of what is suggested, with some modifications. It does help that I can maintain 170 wpm indefinitely on the low-medium end.
Keeping up with the output isn’t much of an issue at the moment, given the limited typing speed of opus and o3 max. Having gained more familiarity with the workflow, the reading feels easier. It felt too fast at first, for sure.
My hot take is that if GitHub copilot is your window into llms, you’re getting the motel experience.
> My hot take is that if GitHub copilot is your window into llms, you’re getting the motel experience.
I’ve long suspected this; I lean heavily on tab completion from copilot to speed up my coding. Unsurprisingly, it fails to read my mind a large portion of the time.
Thing is, mind reading tab completion is what I actually want in my tooling. It is easier for me to communicate via code rather than prose, and I find the experience of pausing and using natural language to be jarring and distracting.
Writing the code feels like a much more direct form of communicating my intent (in this case to the compiler/interpreter). Maybe I’m just weird; and to be honest I’m afraid to give up my “code first” communication style for programming.
Edit: I think the reason why I find the conversational approach so difficult is that I tend to think as I code. I have fairly strong ADHD and coding gives me appropriate amount of stimulation to do design work.
You are right, but it doesn't take many extroverts not understanding this concept to make it feel like it's everybody :)
My mother, for example, is a serious extrovert. When I explained to her that socializing seriously drains me and I need to, for example, spend time alone after attending a party, her response was to ask if I'd seen a therapist about it.
my own experience as someone who used to be very extroverted:
extroversion was meeting a social expectation. i had good social skills and people relied on me to carry social situations. i could entertain, organize and predict needs. i earned that expectation to feed my ego and then became trapped in a vicious cycle.
then i had a fresh start after moving to a new city for grad school and have done my best to avoid any vocal leadership for anything because i know what can happen. organizational, behind the scenes leadership is ok. i wonder how many extroverts would rather be introverts given the opportunity and some introspection
> Exactly how were you doing so? Were you able to predict these needs with "tells" or some other reference point? Did you get assessments wrong?
tells is a good way to put it. had a close friend from my hometown who lived for manipulating people, and hanging out with him for 4 years taught me a lot. if he pulled some slick move or long setup on someone (including me) he'd discuss the chain of tells and decisions if i asked him. boiled down mostly to confidence, conditioning, in group/out group. ugly stuff. the hook was his ability to manufacture novel, cheap thrills. this was enough to keep everyone interested in sticking around. he liked having cronies and i could do a b- version of him.
ive made many wrong assessments. i ignored the mistakes and focused on successes to keep feeding my ego. to abuse an analogy, i'd burn a bridge without thinking of it because i was already making a new friend to fill that spot.
> What caused you to think it wasn't worth it anymore?
after leaving that environment i noticed how relaxing it was to hang out with my own thoughts. i realized how i was just playing a part i had cast myself in for attention and no other real benefit. i happened to take an Excel VBA class my senior year and became obsessed with programming. became more interested in learning to code than anything else. i noticed the benefits of avoiding attention. introverts probably learn these lessons early but i learned them late.
Not for me. Coffee, anything caffeinated actually, makes me sick like I have the flu. The older I get, the more sensitive I become. I can't even eat chocolate anymore because of the caffeine content.
You don't. Explaining what happened is not going to help the user at all. You can only explain what options are available to the user. Something like the following:
"Sorry, this message can no longer be saved. Copy this message before discarding it if you will need access to it later."
Sorry, but hard disagree on this one. This increasingly popular assumption that users are clueless cavemen is very condescending. Help the user self-diagnose instead.
WHY can’t it be saved?
Is the internet not connected? Check your WiFi.
Is the server full? Talk to admin.
Is the message deleted by another user? Talk to your team about ways of working?
Is it an internal application error? Tough luck, maybe the error code can be googled at least.
This does not mean you need to dump a stack trace in the user's face; the examples above can still be presented briefly. If that's too much effort to implement, consider an expandable details section.
The amount of applications lately where I had to open the verbose developer logs only to find silly user fixable errors is astonishing. Last one was simply credentials that had expired.
To anyone who thinks you are giving users a magical experience, free from technicalities, by hiding root causes behind a facade of abstraction: please think again. You are just frustrating them even more by making errors unpredictable.
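A minimal sketch of that idea: map each known failure cause to a brief summary, a concrete action, and expandable technical details. The exception types, message wording, and `explain_save_failure` function here are all illustrative, not from any real application:

```python
from dataclasses import dataclass

@dataclass
class UserError:
    summary: str   # short explanation, shown up front
    action: str    # what the user can actually try
    details: str   # technical info, behind an expandable section

def explain_save_failure(cause: Exception) -> UserError:
    # Known, user-fixable causes get a specific summary and action.
    if isinstance(cause, ConnectionError):
        return UserError(
            summary="The message can't be saved: no network connection.",
            action="Check your Wi-Fi or network cable, then retry.",
            details=repr(cause),
        )
    if isinstance(cause, PermissionError):
        return UserError(
            summary="The message can't be saved: the server refused access.",
            action="Your credentials may have expired. Sign in again or contact your admin.",
            details=repr(cause),
        )
    # Unknown internal errors still get something searchable.
    return UserError(
        summary="The message can't be saved due to an internal error.",
        action="Copy the message before discarding it, and report the details below.",
        details=repr(cause),
    )
```

The expired-credentials case mentioned above would land in the `PermissionError` branch instead of forcing the user into developer logs.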
> This increasingly popular assumption that users are clueless cavemen is very condescending.
Sorry, but no. And this idea that everyone else is condescending is offensive nonsense.
I understand where you're coming from, but the people who understand how actual users behave and what actual users want, in the real world, are not condescending - they're empathic. They recognize that the majority of people in the world are not like us, in terms of technical abilities, yes, but more importantly, in terms of desires.
The average user doesn't care. They don't have time to do things like "talk to admin", nor do they even know what "check your wifi" means. "Talk to your team about ways of working"? I'm sure most employees would just love to go and have awkward conversations with others, that's exactly what they want to do this minute.
This is even on the off-chance that the user has even read the message, which is incredibly unlikely. I can't count the number of times I've had developers tell me they had an error message and don't know what to do, and my solution was "let's read it, it says this is the error" and that being revelatory for them.
> This is even on the off-chance that the user has even read the message, which is incredibly unlikely. I can't count the number of times I've had developers tell me they had an error message and don't know what to do, and my solution was "let's read it, it says this is the error" and that being revelatory for them.
I agree that the problem with technical and non-technical users alike is one of motivation. Like you say, someone who is not technically minded won't care that the server is out of space or their wifi is disconnected, and developers who just want to get their code to compile don't care about learning how some new framework works.
But in all these cases, the users do care about doing whatever they were trying to do when they got the error. And the best way to help them is to give them all the relevant information they or someone else needs to fix it. Giving some generic error like "sorry your file could not be saved" neither helps those who are motivated to fix it, nor those who aren't motivated.
This reminds me of the time at my work when the CTO gathered all the devs together and told us "We have to innovate more". With no real further instructions. He even showed us a graph (without any numbers) that had Profit on the Y axis and Innovation on the X axis, with a line going up to the right.
When asked when we were supposed to work on this innovation, he told us it was important to still work on our current projects, but do it more innovatively. When pressed about what this actually meant, he just showed us the graph again.
> but of course plenty of orgs figure out how to collaborate effectively while remote.
Some do, but most don't. Too many companies seem to think becoming remote means just installing Zoom on everybody's computers and sending them home. In reality, there is a lot more to it than that.
No remote technology comes close to having collaborators together in a room with a whiteboard, with a well-defined agenda. But while remote collaboration is less efficient, it doesn't mean it can't work. You just have to recognize that things are going to move slower and consensus of opinion will take more time.
Where’s the work on converging the two so (ideally) it doesn’t matter if you’re in the office or not?
I spend half my work time in a chem lab and half not in the lab. I would benefit if the two environments were not so disjoint.
> No remote technology comes close to having collaborators together in a room with a whiteboard, with a well-defined agenda.
Sad to say this is very true (with a minuscule number of lucky exceptions IME). There is a lot of effort going into trying to narrow the gap, but no breakthroughs yet that I have seen.
But there seems to be no work on the opposite.
For example these days it’s pretty routine to have automatic transcripts and recordings of zoom calls posted in the appropriate slack channels so you can catch up on parts (or all) you missed, refer to a discussion that might not have felt significant at the time, and so on. That stuff doesn’t exist in meatspace meetings.
The random asynchronous slack remark in the middle of the night can be transformative (most of course are useless). The same is true running into someone at the coffee machine.
We’re a startup but every morning we have a deliberately agendaless, unstructured call (late enough that kids are at school and you’ve had time to catch up on things if you want). Sometimes it’s 20 minutes and sometimes three hours. We ended up doing a major technology pivot as a result of this. But this doesn’t scale.
> There is almost no atmosphere for friction so basic Newtonian mechanics should be able to make a decent landing
Actually, I think that is one of the biggest challenges. No atmosphere to slow you down means you have to rely entirely on rockets to slow down from orbital speeds to zero.