
> My personal productivity has skyrocketed in the last 12 months.

Has your productivity objectively, measurably improved or does it just feel like it has improved? Recall the METR study which caught programmers self-reporting they were 20% faster with AI when they were actually 20% slower.



Objectively. I’m now tackling tasks I wouldn’t have even considered two or three years ago, but the biggest breakthrough has been overcoming procrastination. When AI handles over 50% of the work, there’s a 90% chance I’ll finish the entire task faster than it would normally take me just to get started on something new.


This. I had a long-standing dispute that I just never had the energy to research how to resolve. I described it to ChatGPT and it generated everything -- including the emails I needed to send and who to send them to. Two weeks later it was taken care of. I had sat on it for literally three months until then.

If I could have something that said, "Here are some things that it looks like you're procrastinating on -- do you want me to get started on them for you?" -- that would probably be crazy useful.


I have ADHD and it almost acts as a body double for me, which I find to be incredibly helpful to get things done.


GPT-4 got me seriously considering making a product for school-age kids w/ ADHD. It’d be a physical device (like a Star Trek communicator) that listens during your day and keeps track of a) things you say you’ll do or b) tasks that other people ask you to do. Then it compiles those tasks and acts as basically a secretary. It can also plug into your email, texts & school assignments.

The privacy implications are horrifying. But if done right, you’re talking about a kind of digital ‘executive function’ that could help a lot of kids who struggle with things like prioritization and time blindness.


Marshall McLuhan said something to the effect that every new communication technology results in a sort of self-amputation of that same faculty in the individual person.

I was diagnosed with ADHD, and my interpretation of that diagnosis was not "I need something to take over this functionality for me," but "I need to develop this functionality so that I can function as a better version of myself, or to fight against a system which is not oriented towards human dignity but some other end."

I guess I am reluctant to replace the unique faculties of individual children with a generic faculty approved by and concordant with the requirements of the larger society. How dismal to replace the unique aspects of children's minds with a cookie cutter prosthetic meant to integrate nicely into our bullshit hell world. Very dismal.


Sure, the implications are horrifying, but tech companies have proven themselves quite trustworthy over the past few decades, so I'm sure it'd be fine.

As someone with ADHD, I say: Please don't build this.


Look, the Torment Nexus has great potential, okay? The investors love it!


It could be built to run entirely on local models.

Open source transcription models are already good enough to do this, and with good context engineering, the base models might be good enough, too.

It wouldn't be trivial to implement, but I think it's possible already.
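
A rough sketch of what the fully local pipeline could look like, assuming faster-whisper for transcription and a local Ollama-served model for pulling out commitments (the model names, prompt, and file paths are illustrative, not a finished design):

  # Runs entirely on-device: transcribe audio, then ask a local model to pull out tasks.
  import requests
  from faster_whisper import WhisperModel

  def transcribe(path: str) -> str:
      model = WhisperModel("small", device="cpu", compute_type="int8")
      segments, _info = model.transcribe(path)
      return " ".join(seg.text for seg in segments)

  def extract_tasks(transcript: str) -> str:
      prompt = (
          "From this transcript, list every commitment the speaker made "
          "or task they were asked to do, one per line:\n\n" + transcript
      )
      resp = requests.post(
          "http://localhost:11434/api/generate",  # default local Ollama endpoint
          json={"model": "llama3.1", "prompt": prompt, "stream": False},
          timeout=120,
      )
      return resp.json()["response"]

  if __name__ == "__main__":
      print(extract_tasks(transcribe("day_recording.wav")))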


It’s not just for people with ADHD. Someone will build this very soon and people will use it a lot. Hopefully Apple builds it because I guess I trust them a little more.


We're not far away from a smaller LLM that could run locally and do this, which would make it more privacy-friendly. Plugging into my email seems like a great way to begin or complete a lethal trifecta, though, and I don't have a good solution there.


Every iPhone, iPad, and Mac that either ships with or is upgraded to iOS 26, iPadOS 26 and macOS 26 has a 3-billion parameter LLM that’s available to developers and operates on-device. Mail, Notes, Reminders are already integrated.[1]

[1]: https://developer.apple.com/videos/play/wwdc2025/286


I might be out of the loop, but if anyone else is confused about the version number:

> If you were expecting iOS 19 after iOS 18, you might be a little surprised to see Apple jump to iOS 26, but the new number reflects the 2025-2026 release season for the software update.

https://www.macrumors.com/roundup/ios-26/


This is what I wanted to build the day ChatGPT came out. Except being unable to guarantee the output due to hallucinations drove me into figuring out evals, and then the dream died due to complexity.


Same. I created a todo list with a simple MCP and it's been game-changing. Just being able to talk with and discuss my todo list somehow keeps me coming back to it, rather than it becoming a sterile, abandoned list of random things after three weeks.
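
A minimal todo MCP server can be surprisingly small. Something in this shape works (a sketch assuming the official Python MCP SDK's FastMCP helper; the tool names and JSON-file storage are just illustrative):

  # todo_mcp.py - a tiny MCP server a chat client (e.g. Claude Desktop) can talk to.
  import json
  from pathlib import Path
  from mcp.server.fastmcp import FastMCP

  DB = Path("todos.json")
  mcp = FastMCP("todo")

  def _load() -> list[dict]:
      return json.loads(DB.read_text()) if DB.exists() else []

  def _save(items: list[dict]) -> None:
      DB.write_text(json.dumps(items, indent=2))

  @mcp.tool()
  def add_task(title: str, note: str = "") -> str:
      """Add a task to the todo list."""
      items = _load()
      items.append({"id": len(items) + 1, "title": title, "note": note, "done": False})
      _save(items)
      return f"Added task {len(items)}: {title}"

  @mcp.tool()
  def list_tasks(include_done: bool = False) -> str:
      """Return open tasks (optionally including finished ones) as JSON."""
      return json.dumps([t for t in _load() if include_done or not t["done"]])

  @mcp.tool()
  def complete_task(task_id: int) -> str:
      """Mark a task as done."""
      items = _load()
      for t in items:
          if t["id"] == task_id:
              t["done"] = True
      _save(items)
      return f"Completed task {task_id}"

  if __name__ == "__main__":
      mcp.run()  # stdio transport by default

Point your chat client at it and you can add, list, and close tasks just by talking about them.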


This is the first actual use case I’ve heard of that made sense for me. I’m going to try this.


Yes, it's also useful against writer's block. (Which might be a subset of ADHD, I don't know?)

For many people, it's easier to improve a bad first version of a piece of writing than to start from scratch. Even current mediocre LLMs are great at writing bad first drafts.


> Even current mediocre LLMs are great at writing bad first drafts.

Anyone is great at creating a bad first draft. You don’t need help to create something bad, that’s why that’s a common tip. Dan Harmon is constantly hammering on that advice for writer’s block: “prove you’re a bad writer”.

https://www.youtube.com/watch?v=BVqYUaUO1cQ

If you get an LLM to write a first draft for you, it’ll be full of ideas which aren’t yours which will condition your writing.


Little plot twist: you can pitch an LLM an idea for a scene, then tell it to interrogate you thoroughly for the details, then tell it to generate a clean, tight draft optimized for efficient use of language and readability, and you basically get your own ideas back but with a lot of the boring parts of writing already done.
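
If anyone wants to try it, here's a rough sketch of that interrogate-then-draft loop using the OpenAI Python SDK (the model name and prompts are placeholders; any chat-capable model would do):

  # Interview-first drafting: the model asks questions until you type /draft.
  from openai import OpenAI

  client = OpenAI()
  MODEL = "gpt-4o-mini"  # placeholder; use whatever model you have access to

  messages = [{
      "role": "system",
      "content": (
          "The user will pitch a scene. Interrogate them thoroughly about details, "
          "one or two questions at a time. Only when the user says '/draft', write a "
          "clean, tight draft of the scene using exclusively details they provided."
      ),
  }]

  print("Pitch your scene (type /draft when ready, /quit to stop).")
  while True:
      user = input("> ")
      if user.strip() == "/quit":
          break
      messages.append({"role": "user", "content": user})
      reply = client.chat.completions.create(model=MODEL, messages=messages)
      text = reply.choices[0].message.content
      messages.append({"role": "assistant", "content": text})
      print(text)
      if user.strip() == "/draft":
          break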


Making your writing itself boring and the same as everyone’s who used that technique, transforming it into something no one will want to read anyway.


You pretty much jumped to the most negative possible interpretation. You think there isn't an editing process?


There is no editing process during a first draft, no. That’s the whole point of a draft, it’s a separate process from revisions and editing.

https://en.wikipedia.org/wiki/Drafting_(writing)


Were you trying to be intentionally obtuse? What was your goal with that reply? Are you trying to troll?


> Anyone is great at creating a bad first draft

Famously not so! Writer's block is real!


The rest of the paragraph and the link address exactly that.


There might be techniques that help overcome writer's block, sure. But it's still a real problem.

Getting an LLM to produce a not-so-bad first draft is just another technique.


Exactly. Agentic LLMs are amazing for people who suffer from chronic akrasia.


OH wow, a word to describe me


The Latin word for that is "incontinentia".


Those are exactly the people who shouldn’t use these tools. They’ll be sucked into whatever bullshit the LLM peddles them and become unable to leave.


As long as it's helping them be productive, I don't really see an issue with that. Going from doing nothing to doing something is a net boost.


But it’s not helping them “be productive” (which is a horrible metric anyway, life is not about productivity above everything else), it’s sucking their lives and severing their relationships and connection to the real world.

https://archive.ph/20250924025805/https://www.nytimes.com/20...

https://archive.is/26aHF


Those are chatbots and those issues are not akrasia. We're talking about very different things in multiple dimensions.


I don't think it's helped me do anything I couldn't do; in fact, I've learned it's far easier to do hard things myself than to try to prompt an AI out of the ditches it digs while attempting them. But I also find it's great for getting painful and annoying tasks out of the way that I really can't motivate myself to do.


> I don't think it's helped me do anything I couldn't do

I am seeing a pattern here. It appears that AI isn't for everyone. Not everyone's personality may be a good fit for using AI. Just like not everybody is a good candidate for being a software dev, or police officer etc.

I used to think that it is a tool, like a car is: everybody would want one. But that appears not to be the case.

For me, I use AI every day as a tool, for work and home tasks. It is a massive help for me.


What home tasks do you use it for?

It's hard for me to imagine many. It's not doing the dishes or watering the plants.

If I wanted to rearrange the room I could have it mock up some images, I guess...


Figuring out which fertilizer to use, how often to water, and where to place the plants for sun is a useful AI request.


Is it? It'll take a while for fertilizer and sun placement to have a visible effect, and there's a risk that short-term effects aren't indicative of long-term effects.

How can you verify the recommendations are sound, valid, safe, complete, etc., without trying them out? And trying out unsound, invalid, unsafe, incomplete, etc., recommendations might result in dead plants in a couple of weeks.


I personally use ChatGPT for initial discovery on these sorts of problems, maybe ask a probing question or two, and then go back to traditional search engines to get a very rough second opinion (which might also lead to another round of questions). By the end of that process I'll either have seen that the LLM is not helpful for that particular problem, or have an answer that I'm "reasonably confident" is "good enough" to use for something medium-to-low risk, like potentially killing a plant. And I got there within 10-20 minutes, half of that being me just reading the bot's summary.


> How can you verify the recommendations are sound, valid, safe, complete, etc., without trying them out?

Such an odd complaint about LLMs. Did people just blindly trust Google searches beforehand?

If it's something important, you verify it the same way you verify anything else: check the sources and use more than a single query. I have found the various LLMs to be very useful in these cases, especially when I'm coming at something brand new and have no idea what to even search for.


Eh, for something like this the cost of it being wrong might be pretty small, but I'd bet odds are good that its recommendations will be better than whatever I might randomly come up with without doing any research. And I don't have the time to do the research on normal old google where it's really hard to find exactly what I want.

I've found it immensely helpful for giving real world recommendations about things like this, that I know how to find on my own but don't have the time to do all the reading and synthesizing.


That's an interesting perspective, I don't think it's an innate thing though, I think it's a mindset issue. Humans are adaptable, but we're even more stubborn.


It’s weird how divisive it is. For me it’s completely dependent on the quality of the output. Lately, it’s been more of a hindrance.


I think there might be cases, for some people or some tasks, where the difficulty of filling in a blank page is greater than the difficulty of fixing an entire page of errors. Even if you have to do all the same mental work, it feels like a different category of work.


A very good tip: you get one chance to prompt them onto a new path; failing that, clear the context and start again from the current premise.

Use only actionable prompts; negations don't work on AI and they don't work on people.


Same. It takes the drudgery out of creating, so I can at least start the projects. Then I can go down into the detail just enough that the AI doesn't produce crap, but without needing to write the actual lines of code myself.

Hell, in the past few days I started making something to help me write documents for work (https://www.writelucid.cc) and a viewer for all my blood tests (https://github.com/skorokithakis/bt-viewer), and I don't think I would have made either without an LLM.


Same here. I’ve single-shot created a few Raycast plugins for TLV decoding, which I use almost daily at work; they save me several seconds to a few minutes per task.

Would have never done that without LLMs.


There's some minority that's in the Venn diagram of being good at programming, being good at AI, and also being good at using AI for programming (which is mostly project management), and if everything aligns then there are superhuman productivity gains.

I'm tackling projects solo I never would have even attempted before but I could see people getting bad results and giving up.


> if everything aligns then there are superhuman productivity gains.

This is a truism and, I believe, is at the core of the disagreement on how useful AI tools are. Some people keep talking about outlier success. Other people are unimpressed with the performance in ordinary tasks, which seem to take longer because of back-and-forth prompting.


Same here! Started learning self hosted k3s, with terraform and IaC and all the bells and whistles. I would never have had the energy to look up how to even get started. In three hours I have a cluster.


Doesn't sound like you learned it, sounds like it did it for you, using you as the tool.

IOW, can you redo it by yourself? If you can't then you did not learn it.


Is that really a fair comparison? I think the number of people who can memorize each and every configuration item is vanishingly small... even when I was bootstrapping k8s clusters before the dawn of LLMs, I had to look up current documentation and maybe some up-to-date tutorials.

Knowing the abstract steps and tripwires, yes, but details will always have to be looked up, if only not to miss any new developments.


> Is that really a fair comparison?

Well, yes it is; you can't very well claim to have learned something if you are unable to do it.


It doesn't matter - GP is now able to do things they were unable to do before. A distinction without a (real-world) difference.


> It doesn't matter - GP is now able to do things they were unable to do before. A distinction without a (real-world) difference.

I get that point, but the original post I replied to didn't say "Hey, I now have $THING set up when I never had it before"; he said "I learned to do $THING", which is a whole different assertion.

I'm not contending the assertion that he now has a thing he did not have before, I'm contending the assertion that he has learned something.


My programming productivity has improved a lot with Claude Code.

One thing I've noticed is that I don't have a circle of people where I can discus programming with, and having an LLM to answer questions and wireframe up code has been amazing.

My job doesn't require programming, but programming makes my job much easier, and the benefits have been great.


Incredible how many people here just don’t believe you because it doesn’t reflect their personal experience.

I want to second your experience as I’ve had the same as well. Tackling SO many more tasks than before and at such a crazy pace. I’ve started entire businesses I wouldn’t have just because of AI.

But at the same time, some people have weird blockers and just can’t use AI. I don’t know what it is about it - maybe it’s a mental block? Wrong frame of mind? It’s those same people who end up saying “I spend more time fighting the ai and refining prompts than I would on the end task”.

I’m very curious what it is that actually causes this divide.


Now is a good time to use it to make money, before it gets to the point where everyone is using it.

I've been using it for almost a year now, and it's definitely improved my productivity. I've reduced work that normally takes a few hours to 20 minutes. Where I work, my manager was going to hire a junior developer and ended up getting a pro subscription to Claude instead.

I also think it will be a concern for that 50-something developer that gets laid off in the coming years, has no experience with AI, and then can't find a job because it's a requirement.

My cousin was a 53-year-old developer who got laid off two years ago. He looked for a job for 6 months and, when his unemployment ran out, ended up becoming an auto mechanic at half the salary.

The problem is that he was the subject matter expert on old technology and virtually nobody uses it anymore.


> I’m now tackling tasks I wouldn’t have even considered two or three years ago

Ok, so subjective


any objective measure of "productivity" (when it comes to knowledge work) is, when you dig down into it enough, ultimately subjective.


"Not done" vs "Done" is as objective as it gets.


You obviously have never worked at a company that spends time arguing about the "definition of done". It's one of the most subjective topics I know of.


Sounds like a company is not adequately defining what the deliverables are.

Task: Walk to the shops & buy some milk.

Deliverables: 1. Video of walking to the shops (including capturing that day's newspaper at the local shop). 2. Receipt from the local store for milk. 3. Physical bottle of milk.


Cool, I went to the store and bought a 50ml bottle of probiotic coconut milk. Task done?


Yes.

milk (noun):

1. A whitish liquid containing proteins, fats, lactose, and various vitamins and minerals that is produced by the mammary glands of all mature female mammals after they have given birth and serves as nourishment for their young.

2. The milk of cows, goats, or other animals, used as food by humans.

3. Any of various potable liquids resembling milk, such as coconut milk or soymilk.


In Germany, soy milk and the like can't be sold as milk, but coconut milk is okay. (I don't know if that's a German thing or an EU thing.)


The last 3-4 comments in this sub-thread may well be peak HN


Only if you can tick off ALL of the deliverables that verify "done".


Sure, I took a video etc like in the deliverables. That means it’s successfully done?


Yes, it's done.

You get what you asked for, or you didn't sufficiently define it.


And when on the receiving end of the deliverables list, it's always a good idea to make sure they are actually deliverable.

There's nothing worse than a task where you can deliver one item and then have to rely on someone else to deliver a second. I was once in a role where performance was judged on closing tasks: getting the burn-down chart to zero, and having it nicely stepped. I was given a good tip: make sure each task has one deliverable and, where possible, can be completed independently of any other task.


Yes.

Why would you write down "Buy Milk", then go buy whatever thing you call milk, then come back home and be confused about it?

Only an imbecile would get stuck in such a thing.


Well, I think in this example someone else wrote down “buy milk”. Of course I would generally know what that’s likely to mean, and not buy the ridiculous thing. But someone from a culture that isn't used to using milk could easily get confused and buy the wrong thing, to further the example. I guess my point is that it’s never possible to completely unambiguously define when a task is done without assuming some amount of shared knowledge with the person completing it that lets them figure out what you meant and fill in any gaps.


It removes ambiguity. Everyone knows when work is truly considered done, avoiding rework, surprises, and finger-pointing down the line.


At work we call this scope creep.


> I’m now tackling tasks I wouldn’t have even considered two or three years ago

Could you give some examples, and an indication of your level of experience in the domains?

The statement has a much different meaning if you were a junior developer 2 years ago versus a staff engineer.


I have been coding on and off (more off than on) for 47 years. I kinda stopped paying attention when we got past jQuery and was never a fan of prototypal inheritance. Never built anything with Tailwind, Next.js, etc. After spending some time writing copy, user stories and a design brief (all iterative with ChatGPT), Cursor one-shotted my (simple) web app, and I was live (once I'd spent a couple hours documenting my requirements and writing my copy) in 20 minutes of vibe coding.

I've been adding small features in a language I don't program in, using libraries I'm not familiar with, that meet my modest functional requirements, in a couple of minutes each. I work with an LLM to refine my prompt, put it into Cursor, run the app locally, look at the diffs, commit, push, and I'm live on Vercel within a minute or two.

I don't have any good metrics for productivity, so I'm 100% subjective, but I can say that even if I'd been building in Rails (it's been ~4 years, but I coded in it for a decade) it would have taken me at least 8 hours to have an app where I was happy with both the functionality and the look and feel, so a 10x improvement in productivity for that task feels about right.

And having a "buddy" I can discuss a project with makes activation energy lower allowing me to complete more.

Also, YC videos I don't have the time to watch: I get a transcript, feed it into ChatGPT, and ask for the key takeaways I could apply to my business (it's in a project where it has context on stage, industry, maturity, business goals, key challenges, etc.), so I get the benefit of 90 minutes of listening plus maybe 15 minutes of summarizing, reviewing and synthesis in typically 5-6 minutes - and it'd be quicker if I built a pipeline (something I'm vibe coding next month).

Wouldn't want to do business without it.


How do you deal with security for web stuff? I wouldn't host anything vibe-coded publicly because I'm not enough of an expert in web/frontend to even double-check that it's not generating some giant holes.


The same way you do security for manually written code: rigorously. But in this case, you can also have AI do your code reviews and suggest/write unit tests. Or write out a spec and refine it. Or point it to OWASP and say: look at this codebase and make a plan to check for the OWASP top 10.

And have another AI review your unit tests and code. It's pretty amazing how much nuance they pick up. And just rinse and repeat until the AI can't find anything anymore (or you notice it going in circles with suggestions)


Yeah, some of these comments make it sound like we had zero security issues pre-AI. I think the challenge is what you touched on: you have to tell the AI to handle it, just like anything else you want as a requirement. I've used AI to 'vibe' code things and they have turned out pretty well. But I absolutely leaned on my 20+ years of experience to 'work' with the AI to get what I wanted.


If you never put your personal side-project on the public web you had very few security issues resulting from your personal projects. We weren't talking about companies in this thread.

Are the frontend folks having such great results from LLMs that they're OK with "just let the LLM check for security too" for non-frontend-engineer created projects that get hosted publicly?


”I’m now tackling tasks I wouldn’t have even considered two or three years ago”

This. 100x this.


What tasks is it doing 50% of the work on for you?


Not who you asked, but I upgraded NextJS in a couple of repos by just telling Claude Code to do it. I've had it swap out and upgrade libraries successfully in one shot too. It will usually create good enough Page Objects for E2Es and scaffold out the test file, which speeds up the development process a bit. Same for upgrading Node versions in some Lambda projects, just tell it to go and come back later. Instruct it to run the test and build steps and it's also like having a mini CI system running too.

Personally, I think it really shines at doing the boring maintenance and tech debt work. None of these are hard or complex tasks but they all take up time and for a buck or two in tokens I can have it doing simple but tedious things while I'm working on something else.


> Personally, I think it really shines at doing the boring maintenance and tech debt work.

It shines at doing the boring maintenance and tech debt work for web. My experiences with it, as a firmware dev, have been the diametric opposite of yours. The only model I've had any luck with as an agent is Sonnet 4 in reasoning mode. At an absolutely glacial pace, it will sometimes write some almost-correct unit tests. This is only valuable because I can have it do that while I'm in a meeting or reading emails. The only reason I use it at all is because it's coming out of my company's pocket, not mine.


For sure. There's tons of training data in the models for JS and TS and the specific tasks I outlined, but not just for the web specifically; I have several Node or Bun + TypeScript + SQLite CLI utilities that it also helps with. I definitely pick my battles and lean in to what it works best for, though. Anything it appears to struggle at I'll just do manually and develop like we always did. It's rarely not a net positive for me, but it's very frequently a negligible improvement. Anything that doesn't pay off in spades I typically don't try again until new models release or new tools or approaches are available.


Definitely agree that the stack matters.

If you're doing JS/Python/Ruby/Java, it's probably the best at that. But even with our stack (elixir), it's not as good as, say, React/NextJS, but it's definitely good enough to implement tons of stuff for us.

And with a handful of good CLAUDE.md or rules files that guide it in the right direction, it's almost as good as React/NextJS for us.


I can see how these things are convenient, if they succeed. I struggle because my personal workflow is to always keep two copies of a repo up at once: one for deep thought, one for drone work. I have always just done these kinds of background tasks whenever I am in meetings, compiling, etc. I have not seen much of a productivity boost due to this. Oddly, you would think being able to further offload during that time would help, but reviewing the agent output ends up being far more costly (and makes the context switch significantly harder, for some reason). It's just not proving to be useful consistently, for me.


Just off the top of my head (and I exclusively use Claude Code now):

Random Postgres stuff:

- Showed it a couple of Geo/PostGIS queries that were taking up more CPU according to our metrics and asked it to make them faster; it rewrote them in a way that actually used the index (using the <-> operator, for example, for proximity). One-shotted. Whole effort was about 5 mins.

- Regularly asking for maintenance scripts (like give me a script that shows me the most fragmented tables, or highest storage, etc).

CSS:

Built a whole horizontal logo marquee with CSS animations without writing a single line, then asked for little things like "have the people's avatars gently pulsate" – all this was done in about 15 mins. I would've normally spent 8-16 hours on all that pixel pushing.

Elixir App:

- I asked it to look at my GitHub actions file and make it go faster. In about 2-3 iterations, it cut my build time from 6 minutes to 2 minutes. The effort was about an hour (most of it spent waiting for builds, or fiddling with some weird syntax errors or just combining a couple extra steps, but I didn't have to spend a second doing all the research, its suggestions were spot on)

- In our repo (900 files) we had created an umbrella app (a certain kind of elixir app). I wanted to make it a non-umbrella. This one did require more work and me pushing it, but I've been putting off this task for 3 YEARS since it just didn't feel like a priority to spend 2-3 days on. I got it done in about 2 hours.

- Built a whole discussion board in about 6 hours.

- There are probably 3-6 tickets per week where I just say "implement FND-234", and it one-shots a bugfix, or implementation, especially if it's a well defined smaller ticket. For example, make this list sortable. (it knows to reuse my sortablejs hook and look at how we implemented it elsewhere).

- With the Appsignal MCP, I've had it summarize the top 5 errors in production, and write a bug fix for one I picked (I only did this once, the MCP is new). That one was one-shotted.

- Rust library (it's just an Elixir binding to a Rust library; the actual Rust is like 20 lines, so not at all complex)... I've never coded a day of Rust in my life, but for all my cargo updates and occasional syntax/API deprecations, I have Claude do the upgrades and fixes. I still don't know how to write any Rust.

NextJS App:

- I haven't fixed a single typescript error in probably 5 months now, I can't be bothered, CC gets it right about 99% of the time.

- Pasted in a Figma file and asked it to implement. This rarely is one-shotted. But it's still about 10x faster than me developing it manually.

The best combination is if you have a robust component library and well documented patterns. Then stuff goes even faster.

All on the $100 plan in which I've hit the limit only twice in two months. I think if they raised the price to $500, it would still feel like a no-brainer.

I think Anthropic knows this. My guess is that they're going to get us hooked on the productivity gains, and we will happily pay 5x more if they raised the prices, since the gains are that big.


Not this again. That study had serious problems.

But I’m not even going to argue about that. I want to raise something no one else seems to mention about AI in coding work. I do a lot of work now with AI that I used to code by hand, and if you told me I was 20% slower on average, I would say “that’s totally fine, it’s still worth it”, because the EFFORT level from my end feels so much lower.

It’s like, a robot vacuum might take way longer to clean the house than if I did it by hand sure. But I don’t regret the purchase, because I have to do so much less _work_.

Coding work that I used to procrastinate about because it was tedious or painful I just breeze through now. I’m so much less burnt out week to week.

I couldn’t care less if I’m slower at a specific task, my LIFE is way better now I have AI to assist me with my coding work, and that’s super valuable no matter what the study says.

(Though I will say, I believe I have extremely good evidence that in my case I’m also more productive, averages are averages and I suspect many people are bad at using AI, but that’s an argument for another time).


> Not this again. That study had serious problems.

The problem is, there are very few if any other studies.

All the hype around LLMs we are supposed to just believe. Any criticism is "this study has serious problems".

> It’s like, a robot vacuum might take way longer

> Coding work that I used to procrastinate

Note how your answer to "the study had serious problems" is totally problem-free analogies and personal anecdotes.


> The problem is, there are very few if any other studies.

Not at all, the METR study just got a ton of attention. There are tons out there at much larger scales, almost all of them showing significant productivity boosts for various measures of "productivity".

If you stick to the standard of "Randomly controlled trials on real-world tasks" here are a few:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 (4867 developers across 3 large companies including Microsoft, measuring closed PRs)

https://www.bis.org/publ/work1208.pdf (1219 programmers at a Chinese BigTech, measuring LoC)

https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)

If you look beyond real-world tasks and consider things like standardized tasks, there are a few more:

https://ieeexplore.ieee.org/abstract/document/11121676 (96 Google engineers, but same "enterprise grade" task rather than different tasks.)

https://aaltodoc.aalto.fi/server/api/core/bitstreams/dfab4e9... (25 professional developers across 7 tasks at a Finnish technology consultancy.)

They all find productivity boosts in the 15 - 30% range -- with a ton of nuance, of course. If you look beyond these at things like open source commits, code reviews, developer surveys etc. you'll find even more evidence of positive impacts from AI.


Thank you!

> https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)

I like this one a lot, though I just skimmed through it. At 11:58 they talk about what many find correlates with their personal experience. It talks about easy vs complex in greenfield vs brownfield.

> They all find productivity boosts in the 15 - 30% range -- with a ton of nuance, of course.

Or 5-30%, with "AI is likely to reduce productivity in high complexity tasks" ;) But yeah, a ton of nuance is needed.


Yeah that's why I like that one too, they address a number of points that come up in AI-related discussions. E.g. they even find negative productivity (-5%) in legacy / non-popular languages, which aligns with what a lot of folks here report.

However even these levels are surprising to me. One of my common refrains is that harnessing AI effectively has a deceptively steep learning curve, and often individuals need to figure out for themselves what works best for them and their current project. Took me many months, personally.

Yet many of these studies show immediate boosts in productivity, hinting that even novice AI users are seeing significant improvements. Many of the engineers involved didn't even get additional training, so it's likely a lot of them simply used the autocompletion features and never even touched the powerful chat-based features. Furthermore, current workflows, codebases and tools are not suited for this new modality.

As things are figured out and adopted, I expect we'll see even more gains.


Closed PRs, commits, LoC, etc. are useless vanity metrics.

With AI code you have more LoC and NEED more PRs to fix all its slop.

In the end you have increased the numbers with a net negative effect.


Most of those studies call this out and try to control for it (edit: "it" here being the usual limitations of LoC and PRs as measures of productivity) where possible. But to your point, no, there is still a strong net positive effect:

> https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)

Emphasis added. They modeled a way to detect when AI output is being reworked, and still find a 15-20% increase in throughput. Specific timestamp: https://youtu.be/tbDDYKRFjhk?t=590&si=63qBzP6jc7OLtGyk


Could you try to avoid uncertainties like this by measuring something like revenue growth before and after AI? Given enough data.


Hmm, not an economist, but I have seen other studies that look at things at the firm level, so it should definitely be possible. A quick search on Google and SSRN didn't turn up such studies; the ones I found seem to focus on productivity rather than revenues, not sure why. Maybe because such studies depend on the available data, so a lot of key information may be hidden, e.g. revenues of privately held companies, which constitute a large part of the economy.


True, it probably would be difficult to gather representative data. It also might be hard to separate out broader economic effects, e.g. overall upturns.


No, you misunderstood me. Those other points aren’t related to any criticism of the study. Those are points that backup my other point.

I did say I wasn’t going to argue the point that study made, and I didn’t.


Often someone’s personal productivity with AI means someone else has to dig through their piles of rubbish to review the PRs they submitted.

In your particular case it sounds like you’re rapidly losing your developer skills, and enjoy that you now have to put in less effort and think less.


We know that relying heavily on Google Maps makes you less able to navigate without Google Maps. I don't think there's research on this yet, but I would be stunned if the same process isn't at play here.


Whatever your mind believes it doesn’t need to hold on to, and which is expensive to maintain and run, it’ll let go of. This isn’t entirely accurate from a neuroscience perspective but it’s kinda ballpark.

Pretty much like muscles decay when we stop using them.


Sure, but sticking with that analogy, bicycles haven’t caused the muscles of people that used to go for walks and runs to atrophy either – they now just go much longer distances in the same time, with less joint damage and more change in scenery :)


>> Whatever your mind believes it doesn’t need to hold on to, and which is expensive to maintain and run, it’ll let go of. This isn’t entirely accurate from a neuroscience perspective but it’s kinda ballpark.

>> Pretty much like muscles decay when we stop using them.

> Sure, but sticking with that analogy, bicycles haven’t caused the muscles of people that used to go for walks and runs to atrophy either ...

This is an invalid continuation of the analogy, as bicycling involves the same muscles used for walking. A better analogy to describe the effect of no longer using learned skills could be:

  Asking Amazon's Alexa to play videos of people
  bicycling the Tour de France[0] and then walking
  from the couch to your car every workday
  does not equate to being able to participate in
  the Tour de France[0], even if years ago you
  once did.
0 - https://www.letour.fr/en/


Thanks for putting the citation for the Tour de France. I wouldn't have believed you otherwise.


> Thanks for putting the citation for the Tour de France. I wouldn't have believed you otherwise.

Then the citation served its purpose.

You're welcome.


Oh, but they do atrophy, and in devious ways. Though the muscles under linear load may stay healthy, the ability of the body to handle the knee, ankle, and hip joints under dynamic and twisting motion does atrophy. Worse yet, one may think that they are healthy and strong, due to years of biking, and unintentionally injure themselves when doing more dynamic sports.

Take my personal experience for whatever it is worth, but my knees do not lie.


Sure, only cycling sounds bad, as does only jogging. And thousands of people hike the AT or the Way of St. James every year, despite the existence of bicycles and even cars. You've got to mix it up!

I believe the same holds true for cognitive tasks. If you enjoy going through weird build file errors, or it feels like it helps you understand the build system better, by all means, go ahead!

I just don't like the idea of somehow branding it as a moral failing to outsource these things to an LLM.


Yeah, but what's going to happen with LLMs is that the majority will just outsource thinking to the LLM. If something has a high visible reward with hidden, dangerous risks, people will just go for the reward.


Ok Socrates, let’s go back to memorizing epic poems.


To extend the analogy further, people who replace all their walking and other impact exercises with cycling tend to end up with low bone density and then have a much higher risk of broken legs when they get older.


Well, you still walk in most indoor places, even if you are on the bike as much as humanly possible.

But if you were literally chained to a bike and could not move in any other way, then surely you would "forget"/atrophy in specific ways, such that you wouldn't be able to walk without relearning/practicing.


> Whatever your mind believes it doesn’t need to hold on to, and which is expensive to maintain and run, it’ll let go of. This isn’t entirely accurate from a neuroscience perspective but it’s kinda ballpark.

A similar phenomenon occurs when people see or hear information, depending on whether they record it in writing or not. The act of writing down the percepts, in and of itself, assists in short-term to long-term memory transference.


I know that I am better at navigating with Google Maps than the average person, because I navigated for years without it (partly on purpose). I know when not to trust it. I know when to ignore recommendations on recalculated routes.

Same with LLMs. I am better with it, because I know how to solve things without the help of it. I understand the problem space and the limitations. Also I understand how hype works and why they think they need it (investors money).

In other words, no, just using google maps or ChatGPT does not make me dumb. Only using it and blindly trusting it would.


Yeah, this definitely matches my experience, and guess what? Google Maps sucks for public transit and isn't actually that good for pedestrian directions (often pointing people to "technically" accessible paths like sketchy sidewalks on busy arterial roads signed for 35mph where people go 50mph). I stopped using Google Maps instinctually and now only use it for public transit or drives outside of my city. Doing so has made me a more attentive driver, less lazy, less stressed when unexpected issues on the road occur; it has restored my navigation skills and made me a little less of, frankly, an adult man child.

Applying all of this to LLMs has felt similar.


It gets worse for projects outsourced to one or more consultancy firms, where staff costs are prohibitively high; now you've got another layer of complexity to factor in (risks, costs).

Consultancy A submits work, Consultancy B reviews/tests. As A increases its use of AI, B will have to match it with more staff or more AI. More staff for B means higher costs, at a slower pace. More AI for B means a higher burden of proof; an A vs B race condition is likely.

Ultimately clients will suffer from AI fatigue and inadvertently incur more costs at a later stage (post-delivery).


My own code quality is better with AI, because it makes it feasible to indulge my perfectionism to a much greater degree. Before AI, I usually needed to stop sooner than I would have liked to and call it good enough. Now I can justify making everything much more robust because it doesn’t take a lot longer.

It’s the same story with UI/UX. Previously, I’d often have to skip little UI niceties because they take time and aren’t that important. Now even relatively minor user flows can be very well polished because there isn’t much cost to doing so.


https://github.com/plandex-ai/plandex/blob/9017ba33a627c518a...

Well, your perfectionism needs to be pointed towards this line. If you get truly large numbers of users, this will either slow down token checking directly or make your process for removing ancient expired tokens (I'm assuming there is such a process...) much slower and more problematic.


Lol is that really the best example you could find?


Truly the response of someone who is a perfectionist using llms the right way and not a slop coder


It's just funny because there are definitely examples of bad code in that repo (as there are in any real project), but you picked something totally routine. And your critique is wrong fwiw—it would easily scale to millions of users. Perhaps you could find something better if you used AI to help you...


I’d love not to have to be great at programming, just as I enjoy not being great at cleaning the sewers. But I get what you mean: we do lose some potentially valuable skills if we outsource them too often for too long.


It’s probably roughly as problematic as most people not being able to fix even simple problems with their cars themselves these days (i.e., not very).


Everyone needs to have AI to do some minor modification in an Excel file?


Of course not. Who is arguing for that?


Give it time. They will, eventually.


This is so baseless and insulting and makes so many assumptions I don’t think you deserve a response from me at all.


> In your particular case it sounds like you’re rapidly loosing your developer skills, and enjoy that now you have to put less effort and think less.

Just the other day I was complaining that no one knows how to use a slide rule anymore...

Also, C++ is producing machine code that's hot garbage. It's like no one understands assembly anymore...

Even simple tools are often misused (like hammering a screw). Sometimes they are extremely useful in right hands though. I think we'll discover that the actual writing of code isn't as meaningful as thinking about code.


Hahaha well said, thank you. I feel like I’m taking crazy pills reading some of the comments around here. Serious old man shakes fist at cloud moments.


I'm losing my developer skills like I lost my writing skills when I got a keyboard. Yes, I can no longer write with a pen, but that doesn't mean I can't write.


Also I don’t know about you but despite the fact that I basically never write with a pen, the occasional time I have to I’m a little slow sure but it’s not like I physically can’t do it. It’s no big deal.

Imagine telling someone with a typewriter that they’d be unable to write if they don’t write by hand all the time lol. I write by hand maybe a few times a year - usually writing a birthday card or something - but I haven’t forgotten.


Yep, same. I might have forgotten some function names off the top of my head, but I still know how to program, and I do every day.


Exactly


Another way of viewing it would be that LLMs allow software developers to focus their development skills where it actually matters (correctness, architecture etc.), rather than wasting hours catering to the framework or library of the day’s configuration idiosyncrasies.

That stuff kills my motivation to solve actual problems like nothing else. Being able to send off an agent to e.g. fix some build script bug so that I can get to the actual problem is amazing even with only a 50% success rate.


The path forward here is to have better frameworks and libraries, not to rely on a random token generator.


Sure, will you write them for me?

Otherwise, I’ll continue using what works for me now.


>better frameworks and libraries

I feel like the past few decades of framework churn has shown that we're really never going to agree on what this means


You still have to review and understand changes that your “AI agent” did. If you don’t review and fully understand everything it does, then I fear for your project.


> But I’m not even going to argue about that. I want to raise something no one else seems to mention about AI in coding work. I do a lot of work now with AI that I used to code by hand, and if you told me I was 20% slower on average, I would say “that’s totally fine it’s still worth it” because the EFFORT level from my end feels so much less.

I completely get this and I often have an LLM do boring stupid crap that I just don't wanna do. I frequently find myself thinking "wow I could've done it by hand faster." But I would've burned some energy that could be better put towards other stuff.

I don't know if that's a net positive, though.

On one hand, my being lazy may be less of a hindrance compared to someone willing to grind more boring crap for longer.

On the other hand, will it lessen my edge in more complicated or intricate stuff that keeps the boring-crap-grinders from being able to take my job?


Exactly, but I don’t think you lose much edge, or anything that can’t be picked up again quickly if it’s truly boring, easy stuff. I think it’s a net positive because I can guarantee you there are afternoons where, if I couldn’t have done the boring thing with AI, I just wouldn’t have done it that afternoon at all haha.


My personal project output has gone up dramatically since I started using AI, because I can now use times of night where I'm otherwise too mentally tired, to work with AI to crank through a first draft of a change that I can then iterate on later. This has allowed me to start actually implementing side projects that I've had ideas about for years and build software for myself in a way I never could previously (at least not since I had kids).

I know it's not some amazing GDP-improving miracle, but in my personal life it's been incredibly rewarding.


This, 1000x.

I had a dozen domains and projects on the shelf for years and now 8 of them have significant active development. I've already deployed 2 sites to production. My github activity is lighting up like a Christmas tree.


i find a lot of value in using it to give half baked ideas momentum. some sort of "shower thought" will occur to me for a personal project while im at work and ill prompt Claude code to analyze and demonstrate an implementation for review later

on the other hand i believe my coworker may have taken it too far. it seems like productivity has significantly slipped. in my perception the approaches hes using are convoluted and have no useful outcome. im almost worried about him because his descriptions of what hes doing make no sense to me or my teammates. hes spending a lot of time on it. im considering telling him to chill out but who knows, maybe im just not as advanced a user as him? anyone have experience with this?


Do you mean like convoluted agentic stuff, markdown files etc? Or like AI delusion?


the former

it started as an approach to a mass legacy code migration. sound idea with potential to save time. i followed along and understood his markdown and agent stuff for analyzing and porting legacy code

i reviewed results which apply to my projects. results were mixed bag but i think it saved some time overall. but now i dont get where hes going with his ai aspirations

my best attempt to understand is he wants to work entirely though chats, no writing code, and hes doing so by improving agents through chats. hes really swept up in the entire concept. i consider myself optimistic about ai but his enthusiasm feels misplaced

its to the point where his work is slipping and management is asking him where his results are. were a small team and management isnt savvy enough to see hes getting NOTHING done and i wont sell him out. however if this is a known delusional pattern id like to address it and point to a definition and/or past cases so he can recognize the pattern and avoid trouble


I really haven't tried that stuff myself except for claude code

but I do recall seeing some Amazon engineer who worked on Amazon Q, and his repos were... something.

like making PRs that were him telling the ai that "we are going to utilize the x principle by z for this" and like 100s of lines of "principles" and stuff that obviously would just pollute the context and etc.

like huge amounts of commits but it was just all this and him trying to basically get magic working or something.

and to someone like me it was obvious that this was a futile effort but clearly he didn't seem to quite get it.

I think the problem is that people don't understand transformers, that they're basically huge datasets in model form, where output is auto-generated based on queries from the context (your prompts and the model's responses)

so you basically are just getting mimicked responses

which can be helpful, but I have this feeling that there's a fundamental limit, like a mathematical one, where you can't really get it to do stuff unless you provide the solution itself in your prompt, covering everything, because otherwise it'd have to be in its training data (which it may have, for common stuff like boilerplate, hello world, etc.)

but maybe I'm just missing something. maybe I don't get it

but I guess if you really wanna help him, I'd maybe play around with claude/gpt and see how it just plays along even if you pretend, like you're going along with a really stupid plan or something and how it'll just string you along

and then you could show him.

Orr.... you could ask management to buy more AI tools and make him head of AI and transition to being an AI-native company..


hes already started that last step and has work paying for his pro plan

you put it nicely when you mention a fundamental limit and will borrow that if i think hes wasting a risky amount of time

i really like the sibling idea of having him try to explain again, then use claude to explain if he cant

genuine thanks to you and sibling for offering advice


I don't know about 'delusion pattern', but a common problem is that the AI wants to be helpful, and the AI is sycophantic, so when you are going down an unproductive path the AI will continue to help you and reinforce whatever you are feeling. This can be very hard to notice if you are an optimistic person who is process oriented, because you can keep working on the process forever and the AI will keep telling you that it is useful. The problem with this of course is that you never know if you are actually creating anything useful or not without real human feedback. If he can't explain what he is doing adequately then ask him to have the AI do it and read that. You should figure out pretty quickly if it is bullshit or not. If he can't get the AI to tell you what it is he is doing, and he can't explain it in a way that makes sense, then alarm bells should ring.


Yesterday is a good example: in 2 days, I completed what I expected to be a week’s worth of heads-down coding. I had to take a walk and make all new goals.

The right AI, good patterns in the codebase and 20 years of experience and it is wild how productive I can be.

Compare that to a few years ago, when at the end of the week, it was the opposite.


Makes no sense. How are you comparing yesterday with "a few years ago"?


The "you only think you're more productive" argument is tiresome. Yes, I know for sure that I'm more productive. There's nothing uncertain about it. Does it lead to other problems? No doubt, but claiming my productivity gains are imaginary is not serious.

I've seen a lot of people who previously touted that it doesn't work at all use that study as a way to move the goalpost and pretend they've been right all along.


I would be interested to know how you measure your productivity gains though, in an objective way where you're not the victim of bias.

I just recently had to rate whether I felt like I got more done by leaning more on Claude Code for a week to do a toy project and while I _feel_ like I was more productive, I was already biased to think so, and so I was a lot more careful with my answer, especially as I had to spend a considerable amount of time either reworking the generated code or throwing away several hours of work because it simply made things up.


It sounds like you're very productive without AI or that your perceived gains are pretty small. To me, it's such a stark contrast that asking how I measure it is like asking me to objectively verify that a car is faster than walking.


“I'm eating fewer calories yet keep putting on weight.”

There's a reason self-reported measures are questioned: they have been wildly off in different domains. Objectively verifying that a car is faster than walking is easy. When it's not easy to objectively prove something, there is a lot that could go wrong, including disagreements over the definition of what's being measured.


You're talking about cases where the measured productivity gains were marginal. Claiming my individual productivity gains are imaginary is simply absurd. I know I am more productive and it's a fact.

Again, people who were already highly productive without AI won't understand how profound the increase is.


Well said. People keep acting like one study that has issues can be quoted at me and it somehow erases the fact that I’ve seen simply undeniable productivity gains; it drives me mad. I get the feeling no measurement system would satisfy them anyway, as their intent is to undermine you because, emotionally, they’re not ready to accept the usefulness of LLMs.

If I showed them time gains, they’d just say “well you don’t know how much tech debt you’re creating”, they’d find a weasel way to ignore any methodology we used.

If they didn’t, they wouldn’t be conveniently ignoring all but that one study that is skeptical of productivity gains.


I built a financial trading app in a couple of months; it would have taken 2-3 years without AI, at least. Maybe I would never have finished, because I would have given up at some point due to the sheer effort involved.

So: this thing would never exist and work without a 20 USD ClaudeAI subscription :)


OK, so it sounds like this is an 'I know for certain I can't code without AI and that I get nothing coherent done, and now I'm doing the greatest things all the time, so you're demonstrably wrong!' argument.

I would ask, then, if you're qualified to evaluate that what 'you' are doing now is what you think it is? Writing off 'does it lead to other problems' with 'no doubt, but' feels like something to watch closely.

I imagine a would-be novelist who can't write a line. They've got some general notions they want to be brilliant at, but they're nowhere. Apply AI, and now there's ten million words, a series, after their continual prompts. Are they a novelist, or have they wasted a lot of time and energy cosplaying a novelist? Is their work a communication, or is it more like an inbox full of spam into which they're reading great significance because they want to believe?


To be honest, if that person spent time coming up with the worldbuilding, plot ideas, etc. all themselves, got the AI to draft stuff, and edited it continuously until they got the exact result they wanted in the form of a novel, then yeah, I would say they're a novelist.

You can currently go to websites and use character generators and plot idea generators to get unstuck from writer's block or to provide inspiration, and professional writers already do this _all the time_.


No, incorrect observation. I've been programming professionally longer than I've had my HN account.


We’re being accused of false consciousness!


I'm not who you responded to. I see about a 40% to 60% speed up as a solution architect when I sit down to code and about a 20% speedup when building/experimenting with research artifacts (I write papers occasionally).

I have always been a careful tester, so my UAT hasn't blown up out of proportion.

The big issue I see is that with Rust it generates code using conventions that were current around 2023, though I understand there is some improvement in that direction.

Our hiring pipeline is changing dramatically as well, since the normal things a junior needs to know (code, syntax) are no longer as expensive. Joel Spolsky's mantra to hire curious people who get things done captures well the folks I find are growing into strong juniors.


I'm objectively faster. Not necessarily on a task I've done routinely for years, but when taking on new challenges I'm up and running much sooner. A lot of it has to do with offloading the basic research while allowing myself to be interrupted; it's not a problem that people reach out with urgent matters while I'm taking on a challenge I've only just started to build towards. Being able to correct the AI where I can tell it's making false assumptions or going off the rails also speeds things up.


I have a very big hobby code project I’ve been working on for years.

AI has not made me much more productive at work.

I can only work on my hobby project when I'm tired, after the kids go to bed. AI has made me 3x more productive there, because reviewing code is easier than architecting. I can sense if it's bad, I have good tests, and the requests are pretty manageable (make a new CRUD page for this DTO using app conventions).

But at work where I’m fresh and tackling hard problems that are 50% business political will? If anything it slows me down.


Bureaucracy is often a bottleneck


We're choosing to offload processing that our brain could be doing, either because we're too lazy to do it or because the perceived value of doing it ourselves is too low. I think there are consequences to this, especially as we hand the machine free information about how we twist and turn our prompts to get it to understand what we mean.

Interesting to consider that if our first vibecode prompt isn't what we actually want, it can train on how we direct it further.

Offloading human intelligence is useful but... we're losing something.


The majority of people seem to offload most of their thinking as is, and actively avoid things like critical thinking, confronting their own biases, seeking push-back against their own beliefs, etc.

As with many other technologies, AI can be an enabler of this, or it can be used as a tool to empower and enhance learning and personal growth. That ultimately depends on the human to decide. One can dramatically accelerate personal and professional growth using these tools.

Admittedly the degree to which one can offload tasks is greatly increased with this iteration, to the extent that at times it can almost seem like you're offloading your own autonomy. But many people already exist in this state, exclusively parroting other people's opinions without examining them, etc.


Yeah, real talk. It can really play to folks' bias towards laziness, and the fact that it's being controlled by corporations / (the few) / (the ultra-wealthy) should give us pause to consider the level of influence it does and will have over the majority of people...


The design of that study is pretty bad, and as a result it doesn't end up actually showing what it claims to show / what people claim it does.

https://www.fightforthehuman.com/are-developers-slowed-down-...


I don't think there is anything factually wrong with this criticism, but it largely rehashes caveats that are already well explored in the original paper, which goes to unusual lengths to clearly explain many ways the study is flawed.

The study gets so much attention since it's one of the few studies on the topic with this level of rigor on real-world scenarios, and it explains why previous studies or anecdotes may have claimed perceived increases in productivity even when there was no actual increase. It clearly sets a standard: we can't just ask people if they felt more productive (or they need to feel massively more productive to clearly overcome this bias).


> it largely rehashes caveats that are already well explored in the original paper, which goes to unusual lengths to clearly explain many ways the study is flawed. ... The study gets so much attention since it's one of the few studies on the topic with this level of rigor on real-world scenarios,

Yes, but most people don't seem aware of those caveats, this is a good summary of them, and I think it does undercut the "level of rigor" of the study. Additionally, some of what the article points out is not explicitly acknowledged and connected by the study itself.

For instance, if you actually split up the tasks by type, some tasks show a speed up and some show a slowdown, and the qualitative comments by developers about where they thought AI was good/bad aligned very well with which saw what results.

Or (iirc) the fact that the task timing was per task, but developers' post hoc assessments were a prediction of how much they thought they were sped up on average across all tasks, meaning it's not really comparing the same things when comparing how developers felt vs how things actually went.

Or the fact that developers were actually no less accurate in predicting times to task completion overall with AI vs without AI.

> and it explains why previous studies or anecdotes may have claimed perceived increases in productivity even if there wasn't any actual increases.

Framing it that way assumes, as an already established fact that needs explaining, that AI does not provide more productivity. Which actually demonstrates, inadvertently, why the study is so popular! People want it to be true, so even if the study is so chock full of caveats that it can't really prove that fact, let alone explain it, people appeal to it anyway.

> It clearly sets a standard that we can't just ask people if they felt more productive

Like we do for literally every other technological tool we use in software?

> (or they need to feel massively more productive to clearly overcome this bias).

All of this assumes a definition of productivity that's based on time per work unit done, instead of perhaps the amount of effort required to get a unit of work done, or the extra time for testing, documentation, shoring up edge cases, polishing features, that better tools allow. Or the ability to overcome dread and procrastination that comes from dealing with rote, boilerplate tasks. AI makes me so much more productive that friends and my wife have commented on it explicitly without needing to be prompted, for a lot of reasons.


I think it's apples to oranges.

The performance gains come from being able to ask specific questions about problems I deal with and (basically) have a staff engineer that I can bounce ideas off of.

I am way faster than an AI at writing code for problems I am familiar with.

But when I'm trying to figure out which database I should look at deeply for my use case, or debugging Android code when I don't know Kotlin, it has saved me 5000x the time.


The thing is, most programming is mundane. Rename files, move them, make sure imports are correct, make sure builds pass...AI can do all of these with very high accuracy (most of the time).


Yes, for me.

Instead of getting overwhelmed doing too many things, I can offload a lot of menial and time-driven tasks.

Reviews are absolutely necessary, but they take less time than creation.


Exactly. Is HN full of old codgers demanding that we can’t possibly use a calculator because that might mean we’d lose the precious skill of how to use a slide rule? The old man energy in here is insane


If you want another data point, you can just look at my company github (https://github.com/orgs/sibyllinesoft/repositories). ~27 projects in the last 5 weeks, probably on the order of half a million lines of code, and multiple significant projects that are approaching ship readiness (I need to stop tuning algorithms and making stuff gorgeous and just fix installation/ensure cross platform is working, lol).


I don't do Rust or Javascript so I can't judge, but I opened a file at random and feel like the commenting probably serves as a good enough code smell.

From the one random file I opened:

    /// Real LSP server implementation for Lens
    pub struct LensLspServer

    /// Configuration for the LSP server
    pub struct LspServerConfig

    /// Convert search results to LSP locations
    async fn search_results_to_locations()

    /// Perform search based on workspace symbol request
    async fn search_workspace_symbols()

    /// Search for text in workspace
    async fn search_text_in_workspace()

etc, etc, etc, x1000.

I don't see a single piece of logic actually documented with why it's doing what it's doing, or how it works, or why values are what they are; nearly 100% of the comments are just:

    function-do-x() // Function that does x
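
To make the contrast concrete, here's a minimal, hypothetical sketch (toy names of my own, not anything from that repo) of a doc comment that merely restates the signature versus one that records a decision the code itself can't show:

    // The smell: the comment just restates the function name.
    /// Convert byte offsets to editor columns.
    fn offsets_to_columns(offsets: &[usize]) -> Vec<usize> {
        offsets.iter().map(|o| o + 1).collect()
    }

    // More useful: the comment records reasoning you can't recover from the code.
    /// Columns are 1-based because editors display them that way,
    /// while the search index stores 0-based byte offsets.
    fn offsets_to_columns_explained(offsets: &[usize]) -> Vec<usize> {
        offsets.iter().map(|o| o + 1).collect()
    }

    fn main() {
        assert_eq!(offsets_to_columns(&[0, 4]), vec![1, 5]);
        assert_eq!(offsets_to_columns_explained(&[0, 4]), vec![1, 5]);
    }

The first style is noise a reader learns to skip; the second is the only kind of comment that actually saves a maintainer time.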


Sure, this is a reasonable point, but understand that documentation passes come late, because if you do heavy documentation refinement on a product under feature/implementation drift you just end up with a mess of stale docs and repeated work.


Early coding agents wanted to do this: comment every line of code. You used to have to yell at them not to. Now they've mostly stopped doing this at all.


Lines of code is not a measure of anything meaningful on its own. The mere fact that you suggest it as proof that you are more productive makes me think you are not.


The SWE industry is eagerly awaiting your proposed accurate metric.

I find that people who dismiss LoC out of hand without supplying better metrics tend to be low performers trying to run for cover.


You're new to the industry, aren't you?


> low performers trying to run for cover

Oh no, you've caught me.

On a serious note: LoC can be useful in certain cases (e.g. to estimate the complexity of a code base before you dive in, even though it's imperfect there, too). But, as others have said, it's not a good metric for the quality of software. If anything, I would say fewer LoC is a better indication of high-quality software (but again, not a very useful metric).

There is no simple way to just look at the code and draw conclusions about the quality or usefulness of a piece of software. It depends on sooo many factors. Anybody who tells you otherwise is either naive or lying.


> The SWE industry is eagerly awaiting your proposed accurate metric.

There are none. All are various variants of bad, and LoC is probably the worst metric of all, because it says nothing about quality, or features, or the number of products shipped. It's also the easiest metric to game: just write GoF-style Java and you're off to the races. Don't forget to put a source code license header at the beginning of every file. Boom. LoC.

The only metrics that barely work are:

- features delivered per unit of time. Requires an actual plan for the product, and an understanding that some features will inevitably take a long time

- number of bugs delivered per unit of time. This one is somewhat inversely correlated with LoC and features, by the way: the fewer lines of code and/or features, the fewer bugs

- number of bugs fixed per unit of time. The faster bugs are fixed the better

None of the other bullshit works.
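
For what it's worth, here's a minimal sketch (toy types of my own, nothing standard) of what I mean by those rates, just to make the units concrete:

    // Hypothetical stats for one delivery period, e.g. a two-week sprint.
    struct PeriodStats {
        days: f64,
        features_shipped: u32,
        bugs_reported: u32,
        bugs_fixed: u32,
    }

    impl PeriodStats {
        fn features_per_week(&self) -> f64 {
            self.features_shipped as f64 / self.days * 7.0
        }
        fn bugs_introduced_per_week(&self) -> f64 {
            self.bugs_reported as f64 / self.days * 7.0
        }
        fn bugs_fixed_per_week(&self) -> f64 {
            self.bugs_fixed as f64 / self.days * 7.0
        }
    }

    fn main() {
        let sprint = PeriodStats { days: 14.0, features_shipped: 3, bugs_reported: 5, bugs_fixed: 8 };
        println!("features/week: {:.1}", sprint.features_per_week());               // 1.5
        println!("bugs introduced/week: {:.1}", sprint.bugs_introduced_per_week()); // 2.5
        println!("bugs fixed/week: {:.1}", sprint.bugs_fixed_per_week());           // 4.0
    }

Even then, the hard part isn't computing the rates; it's agreeing on what counts as a feature or a bug.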


A metric I'd be interested in is the number of clients you can convince to use this slop.


That's a sales metric brother.


I understand that you would prefer to be more "productive" with AI but without any sales than to be less productive without AI but with sales.

To clarify, people critical of the “productivity increase” argument question whether the productivity is of the useful kind or of the increased useless output kind.


LoC is so easy to game. Reformat. Check in a notebook. Move things around. Do a pointless refactor.

If nobody is watching LoC, it's generally a good metric. But as soon as people start valuing it, it becomes useless.



First off, congrats on the progress.

Second, as you seem to be an entrepreneur, I would suggest you consider adopting the belief that you've not been productive until the thing's shipped into prod and available for purchase. Until then you've just been active.


Sooo you launched https://sibylline.dev/, which looks like a bunch of AI slop, then spun up a bunch of GitHub repos, seeded them with more AI slop, and tout that you're shipping 500,000 lines of code?

I'll pass on this data point.


[flagged]


I mean, you're slinging insults, so it's hard for me to agree that he's the toxic person in this conversation...


So your company is actively shipping tens of thousands of AI-generated lines of code?


It seems like the programming world is increasingly dividing into “LLMs for coding are at best marginally useful and produce huge tech debt” vs “LLMs are a game changing productivity boost”.

I truly don’t know how to account for the discrepancy, I can imagine many possible explanations.

But what really gets my goat is how political this debate is becoming, to the point that the productivity camp, of which I'm a part, is being accused of deluding itself.

I get that OpenAI has big ethical issues. And that there's a bubble. And that AI is damaging education. And that it may cause all sorts of economic dislocation. (I emphatically Do Not get the doomers; give me a break.)

But all those things don’t negate the simple fact that for many of us, LLMs are an amazing programming tool, and we’ve been around long enough to distinguish substance from illusion. I don’t need a study to confirm what’s right in front of me.


I’d love to know whether and to what extent the people for which AI has been a huge boost are those who were already producing slop, and now they have AI that can produce that slop much faster.


By framing it in such a way you are making it personal, and it becomes less about defending the process of using AI and more about defending their integrity as a developer. You won't get anything useful when someone responds that way, just a heated argument where they feel deeply insulted and act accordingly.


Well said


Sheesh, how do you expect to be taken seriously when you sneer through gritted teeth like that?

I work with many developers of varying skill levels, all of whom use AI. The only ones who have attempted to turn in slop are the ones who, it turned out, basically can't code at all and didn't keep their jobs long. Those who know what they're doing use it as a TOOL. They carefully build, modify, review, and test everything, usually write about half of it themselves, and it meets our strict standards.

Which you would know if you’d listened to what we’ve been telling you in good faith.


Data point: I run a site where users submit a record. There was a request months ago to allow users to edit the record after submitting. I put it off because, while it's an established pattern, it touches a lot of things, and I found it annoying busywork and thus low priority. Then gpt5-codex came out, and I could use it in the codex CLI with my existing account. I asked it to support editing for that feature all the way through the backend, with a pleasing UI that fit my theme. It one-shotted it in about ten minutes. I asked for one UI adjustment that I decided I liked better, another five minutes, and I reviewed and released it to prod within an hour. So, you know, months versus an hour.


Is the hour really comparable to months spent not working on it?


He's referring to the reality that AI helps you pick up and finish tasks that you otherwise would have put off. I see this all day every day with my side projects as well as security and customer escalations that come into my team. It's not that Giant Project X was done six times as fast. It's more like we were able to do six small but consequential bug fixes and security updates while we continued to push on the large feature.


“If you make a machine that can wash the dishes in an hour, is that more productive than not doing the dishes for months?” - Yes! That’s the definition of productivity!! The dishes are getting done now and they weren’t before! lol


There have been plenty of studies showing the opposite. Also, a sample size of 16 ain't much.



