
I read it. I agree this is out of touch. Not because the things it's saying are wrong, but because the things it's saying have been true for almost a year now. They are not "getting worse"; they "have been bad". I am staggered to find this article qualifies as "news".

If you're going to write about something that's been true and discussed widely online for a year+, at least have the awareness/integrity to not brand it as "this new thing is happening".


Perhaps the advertising money from the big AI money sinks is running out and we are finally seeing more AI scepticism articles.

> They are not "getting worse" they "have been bad".

The agents available in January 2025 were much much worse than the agents available in November 2025.


Yes, and in some cases no.

The models have gotten very good, but I'd rather have an obviously broken pile of crap that I can spot immediately than something that's been deep-fried with RL to always look like it succeeds but has subtle problems that someone will LGTM :( I guess it's not much different with human-written code, but the models seem to have weirdly inhuman failure modes - like, you'd just skim some code because you can't believe anyone could get it wrong, and it turns out they did.


That's what test cases are for, and they're good for both humans and nonhumans.

Test cases are great, but not a total solution. Can you write a test case for the add_numbers(a, b) function?

Well, for some reason it doesn't let me respond to the child comments :(

The problem (which should be obvious) is that with a and b real you can't construct an exhaustive input/output set. A test case can only prove the presence of a bug, not its absence.

Another category of problems that you can't just test for, and instead have to prove correct, is concurrency bugs.

And so forth and so on.
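To make that point concrete, here's a minimal Python sketch (the rounding bug and the test values are made up purely for illustration): the suite is green, yet the function is wrong for inputs the tests never enumerate.

```python
def add_numbers(a, b):
    # Subtly wrong: silently rounds away anything past six decimal places.
    return round(a + b, 6)

def test_add_numbers():
    # All of these pass, so the suite looks healthy...
    assert add_numbers(1, 2) == 3
    assert add_numbers(-1.5, 0.5) == -1.0
    assert add_numbers(0, 0) == 0

# ...but the bug only shows up on inputs the tests don't cover:
# add_numbers(1e-7, 1e-7) returns 0.0 instead of 2e-7.
```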


Of course you can. You can write test cases for anything.

Even an add_numbers function can have bugs, e.g. you have to ensure the inputs are numbers. Most coding agents would catch this in loosely-typed languages.
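A hedged sketch of that kind of test (Python standing in for the loosely-typed case, where "1" + 2 would silently concatenate; the names are hypothetical):

```python
import pytest  # assumes pytest is available

def add_numbers(a, b):
    # Reject non-numeric inputs up front rather than failing (or silently
    # misbehaving, in a loosely-typed language) somewhere downstream.
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("add_numbers expects numeric arguments")
    return a + b

def test_rejects_non_numbers():
    with pytest.raises(TypeError):
        add_numbers("1", 2)
```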


I mean, "have been bad" doesn't exclude "getting worse", right :)

engineers always want to rewrite from scratch and it never works.

a tale as old as time - at my second job out of college, back in like 2016, I landed at the tail end of a 3-month feature-freeze refactor project. it was pitched to the CEO as 1 month, sprawled out to 3 months, and still wasn't finished. Non-technical teams were pissed, technical teams were exhausted, all hope was lost. Ended up cutting a bunch of scope and slopping out a bunch of bugs anyway.


i had the privilege of working w/ some incredible eng leaders at my previous gig - they were very good at working both upwards and downwards to execute against the "50/50" rule: half of any given sprint's work is focused on new features, and half on bug fixes, chores, and things that improve team velocity.

the people yearn for refactoring

I think the key here is the "if X then Y" syntax - this seems to be quite effective at piercing through the "probably ignore this" treatment of the system message by highlighting WHEN a given instruction is "highly relevant".


What?


It helps when questions intended to resolve ambiguity are not themselves hopelessly ambiguous.

See also: "Help me help you" - https://en.wikipedia.org/wiki/Jerry_Maguire


agree - i've had claude one-shot this for me at least 10 times at this point cause i'm too lazy to lug whatever code around. literally made a new one this morning


For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.

sure, readme.md is a great place to put content. But there are things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.

Further, claude.md/agents.md files have special quality-of-life mechanics in the coding agent harnesses, e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`.
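To illustrate the mechanic, a rough Python sketch of the idea - not Claude Code's actual implementation, and every name here is made up:

```python
from pathlib import Path

def collect_scoped_instructions(touched_file: Path, repo_root: Path) -> list[str]:
    """Walk up from a file the agent touched and gather any CLAUDE.md / AGENTS.md
    found along the way, so the harness can prepend them to the context window
    regardless of whether the model chose to read them."""
    docs = []
    for directory in [touched_file.parent, *touched_file.parent.parents]:
        for name in ("CLAUDE.md", "AGENTS.md"):
            candidate = directory / name
            if candidate.is_file():
                docs.append(candidate.read_text())
        if directory == repo_root:
            break
    return docs
```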

> What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.

I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how it's presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time into how context is collected and presented to your agent.


> For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.

On Reddit's LLM subreddits, people are rediscovering the very basics of software project management as massive insights daily, or at the very least weekly.

Who would've guessed that proper planning, accessible and up-to-date documentation, and splitting tasks into manageable, testable chunks produce good code? Amazing!

Then they write a massive blog post or even some MCP monstrosity for it and post it everywhere as a new discovery =)


I can totally understand where you are coming from with this comment. It does feel a bit frustrating that people are rediscovering things that were written in books 30/40/50 years ago.

However, I think this is awesome for the industry. People are rediscovering basic things, but if they didn't know about the existing literature this is a perfect opportunity to refer them to it. And if they were aware, but maybe not practicing it, this is a great time for the ideas to be reinforced.

A lot of people, myself included, never really understood which practices were important until we were forced to work on a system that was most definitely not written with any good practices in mind.

My current view of agentic coding is that it's forcing an entire generation of devs to either learn software project management or drown under the mountain of debt an LLM can produce. Previously it took much longer to feel the weight of bad decisions in a project, but an LLM lets you speed-run that process in a few weeks or months.


> If I tell them exactly how to build something the work needed to review the resulting changes is a whole lot less taxing.

Totally matches my experience - the act of planning the work, defining what you want and what you don't, ordering the steps, and declaring the verification workflows - whether I write it or another engineer does, it makes the review step so much easier from a cognitive load perspective.


I have had a user story and a research plan and only realized deep in the implementation that a fundamental detail about how the code works was missing (specifically, that types and SDKs are generated from the OpenAPI spec) - this omission meant the plan was wrong (I didn't read carefully enough) and the implementation was a mess.


Yeah, I agree. There's a lot more needed than just the User Story. One way I'm thinking about it is that the "core" is deliverable business value, and the "shells" are the context required for fine-grained details. There will likely need to be a step to verify against the acceptance criteria.

I hope to back up this hypothesis with actual data and experiments!


My advice - never use /compact; always stash your context to a .md file or a wordy git commit message and then clear context.

You want control over and visibility into what’s being compacted, and /compact doesn’t do great on either

