> FastRender may not be a production-ready browser, but it represents over a million lines of Rust code, written in a few weeks, that can already render real web pages to a usable degree

I feel that we continue to miss the forest for the trees. Writing (or generating) a million lines of code in Rust should not count as an achievement in and of itself. What matters is whether those lines build, function as expected (especially in edge cases) and perform decently. As far as I can tell, AI has not been demonstrated to be useful yet at those three things.


100%. An equivalent situation would be:

Company X does not have a production-ready product, but they have thousands of employees.

I guess it could be a strange flex about funding, but in general it would be a bad signal.


Absolutely.

I think some of these people need to be reminded of the Bill Gates quote about lines of code:

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”


SLOC was a bad indicator 20 years ago and it still is today. Don't tell them - once they realize it's a red flag for us they will just move on to some other metric, because they fight for our attention.

> because they fight for our attention.

Not only that, they straight up pay people to just share and write about their thing: https://i.imgur.com/JkvEjkT.png

Most of us probably knew this already; the internet has had paid content for as long as I can remember. But I (naively, perhaps) thought that software developers, and Hacker News especially, were more resilient to it. I think all of us have to get better at not trusting what we read unless it's actually substantiated.


I don't understand, what does that screenshot show? That there exists at least one anonymous Chinese company that has offered someone $200 to post about them on HN? Why is that relevant to a conversation about Cursor?

Who are the "they" in "they straight up pay people"?


Read the parent comment first, then mine, if you haven't, and it should make sense. In short: "them" here refers to AI companies wanting to market their products. The screenshot shows one such attempt, a company offering someone on HN compensation to talk about and share their product. It's proof that "they" aren't just "fighting for our attention" in the commonly understood way; they're also literally paying money to get people to talk about them.

Line count also becomes a less useful metric because LLM-generated code tends to be unnecessarily verbose.

Makes me wonder what would happen if you introduce cyclomatic complexity constraints in your agents.md file.
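
A rough, hypothetical sketch of what such a constraint could look like in agents.md (the threshold and the lizard checker are just examples I'm picking, not anything from the post):

    ## Complexity constraints
    - Keep cyclomatic complexity under 10 per function.
    - Before committing, run `lizard src/ --CCN 10` and refactor anything it flags.
    - Prefer several small, focused functions over one deeply nested one.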

well, first the model has to actually follow the instructions in agents.md =)

I think you missed the point. From the blog post:

To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files [...]

Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.

The point is that the agents can comprehend the huge amount of code generated and continue to meaningfully contribute to the goal of the project. We didn't know if that was possible. They wanted to find out. Now we have a data point.

Also, a popular opinion in any vibecoding discussion is that AI can help, but only on greenfield, toy, personal projects. This experiment shows that AI agents can work together on a very complex codebase with ambitious goals. It looks like this was one human plus 2,000 agents over two months. How much progress do you think a project with 2,000 engineers could achieve in its first two months?

> What matters is whether those lines build, function as expected (especially in edge cases) and perform decently. As far as I can tell, AI has not been demonstrated to be useful yet at those three things.

They did build. You can give it a try. They did function as expected. How many edge cases would you like it to pass? Perform decently? How could you tell if you didn't try?


That’s not what I meant. What I’m asking is whether there’s any evidence that the latest “techniques” (such as Ralph) can actually lead to high-quality results, both in terms of code and end product, and if so, how.

I used Ralph recently, in Claude Code. We had a complex SQL script that crunched large amounts of data and was slow to run even on tables that are normalized, have indexes on the right columns, etc. We, the humans, spent a significant amount of time tweaking it. We were able to get some performance gains, but eventually hit a wall. That is when I let Ralph take a stab at it. I told it to create a baseline benchmark and gave it the expected output. I told it to keep iterating on the script until there was at least a 3x improvement in performance while the output stayed identical. I set the iteration limit to 50. I let it loose and went to dinner. When I came back, it had found a way to get the 3x improvement and had stopped on the 20th iteration.
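
For context, Ralph is essentially re-feeding the agent the same prompt in a loop until a stop condition is met. A rough reconstruction of the kind of prompt I mean (the file names and numbers here are placeholders, not my actual setup):

    1. Run ./benchmark.sh on query.sql and record the runtime as the baseline
       (first iteration only).
    2. Make one change to query.sql that should improve performance.
    3. Re-run ./benchmark.sh and diff its output against expected_output.csv;
       revert the change if the output differs.
    4. Stop once the runtime is at least 3x faster than the baseline, or after
       50 iterations, whichever comes first.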

Is there another human who could get me even better performance given the same parameters? Probably, yes. In the same amount of time? Maybe, but unlikely. In any case, we don't have anybody on our team who can think of 20 different ways to improve a large and complex SQL script and try them all in a short amount of time.

These tools do require two things before you can expect good results:

1. An open mind.
2. Experience. Lots of it.

BTW, I never trust the code an AI agent spits out. I get other AI agents, running different LLMs, to review all the work and create deterministic tests that must be run and must pass before the PR is ever generated. I used to do a lot of this manually, but now I create Claude skills that automate much of it away.
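
As a very rough illustration of the skill idea (this follows the documented SKILL.md layout, living somewhere like .claude/skills/pr-review-gate/SKILL.md, but the content is a simplified stand-in rather than my actual skill):

    ---
    name: pr-review-gate
    description: Checks that must pass before any PR is opened.
    ---

    1. Have a different model review the full diff and list concrete issues.
    2. Write deterministic tests covering the changed behavior.
    3. Run the test suite; do not open the PR until everything passes.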


I don't understand what kind of evidence you expect to receive.

There are plenty of examples from talented individuals, like Antirez or Simonw, and an ocean of examples from random individuals online.

I can say to you that some tasks that would take me a day to complete are done in 2h of agentic coding plus 1h of code review, with the added benefit that during the 2h of agentic coding I can do something else. Is this the kind of evidence you are looking for?


This is exactly the issue I have with what I'm seeing around: lots of "here's something impressive we did" but nearly nothing in terms of how it was actually achieved in clear, reproducible detail.

Your point is fair, but it rests on a major assumption I’d question: that the only limit lies with the user, and that the tooling itself has none. What if it’s more like “you can’t squeeze blood from a stone”? That is, agentic coding may simply have no greater potential than what I’ve already seen. To be fair, I haven’t gone all the way in trying to make it work, but even if some minor workarounds exist, the full promise being hyped might not be realistically attainable.

How can one judge its potential without fully understanding the tool or having used it to its full extent?

I don’t think agentic programming is some promised land of instant code without bugs.

It’s just a force multiplier for what you can do.


The point is precisely this. How do you know you have used it to its full potential? "You're holding it wrong" has no limits.

So far I've found https://github.com/jae-jae/fetcher-mcp which mostly does what I want, but it only started working well when I asked Codex to run it with `disableMedia: false`.
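
For anyone wanting to try it, registering it with Codex looks roughly like this in ~/.codex/config.toml (double-check the fetcher-mcp README for the exact invocation; `disableMedia` is something I ask for per request, not a config key):

    [mcp_servers.fetcher]
    command = "npx"
    args = ["-y", "fetcher-mcp"]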


How do you prevent DoS attacks?


Cloudflare, rate limiting, and other limits.
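
For illustration, the application-side rate limiting can be as simple as an nginx rule along these lines (a generic sketch, not the actual config):

    # limit each client IP to ~10 requests/second, with a small burst allowance
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

    server {
        location / {
            limit_req zone=per_ip burst=20 nodelay;
        }
    }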


This is about Groq (the semiconductor company), not Grok (xAI’s LLM).


If nothing else, this deal will probably sunset this unending point of confusion lmao.


No experience in the field, other than 2048, so take this with a grain of salt.

In my opinion it’s about your ethical stance and who your target audience is, and whether you’re trying to make a ton of money or just enough to survive. You’re obviously going to fight an uphill battle if you don’t employ any such (predatory?) marketing tactics. However, you could position yourself as explicitly standing against those and that might attract a smaller but loyal user base.

If you’re lucky, and build something good, and people talk about it, you might find that you’ll get users regardless. However, at the end of the day, what matters is whether you can keep the lights on, so you may have to relax some of your stances and rules or find ways to market your product that don’t fall into the categories you’ve described.


How does Opus 4.5 compare to gpt-5.1-codex-max?


roughly, much better: https://www.swebench.com


This one’s hilarious. What was your prompt?

