What example do you need? On every single benchmark, AI is getting better and better.
Before someone says "but benchmarks don't reflect the real world...", please name what metric you think is meaningful if not benchmarks. Token consumption? OpenAI/Anthropic revenue?
Whenever I try to use a "state of the art" LLM to generate code, it takes longer to get a worse result than if I had just written the code myself from the start. That's the experience of every good dev I know. So that's my benchmark. AI benchmarks are BS marketing gimmicks designed to give the appearance of progress - there are tremendous perverse financial incentives.
This will never change: you can only use an LLM to generate code (or any other type of output) you already know how to produce and are an expert in, because you can never trust the output.
Regarding code changes, especially small ones (say, 50 lines spread across 5 files): if you can't get an agent to make nearly exactly the code changes you want, just faster than you would, that's a you problem at this point. If it would take you maybe 15 minutes, grok-code-fast-1 can do it in 2.
Right. With careful use of AI, I can have it gather information to help me make better designs (like giving me summaries of the best frameworks or libraries currently available for a given project), but as for generating an architecture and then generating the code, devops, and so on for it? It's just not there, unless you're creating an app that effectively already exists, like some basic CRUD app.
If you're creating basic CRUDs, what on earth are you doing? That kind of thing should have been automated a long time ago.
CRUD apps are ridiculously simple and have been in existence my entire life. Yet it is surprisingly difficult to make a basic CRUD and host it somewhere. The bulk of useful but simple business apps are just a CRUD with a tiny bit of customisation and integration around them.
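To make that concrete, here is roughly everything a basic CRUD amounts to - a minimal sketch, assuming Flask and SQLite purely as examples (the "items" table, file name, and routes are all made up for illustration):

    import sqlite3
    from flask import Flask, g, jsonify, request

    app = Flask(__name__)
    DB_PATH = "items.db"  # hypothetical database file

    def get_db():
        # One SQLite connection per request, kept on Flask's request context.
        if "db" not in g:
            g.db = sqlite3.connect(DB_PATH)
            g.db.row_factory = sqlite3.Row
            g.db.execute(
                "CREATE TABLE IF NOT EXISTS items "
                "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
            )
        return g.db

    @app.teardown_appcontext
    def close_db(exc):
        db = g.pop("db", None)
        if db is not None:
            db.close()

    @app.post("/items")  # Create
    def create_item():
        name = request.get_json(force=True)["name"]
        cur = get_db().execute("INSERT INTO items (name) VALUES (?)", (name,))
        get_db().commit()
        return jsonify(id=cur.lastrowid, name=name), 201

    @app.get("/items/<int:item_id>")  # Read
    def read_item(item_id):
        row = get_db().execute(
            "SELECT * FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        return (jsonify(dict(row)), 200) if row else ("not found", 404)

    @app.put("/items/<int:item_id>")  # Update
    def update_item(item_id):
        name = request.get_json(force=True)["name"]
        get_db().execute("UPDATE items SET name = ? WHERE id = ?", (name, item_id))
        get_db().commit()
        return jsonify(id=item_id, name=name)

    @app.delete("/items/<int:item_id>")  # Delete
    def delete_item(item_id):
        get_db().execute("DELETE FROM items WHERE id = ?", (item_id,))
        get_db().commit()
        return "", 204

And that's the point: the CRUD itself is the trivial part. The surprisingly difficult part is everything around it - hosting, auth, backups, and that tiny bit of customisation and integration.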
It is true that LLMs make it easier to build these kinds of things without having to become a competent programmer first.
AI is getting better at every benchmark. Please ignore that we're not allowed to see these benchmarks and also ignore that the companies in question are creating the benchmarks that are being exceeded.
What metrics that aren't controlled by the industry show AI getting better? Genuinely curious, because those "ranking sites" seem to me to be infested with venture capital, so hardly fair or unbiased. The only reports I hear from academia are the ones that are overly negative on AI.
AI is very satisfied doing the job; just ask it.
AI is able to speed up progress, to free up resources, to give people the most important thing they have - time. The fact that these incredible gifts are misused (or used inefficiently) is not a problem with AI. That would be like complaining that the objective positive of increased food production is actually a negative because people are getting fatter.
You misunderstood. This is how the conversation went:
1. Is there steady progress in AI?
2. What example do you need? On every single benchmark, AI is getting better and better.
3. Job satisfaction and human flourishing.
Hence my answer "AI is very satisfied doing the job; just ask it." It came about because of the stupid comment 3, which tried to link blame to unrelated things (akin to pointing at obesity when asked what metrics make someone say that agriculture/transportation have made no progress in the last 100 years) and at the same time anthropomorphized AI. I only accepted the premise and continued answering on the same level in order to demonstrate the stupidity of their answer.