I'm not sure what you're using. I've used Claude in agent mode to port a very complex and spaghetti-coded C application to nicely structured C++. The original code was so intertwined that I didn't want to figure it out, so I had shelved the project until AI came along.
It wasn't super bad at converting the code, but even it struggled with some of the logic. Luckily, I had it design a test suite to compare the outputs of the old application and the new one. When it couldn't figure out why it was getting different results, it would start generating hex dump comparisons, writing small Python programs, and analyzing the results to figure out where it had gone wrong. It slowly iterated on each difference until it had resolved them: building the code, running the test suite, comparing the results, changing the code, repeat. Some of the issues are likely bugs in the original code (that it fixed), but since I was going for byte-for-byte perfection it had to re-introduce them.
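To give a rough idea of the kind of throwaway comparison script I mean, here is a minimal sketch. It is not what the agent actually wrote; the function names (`first_difference`, `hex_context`, `compare_files`) are made up for illustration, and it assumes each run produces one output file that fits in memory.

```python
from pathlib import Path
from typing import Optional


def first_difference(old: bytes, new: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return offset
    # Same common prefix but different lengths: they diverge where the shorter ends.
    if len(old) != len(new):
        return min(len(old), len(new))
    return None


def hex_context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of the bytes surrounding `offset`, prefixed with the start address."""
    start = max(0, offset - width)
    chunk = data[start:offset + width]
    return f"{start:08x}: {chunk.hex(' ')}"


def compare_files(old_path: str, new_path: str) -> bool:
    """Compare two output files byte for byte; print context at the first mismatch."""
    old = Path(old_path).read_bytes()
    new = Path(new_path).read_bytes()
    offset = first_difference(old, new)
    if offset is None:
        return True
    print(f"first difference at offset 0x{offset:x}")
    print("old", hex_context(old, offset))
    print("new", hex_context(new, offset))
    return False
```

Run it on the output of the old binary and the new one for each test input; a `False` result plus the hex context is usually enough to tell which part of the file writer to look at.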
I have seen the issues you describe, but not with the right technology and not in a while.
At a high level, you asked the LLM to translate N lines of code to maybe 2N lines of code, while GP asked the LLM to translate N lines of English to possibly 10N lines of code. Very different scenarios.
The OP said the LLM didn't build anything, claimed the result was great, and didn't even compile it. My experience has been quite the opposite: it not only compiles the code and fixes compile-time errors but also runs it and fixes runtime issues. It even went so far as to write waveform analysis tools in Python (the output of this project was WAV files) to determine the issues.
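For a sense of what a waveform analysis tool like that can look like, here is a minimal sketch using only the standard library. It is my own illustration, not the agent's code: the names `read_pcm16` and `compare_wavs` are hypothetical, and it assumes 16-bit PCM WAV files, which may not match the real project.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Return ((channels, sample_rate, frame_count), samples) for a 16-bit PCM WAV."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        header = (wav.getnchannels(), wav.getframerate(), wav.getnframes())
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return header, samples


def compare_wavs(old_path, new_path):
    """Summarize where two WAV files differ, sample by sample."""
    old_header, old = read_pcm16(old_path)
    new_header, new = read_pcm16(new_path)
    # Only the overlapping range is compared; a length mismatch shows up in the header.
    diffs = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    return {
        "header_match": old_header == new_header,
        "first_diff_sample": diffs[0] if diffs else None,
        "differing_samples": len(diffs),
        "max_abs_error": max((abs(old[i] - new[i]) for i in diffs), default=0),
    }
```

A small `max_abs_error` spread over many samples tends to point at rounding, while a single early `first_diff_sample` followed by everything differing tends to point at an offset or header-size problem.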
It doesn't really matter what we told it to do; a task is a task. But clearly how each LLM performed that task was very different for me than for the OP.
I'll be the first to say I've abandoned a chat and started a new one to get the result I want. I don't see that as a net negative though -- that's just how you use it.
Are you sure Claude didn't do exactly the same thing but the harness, Claude Code, just hid it from you?
I have seen AI agents fall into the exact loop that GP discussed and need manual intervention to get out of it.
Also, blindly having the AI migrate code from "spaghetti C" to "structured C++" sounds more like a recipe for "spaghetti C" to "fettuccine C++".
Sometimes it's hidden data structures and algorithms that you want to formalize when doing a large-scale refactor. I have found that AIs are definitely able to identify those, but it's not their default behaviour, and they fall out of it pretty quickly if not constantly reminded.
> Are you sure Claude didn't do exactly the same thing but the harness, Claude Code, just hid it from you?
What do you mean? Are you under the impression I'm not even reading the code? The code is actually the most important part because I already have working software but what I want is working software that I can understand and work with better (and so far, the results have been good).
Reading the code and actually understanding the code are not the same thing.
"This looks good", vs "Oh that is what this complex algorithm was" is a big difference.
Effectively, to verify that the code is not just the same code rewritten with C++ syntax and conventions, you need to understand the original C code. That means the hard part was not the code generation (via LLM or fingers) but the understanding, and I'm unsure the AI can do the high level understanding since I have never gotten it to produce said understanding without explicitly telling it.
Effectively, "x.c, y.c, z.c implements a DSL but is convoluted and not well structured, generate the same DSL in C++" works great. "Rewrite x.c, y.c, z.c into C++ building abstractions to make it more ergonomic" generally won't recognise the DSL and formalise it in a way that is very easy to do in C++, it will just make it "C++" but the same convoluted structure exists.
> Reading the code and actually understanding the code are not the same thing.
Ok. Let me be more specific then. I'm "understanding" the code since that's the point.
> I'm unsure the AI can do the high level understanding since I have never gotten it to produce said understanding without explicitly telling it.
My experience has been the opposite: it often starts by producing a usable high-level description of what the code is doing (sometimes imperfectly) and then proposes refactors that match common patterns -- especially if you give it enough context and let it iterate.
> "Rewrite x.c, y.c, z.c into C++ building abstractions to make it more ergonomic" generally won't recognise the DSL and formalise it in a way that is very easy to do in C++, it will just make it "C++" but the same convoluted structure exists.
That can happen if you ask for a mechanical translation or if the prompt doesn't encourage redesign. My point was literally "make it well-designed, idiomatic C++", and it did that. The LLM's training data includes a whole bunch of C++ code, and it seems to be leaning on that.
I did direct some goals (e.g., separating device-specific code and configuration into separate classes so adding a device means adding a class instead of sprinkling if statements everywhere). But it also made independent structural improvements: it split out data generation vs file generation into pipeline/stream-like components and did strict separation of dependencies. It's actually well designed for unit testing and mocking even though I didn't tell it I wanted that.
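The "one class per device" structure is easy to show in miniature. The real project is C++; this is only a Python sketch of the shape, and every name in it (`Device`, `DeviceA`, `DeviceB`, `DEVICES`, `write_file`, the header bytes) is invented for illustration rather than taken from the actual code.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Everything device-specific lives behind this interface."""

    name: str

    @abstractmethod
    def header(self) -> bytes:
        """Return the device-specific file header."""


class DeviceA(Device):
    name = "device-a"

    def header(self) -> bytes:
        return b"DEVA"


class DeviceB(Device):
    name = "device-b"

    def header(self) -> bytes:
        return b"DEVB\x00"


# Supporting a new device means adding a class and listing it here.
DEVICES = {cls.name: cls for cls in (DeviceA, DeviceB)}


def write_file(device: Device, payload: bytes) -> bytes:
    """Shared file generation: no per-device if statements in this path."""
    return device.header() + payload
```

Because `write_file` only depends on the `Device` interface, a test can pass in a stub device, which is roughly why the result ends up easy to unit test and mock.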
I'm not claiming it has human-level understanding or that it never makes mistakes -- but "it can't do high-level understanding" doesn't match what I'm seeing in practice. At minimum, it can infer the shape of the application well enough to propose and implement a much more ergonomic architecture, especially with iterative guidance.
I had to have it introduce some "bugs" for byte-for-byte matching because it had generalized some of the file generation and the original C code generated slightly different file structures for different devices. There's no reason for this difference; it's just different code trying to do the same thing. I'll probably remove these differences when the whole thing is done.
> I've used Claude in agent mode to port a very complex and spaghetti-coded C application to nicely structured C++
You migrated code from one of the simplest programming languages to unarguably the most complex programming language in existence. I feel for you; I really do.
How did you ensure that it didn't introduce any of the myriad of footguns that C++ has that aren't present in C?
I mean, we're talking about a language here that has an entire book just for variable initialisation - choose the wrong one for your use-case and you're boned! Just on variable initialisation, how do you know it used the correct form in all of the places?
I do a lot of C++ programming and that's really overselling the issues. You don't have to read an entire book on variable initialization to do it correctly. And using STL types is a lot safer than passing pointers around.
It's actually far easier for me to tell that it's not leaking memory or accessing some unallocated data in the C++ version than in the C version.
A simple language just pushes complexity from the language into the code. Being able to represent things in a more high-level way is entirely the point of this exercise because the C version didn't have the tools to express it more cleanly.