
Agreed 100%. For those of us who have already spent ungodly hours writing hyper-detailed specifications for AI, the take that specs are the solution to working with AI coding agents seems ridiculously naive. For context, I've seen this behavior in Claude Code too, and despite being extremely bullish on the technology at first, it has almost convinced me that it just isn't ready for prime time, no matter what the hucksters tell you. Once you start seeing it, you quickly realize that it doesn't matter how many guardrails you put in place or how detailed your specification is if your coding agent randomly decides to ignore your rules or specifications (even in 'brand new context' scenarios). I've lost track of how many times I've asked Claude why it did something when the Claude.md file expressly says to do the opposite (including words like 'important' or 'critical'), or when a specification document it read immediately before implementing, with a brand-new context, says the same. Naturally, Claude's reply is some variation of 'You're absolutely right to call me out on this. I should have done it the way it was spelled out in the specification.'


I have tripwires in my codebase for the case where Claude has a hard time getting a benchmark to run, decides to yeet it, and quietly substitutes mock/synthetic data, to avoid potential scientific-credibility issues, LOL. You can put the system on rails, but it's an engineering problem: these things are noisy program emitters with some P(correct | context). Model them as noisy channels and you can apply the same error-correcting codes to build channels with arbitrarily low noise. Rough sketches of both ideas below.
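
A tripwire along those lines might look something like this. It's a hypothetical sketch, not my actual code: the marker list, the assert_real_data helper, and the "data_manifest.json" file are all made up for illustration.

    import hashlib
    import json
    import sys
    from pathlib import Path

    # Hypothetical markers; tune to whatever your agent tends to generate.
    SYNTHETIC_MARKERS = ("mock", "synthetic", "fixture", "fake")

    def assert_real_data(path: str) -> Path:
        """Tripwire: refuse to benchmark anything that looks like
        mock/synthetic data, or that doesn't match a pinned checksum."""
        p = Path(path)
        if any(marker in p.name.lower() for marker in SYNTHETIC_MARKERS):
            sys.exit(f"TRIPWIRE: refusing to benchmark suspect dataset {p}")
        # "data_manifest.json" is an assumed file mapping dataset names
        # to known-good SHA-256 digests.
        manifest = json.loads(Path("data_manifest.json").read_text())
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        if manifest.get(p.name) != digest:
            sys.exit(f"TRIPWIRE: {p} does not match its pinned checksum")
        return p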
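And a minimal sketch of the noisy-channel framing, under the simplifying assumption that each agent run is an independent sample that is correct with probability p: the simplest error-correcting code is a repetition code, i.e. sample n times and take a majority vote, and for p > 0.5 the majority's error rate falls off exponentially in n (Chernoff bound). run_agent is a made-up placeholder, not any real API, and the probabilities are invented for the demo.

    import random
    from collections import Counter

    def run_agent(task: str) -> str:
        """Hypothetical stand-in for one independent agent run:
        correct with probability p, otherwise one of several wrong outputs."""
        p = 0.7
        if random.random() < p:
            return "correct"
        return f"wrong-{random.randrange(3)}"

    def majority_vote(task: str, n: int = 5) -> str:
        """Repetition code: take n independent samples and keep the
        most common answer."""
        samples = [run_agent(task) for _ in range(n)]
        answer, _count = Counter(samples).most_common(1)[0]
        return answer

    trials = 10_000
    single = sum(run_agent("t") != "correct" for _ in range(trials)) / trials
    voted = sum(majority_vote("t") != "correct" for _ in range(trials)) / trials
    print(f"single-run error ~{single:.2f}, 5-sample majority error ~{voted:.2f}")

With p = 0.7, a single run fails ~30% of the time, while the 5-sample majority fails at most ~16% of the time (and in practice less here, since wrong answers split across different strings). In a real pipeline you'd vote on something verifiable, like whether generated code passes a test suite, rather than on raw strings.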



