
People want predictability from LLMs, but these models are inherently stochastic, not deterministic compilers. What’s working right now isn’t "prompting better," it’s building systems that keep the LLM on track over time: logging every call, retrying on failure, verifying outputs against deterministic checks, and maintaining context that evolves with the repo.
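A minimal sketch of that verify-and-retry loop, assuming you supply your own call_llm and verify functions (both names are hypothetical placeholders, not any particular library's API):

    import logging

    log = logging.getLogger("llm_loop")

    def call_with_verification(prompt, call_llm, verify, max_retries=3):
        """Retry an LLM call until its output passes a deterministic check."""
        for attempt in range(1, max_retries + 1):
            output = call_llm(prompt)        # stochastic step
            ok, reason = verify(output)      # deterministic step: tests, schema, linter
            log.info("attempt %d: verified=%s (%s)", attempt, ok, reason)
            if ok:
                return output
            # Feed the failure back so the next attempt has more context.
            prompt = f"{prompt}\n\nPrevious output failed verification: {reason}"
        raise RuntimeError(f"no verified output after {max_retries} attempts")

The key move is that verify is ordinary deterministic code (a test suite, a JSON schema, a linter), so the loop converges on something checkable rather than something that merely looks plausible.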

That’s why we’ve been investing so much in multi-agent supervision and reproducibility loops at gobii.ai. You can’t just "trust" the model; you need an environment where it’s continuously evaluated, self-corrects, and coordinates with other agents (and humans) around shared state. Once you do that, it stops feeling like RNG and starts looking like an actual engineering workflow distributed between humans and LLMs.
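For the coordination piece, here is a toy version of what "shared state plus continuous evaluation" can look like. Everything below is a hypothetical sketch, not gobii.ai's actual API: agents are plain callables that propose updates, evaluate is any scoring function, and the 0.8 acceptance threshold is an arbitrary assumption.

    from dataclasses import dataclass, field

    @dataclass
    class SharedState:
        """State that agents (and humans) read and write between steps."""
        facts: dict = field(default_factory=dict)
        history: list = field(default_factory=list)

    def supervise(agents, state, evaluate, max_rounds=10):
        """Run agents in rounds; a supervisor score gates each proposal."""
        for _ in range(max_rounds):
            for agent in agents:
                proposal = agent(state)            # each agent sees the shared state
                score = evaluate(proposal, state)  # continuous evaluation of the work
                state.history.append((agent.__name__, proposal, score))
                if score >= 0.8:                   # accept only proposals that pass
                    state.facts.update(proposal)
        return state

Because every proposal and score lands in state.history, you get the reproducibility loop for free: you can replay, audit, or hand any round to a human reviewer.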


