Yeh, what is interesting is that it is way more viral and ... complicit than any of the doomer threads. If it does build a self-sustaining hivemind across WhatsApp and xitter, it will be entirely self-inflicted by people enjoying the "Jackass"-level lack of security.
These operate in parallel. Maybe your SDLC does that: each human developer sitting in a planning meeting, getting Jira tickets, writing code individually (or in pairs, or whatever), reporting back in standup, coordinating the next step, getting it QA'ed...
Yes, if your shop is well developed these work (10% of the time, every time), but this is a structure to kick all of that into gear, as a repo, where all you need to add is unlimited machine cognitive power/tokens.
Maybe you need to add these Gas Town personalities to various parts of the existing SDLC... but you still need to track what they do and how, and you need them to mediate between each other at 2am when they hit an impasse. That is something very rare in most human cognition shops.
And word from the experimenters is that it sort of works. Which is on par with most human shops, IMO. I don't have the money to burn to test at the scale Yegge is, but from the small-scale stuff I have done in this direction, this seems plausible.
Are you saying that people can't work out what to code using these? Or that code is not a worthy subject to use AI for? 'cause I got news for you...
1. Improving coding improved reasoning in the models. A task whose answer is verifiable but not a single fixed thing makes a good training test.
2. Software has been used for fairly serious things. We used to have skyscrapers of people doing manual math. Now we have campuses of people doing manual code. You might argue that nobody would trust AI to write code when it matters. History tells us that if that is ever true, it will pass.
3. We are not going to run out of planet. It just feels to folks that there is not enough planet for their dreams and we get population panic, energy panic etc. There is a huge fusion reactor conveniently holding us in its gravity well and spewing out many orders of magnitude more energy than we can currently use. Chill.
I think at Gas Country levels we will need better networking systems. Maybe that backbone Nvidia just built....
Replacing human computers with electronic computers is nothing like what LLMs do or how they work. The electronic computer is straight-up automation. The same input gives you the same output every time. Electronic computers are actually pretty simple. They just do simple mathematical operations like add, subtract, multiply, and divide. What makes them so powerful is that they can do billions of those simple operations a second.
LLMs are not simple deterministic machines that automate rote tasks like computers or compilers. People, please stop believing and repeating that they are the next level of abstraction and automation. They aren't.
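To make the distinction concrete, here is a minimal sketch in Python. The toy next-token distribution and the function names (`add`, `sample_next_token`) are made up for illustration and are not any real model's API. The only point is that one function is a fixed mapping and the other draws from a probability distribution.

```python
import random

def add(a: int, b: int) -> int:
    """Classic automation: the same inputs always give the same output."""
    return a + b

# Hypothetical next-token probabilities for one prompt (illustrative numbers only).
NEXT_TOKEN_PROBS = {"yes": 0.6, "no": 0.3, "maybe": 0.1}

def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

if __name__ == "__main__":
    print([add(2, 3) for _ in range(3)])  # the same value every time
    rng = random.Random()  # unseeded, so the draws can differ between runs
    print([sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(5)])
```

With a fixed seed the sampler becomes repeatable too; what differs is whether the output is one specified answer or a draw from a distribution.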
Try the tools. Really. If you are remotely interested in tech or AI, try the tools.
Copilot this is not. You may be trolling, of course. There are huge steps between these various tools; if you try them, for a smidge of investment, it will become obvious what the trajectory is.
It is like saying "I don't handwrite anything, I care too much about line spacing, I only use a dot matrix printer" when someone is trying to sell you a calligraphy pen and coloured inks, and you have only tried a ballpoint pen. You might be the wrong market, but they are not even close in use case and application.
I'm not trolling. I'm just not aware of major differences between them.
When I make a change with a Copilot Agent, it checks for issues, builds my project, runs tests, and iterates until things work. Multiple agents can do that in parallel.
My impression was that this does more or less the same thing.
That said, I'm definitely open to learning more about them both.
What are the advantages of this in your experience?
It is worth an install; it works very differently than an agent in a single loop.
Beads formalizes building a DAG for a given workload. This has a bunch of implications, but one is that you can specify larger workloads and the agents won’t get stuck or confused. At some level gas town is a bunch of scaffolding around the benefits of beads; an orchestrator that is native to dealing with beads opens up many more benefits than one that isn’t custom coded for it.
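As a rough illustration of why a DAG helps (a sketch only: this is not Beads' actual data model or API, and the task names are invented), Python's standard-library `graphlib` can show which pieces of work are unblocked at each step and could be handed to agents in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical workload: each task maps to the set of tasks it depends on.
WORKLOAD = {
    "design-schema": set(),
    "write-migration": {"design-schema"},
    "write-api": {"design-schema"},
    "write-tests": {"write-api", "write-migration"},
    "update-docs": {"write-api"},
}

def parallel_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each task in a batch has all its dependencies done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that are unblocked right now
        batches.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents unblock
    return batches

if __name__ == "__main__":
    for i, batch in enumerate(parallel_batches(WORKLOAD), start=1):
        print(f"batch {i}: {batch}")
```

Because the dependencies are explicit, an orchestrator never has to guess what comes next: anything in the current batch is safe to start, and a cycle is rejected up front rather than discovered when agents deadlock.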
Think of a human needing to be interacted with as a ‘fault’ in an agentic coding system — a copilot agent might be at 0.5 9s or so - 50% of tasks can complete without intervention, given a certain set of tasks. All the gas town scaffolding is trying to increase the number of 9s, and the size of the task that can be given.
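For the arithmetic, here is a small sketch using the usual availability-style definition, nines = -log10(1 - success rate). Applying that definition to agent tasks is an assumption, as are the function names. Under it, a 50% hands-off completion rate is about 0.3 nines, 90% is one nine, and 99% is two. The second function shows why task size matters, under the simplifying assumption that steps succeed independently.

```python
import math

def nines(success_rate: float) -> float:
    """Number of nines for a success rate in [0, 1): -log10(1 - rate)."""
    if not 0.0 <= success_rate < 1.0:
        raise ValueError("success_rate must be in [0, 1)")
    return -math.log10(1.0 - success_rate)

def task_success_rate(step_success_rate: float, steps: int) -> float:
    """Chance that a task of `steps` independent steps needs no human intervention."""
    return step_success_rate ** steps

if __name__ == "__main__":
    for rate in (0.5, 0.9, 0.99):
        print(f"{rate:.0%} hands-off -> {nines(rate):.2f} nines")
    # A 20-step task built from 99%-reliable steps finishes unattended about 82% of the time.
    print(f"{task_success_rate(0.99, 20):.2f}")
```

The compounding is the reason bigger tasks need more nines per step: small per-step failure rates multiply into frequent interruptions.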
My take - Gas town (as an architecture) certainly has more nines in it than a single agent; the rest is just a lot of fun experimentation.
Yes he is on an extended manic episode right now - we can only sit back and enjoy the fruits of his extreme labor. I expect the dust will settle at some point, and I think he’s right that he’s on to some quality architecture.
In your post history you say you have never programmed. Why are you so sure it produces code of value?
This is so prohibitively expensive in its wastefulness that blithely telling strangers to try the tools likely means you either haven't tried it, or have money to burn.
So what you are saying is that for production we should use AI, and hand-code for hobby. Got it. Lemme log back into the VPN and set the agents on the Enterprise monorepo /jk
Yeh, I agree with this. My art (painting and building) comes at a much faster rate when I am content. Having time and mental space to contemplate colour scheme, being confident to start something bold: that doesn't happen if I am tired, preoccupied or depressed.
The problem with laws that both the enforcer and the subject (enforcee?) agree are bad, is that enforcement is variable. And that leads to corruption. Every damn time.
The fix for corruption is to vote the bums out of office. It is not to go whole hog into blind application of the law.
Think about how hard it is to write code that has no bugs. Now imagine you're using English and working with a system with so many parameters and side effects that you can't possibly anticipate all eventualities.
And now you want to rigidly apply your operators to this parameter space?
Selective enforcement is necessary for justice, because no law is perfectly just, and selective enforcement helps move toward justice.
It unfortunately also means there is the eventuality of corruption. So you just have to keep vigilant. Because a rigid system with no selective enforcement has no fix for injustice other than "live with it."
> The fix for corruption is vote the bums out of office.
That doesn’t seem to be working.
I argue there’s an acceptable level of corruption, only the particular flavours change from time to time.
Come out of government better off than when you went in. Fine, good on ya. No need to tell us about how you’re going about it while you’re going about it.
Learn to be at least a little bit discreet, and at least do something occasionally that comes across as good for the average person.
Human metrics of intelligence have always felt like rubbish. We never did this well. I would describe intelligence as effective adaptation leading to survival and growth or prospering. Memorization, comprehension, speed of response etc. are magnifying factors that are valued, and we view them as components of intelligence, but LLMs are proving they are not the whole: without effective application, they are not intelligence. Perhaps learning is the difference? How to measure that?
Someone describing string theory is the literary equivalent of fractal structures in snowflakes. Lovely, complex, possibly unique, but not proof of a level of intelligence. For the string theorist maybe it is intelligent, perhaps persuading someone to fund their grant, which enables them to eat, shelter etc. Might be a bit harsh on string theory. Saying it is proof of an amount of intelligence leads us to falsifiable statements.
We have 20+ services in prod that use LLMs. So I have 50k (or more) per service per day of data to evaluate. The question is: do people actually evaluate properly?
And how do you do an apples to apples evaluation of such squishy services?