They're really not, though. We're in the age of agents: unsupervised LLMs are commonplace, and new laws need to exist to handle these frameworks. It's like handing a toddler a handgun and saying we're being "responsible" or we are "supervising them". We're not; it's negligence.
Are there really many unsupervised LLMs running around outside of experiments like AI Village?
(If so let me know where they are so I can trick them into sending me all of their money.)
My current intuition is that the successful products called "agents" are operating almost entirely under human supervision, most notably the coding agents (Claude Code, OpenAI Codex, etc.) and the research agents (various implementations of the "Deep Research" pattern).
> Are there really many unsupervised LLMs running around outside of experiments like AI Village?
How would we know? Isn't this like trying to prove a negative? The rise of AI "bots" seems to be a common experience on the Internet. I think we can agree that this is a problem on many social media sites and it seems to be getting worse.
As for being under "human supervision", at what point does the abstraction remove the human from the equation? Sure, when a human runs "exploit.exe" the human is in complete control. When a human tells Alexa to "open the garage door" they are still in control, but it is lessened somewhat through the indirection. When a human schedules a process that runs a program which tells an agent to "perform random acts of kindness", the human has very little knowledge of what's going on. In the future I can see the human being less and less directly involved, and I think that's where the problem lies.
I can equate this to a CEO being ultimately responsible for what their company does. This is the whole reason behind the Sarbanes-Oxley law(s); you can't declare that you aren't responsible because you didn't know what was going on. Maybe we need something similar for AI "agents".
> Are there really many unsupervised LLMs running around outside of experiments like AI Village?
My intuition says yes, on the basis of having seen precursors. 20 years ago, one or both of Amazon and eBay bought Google ads for all nouns, so you'd have something like "Antimatter, buy it cheap on eBay" which is just silly fun, but also "slaves" and "women" which is how I know this lacked any real supervision.
Just over ten years ago, someone got in the news for a similar issue with machine-generated variations of "Keep Calm and Carry On" T-shirts that they obviously had not manually checked.
Last few years, there's been lawyers getting in trouble for letting LLMs do their work for them.
The question is, can you spot them before they get in the news by having spent all their owner's money?
Part of what makes this post newsworthy is the claim that it is an email from an agent, not a person, which is unusual. Your claim that "unsupervised LLMs are commonplace" is not at all obvious to me.
Which agent has not been launched by a human with a prompt generated by a human or at a human's behest?
We haven't suddenly created machine free will here. Nor has any of the software we've fielded done anything that didn't originally come from some instruction we've added.
I'd bet that if you deducted C-suite salaries, profit would become a heck of a lot larger. C-suite execs and Wall Street raid companies for profit, all while screwing the average American. This is not news; look at stock buybacks.
If these companies paid less grossly imbalanced compensation packages to their C-suite relative to the average employee salary, and operated like an actual company instead of a rich person's piggy bank, I'd sure be willing to bet that the profit margin would be a heck of a lot higher. They'd probably be more efficient and provide better service, too.
>I'd bet that if you deducted C-suite salaries, profit would become a heck of a lot larger.
Instead of posting random conjectures with zero backing evidence, you could, you know, actually do the research, because "burden of proof" and all. Anyway, someone made a similar claim a few days ago, but about HP, and I debunked it with a 30-second search[1]. Unless there's reason to think that HP or UnitedHealth are outliers in terms of corporate governance, I doubt the ratio of executive pay to profit will vary significantly between them.
I think this becomes a balance of peers. The psychological safety of one may not necessarily be the psychological safety of another. Does the individual who wishes to speak up berate the underperforming co-worker for underperforming?
How can we get the new co-worker to start performing adequately? They are a member of the team, and unless there's some business motivation to fire or reassign them, they will remain a member of the team. I think the solution is to invest some velocity in bringing the co-worker up to the team's overall performance level.
That being said, I do not condone tolerating unacceptable performance, but I think that teams that consistently show grace and respect to each other will often yield the best results.
I suppose offering constructive criticism rather than beratement (is that even a word?) does require some minimum level of maturity. As does acknowledging the difference from the receiving end.
It's kind of the same discipline needed when pair programming so you don't end up in a teacher/student dynamic. Words and actions need to be deliberate and weighed, and the intent needs to be that the code base comes before anyone's ego.
The languages that do would require extensive type systems to implement this feature. It may simply not be a priority over other requirements like thread safety, atomicity, etc.
> similar tricks for shell execution
The shell only supports strings, integers, and lists. Its type system is too limited for this level of type-checking.
This works in TypeScript because of the advanced type-level operations built into the language.
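To make that concrete, here's a minimal sketch of the kind of type-level trick TypeScript allows. The `Flag` and `GitCommand` types, and the set of commands they accept, are made up for illustration:

```typescript
// Template literal types let the compiler validate command strings.
// `Flag` and `GitCommand` are hypothetical names, just to show the idea.
type Flag = `--${string}`;
type GitCommand = `git ${"status" | "commit" | "push"}${"" | ` ${Flag}`}`;

const ok: GitCommand = "git status";           // compiles
const alsoOk: GitCommand = "git push --force"; // compiles
// const bad: GitCommand = "git blame";        // rejected at compile time
```

A shell, by contrast, sees a command as one flat string until run time, so there's nothing for this kind of check to hang off of.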
Apologies, I meant the equivalent of `os.popen` in Python. You'd almost certainly only support a subset of what a shell actually supports, but that would be for the best.
The basic point is to have the equivalent of named/delimited parameters, with escaping enforced by default, such that you have to go out of your way to send raw strings.
Bottom line, it bemuses me that the default "convenience" methods are almost always "send this string over to another process to evaluate it" rather than doing any processing on it locally. That feels like it would be far better as the power-user "escape hatch" than as the "convenience method" it is so often pitched as.
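Staying in TypeScript, here's a rough sketch of what that safer default could look like. The `sh` tag and its naive whitespace splitting are hypothetical, purely to illustrate "escaped by default, raw strings only if you go out of your way":

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical tagged template: literal parts are split into argv tokens,
// but each interpolated value becomes exactly one argv entry. Nothing is
// ever re-parsed by a shell, so quoting attacks have nowhere to land.
function sh(strings: TemplateStringsArray, ...values: string[]): string {
  const tokens: string[] = [];
  strings.forEach((chunk, i) => {
    tokens.push(...chunk.split(/\s+/).filter(Boolean)); // naive, for the sketch
    if (i < values.length) tokens.push(values[i]);      // one value = one argument
  });
  const [cmd, ...args] = tokens;
  return execFileSync(cmd, args, { encoding: "utf8" }); // no shell involved
}

const userInput = "; rm -rf /";      // hostile on purpose
console.log(sh`echo ${userInput}`);  // prints "; rm -rf /" as plain text
```

Sending a raw string through an actual shell would then be a separate, deliberately more verbose call: the "escape hatch" rather than the default.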
How is "developing an algorithm" which selects content any different than editorial free speech? It selects content to show, and transmits that content to its users. Newspapers do this all the time, they pick the stories which get run.
Honestly curious of your take. The only difference that I see is that it can be done at scale, which doesn't necessarily mean it isn't free speech. They just have a bigger megaphone.
Algorithms aren't protected by the US Constitution; that's ridiculous. Point me to the single person who wrote the Facebook algorithm and I will change my opinion and protect its speech. The press is explicitly protected by the First Amendment. Beyond that, commercial speech is not broadly protected. Megaphones, in general, are not free speech.
And we know this. You cannot advertise cigarettes on TV, cities can ban billboards, and until recently, the law understood that donating millions of dollars to a politician is not a form of speech; it's a bribe (we'll have to work on that one).
The press has a protected right to report. Even the press that are really thinly-veiled propaganda outlets get this protection. You have a protected right to speak in public and petition the government for redress without fear of reprisal. Social media and content algorithms are neither the press nor individual citizens, and they are not covered by the language or spirit of the first amendment.
For a while I used FB Messenger Lite, which was basically designed for low-bandwidth markets (read: not the US) but came without any sort of bloat. Sadly, they recently deprecated it, which forced me to switch. It seems like these companies feel the need to put ads everywhere; at some point I'd imagine some users will just stop using tools that are too bloated.
What was the process like disclosing the bug to them? That's one part you left out of your post that I was curious about. Was it friendly/straightforward? Were they surprised at all that this was possible?
Pretty nondescript. I just sent them the code and explained how to replicate it. They said they'd patch it, and then they did haha. They offered me some in-game currency as a reward (20,000 gems, which I think is equivalent to ~115 bucks).
I wonder how valuable this is as a metric, since what gets viewed is a function of art as much as of marketing, production, or other elements. Some years studios make movies that are just bad; I wouldn't necessarily expect the income distribution to remain balanced across years.
Furthermore, these graphs don't appear to take the production cost of movies into account. If a low-budget film garners critical acclaim, it means more than a studio movie that merely broke even, even though their gross incomes could be pretty similar.