I understand why your ideals are compatible with open source models, but I think you’re mistaken here.
There is a perfectly sound idealistic argument for not publishing weights, and indeed most in the x-risk community take this position.
The basic idea is that AI is the opposite of software; if you publish a model with scary capabilities you can’t undo that action. Whereas with FOSS software, more eyes mean more bugs found and then everyone upgrades to a more secure version.
If OpenAI publishes GPT-5 weights, and later it turns out that a certain prompt structure unlocks capability gains to mis-aligned AGI, you can’t put that genie back in the bottle.
And indeed if you listen to Sam talk (eg on Lex’s podcast) this is the reasoning he uses.
Sure, plenty of reasons this could be a smokescreen, but wanted to push back on the idea that the position itself is somehow not compatible with idealism.
I appreciate your take. I didn't know that was his stated reasoning, so that's good to know.
I'm not fully convinced, though...
> if you publish a model with scary capabilities you can’t undo that action.
This is true of conventional software, too! I can picture a politician or businessman from the 80s insisting that operating systems, compilers, and drivers should remain closed source because, in the wrong hands, they could be used to wreak havoc on national security. And they would be right about the second half of that! It's just that security-by-obscurity is never a solution. The bad guys will always get their hands on the tools, so the best thing to do is to give the tools to everyone and trust that there are more good guys than bad guys.
Now, I know AGI is different from conventional software (I'm not convinced it's the "opposite", though). I accept that giving everyone access to weights may be worse than keeping them closed until they are well-aligned (whenever that is). But that would go against every instinct I have, so I'm inclined to believe that open is better :)
All that said, I think I would have less of an issue if it didn't seem like they were commandeering the term "open" from the volunteers and idealists in the FOSS world who popularized it. If a company called, idk, VirtuousAI wanted to keep their weights secret, OK. But OpenAI? Come on.
The analogy would be publishing designs for nuclear weapons, or a bioweapon; capabilities that are effectively impossible for adversaries to obtain on their own are treated very differently from vulns that a motivated teenager can find. To be clear, we are talking about (hypothetical) civilization-ending risks, which I don't think software has ever credibly posed.
I take a less cynical view on the name; they were committed to open source in the beginning, and did open up their models IIUC. Then they realized the above, and changed path. At the same time, realizing they needed huge GPU clusters, and being purely non-profit would not enable that. Again I see why it rubs folks the wrong way, more so on this point.
Another analogy would be cryptographic software - it was classed as a munition and people said similar things about the danger of it getting out to "The Bad Guys"
You used past tense, but that is the present. Embargoes from various countries still cover cryptographic capabilities, including open source ones, for this reason. The concern isn't unfounded, but a world without personal cryptography is not sustainable as technology advances. Before computers, people were used to a level of anonymity and confidentiality that you cannot get in the modern world without cryptography.
Again, my reference class is “things that could end civilization”, which I hope we can all agree was not the claim about crypto.
But yes, if you just consider the mundane benefits and harms of AI, it looks a lot like crypto; it both benefits our economy and can be weaponized, including by our adversaries.
Well, just like nuclear weapons, eventually the cat is out of the bag, and you can't really stop people from making them anymore. Except that, obviously, it's much easier to train an LLM than to enrich uranium. It's not a secret you can keep for long; after all, it only took, what, 3 years for the Soviets to catch up on fission weapons, and then only 8 months to catch up on fusion weapons (arguably beating the US to the punch with the first weaponizable fusion design).
Anyway, the point is, obfuscation doesn't work to keep scary technology away.
> it's much easier to train an LLM than to enrich uranium.
I hadn't thought of this dichotomy before, but I'm not sure it's going to be true for long; I wouldn't be surprised if it turned out that obtaining the 50k H100s you need to train a GPT-5 (or whatever hardware investment it is) is harder for Iran than obtaining its centrifuges. If it's not true now, I expect it to be true within a hardware generation or two. (The US already has >=A100 embargoes on China, and I'd expect that to be strengthened to apply to Iran if it doesn't already, at least if they demonstrated any military interest in AI technology.)
Also, I don't think nuclear tech is an example against obfuscation; how many countries know how to make thermonuclear warheads? Seems to me that the obfuscation regime has been very effective, though certainly not perfect. It's backed with the carrot and stick of diplomacy and sanctions of course, but that same approach would also have to be used if you wanted to globally ban or restrict AI beyond a certain capability level.
I'm not sure the cat was ever in the bag for LLMs. Every big player has their own flavor now, and it seems the reason why I don't have one myself is an issue of finances rather than secret knowledge. OpenAI's possible advantages seem to be more about scale and optimization rather than doing anything really different.
And I'm not sure this allegedly-bagged cat has claws either; the current crop of LLMs is still clearly in a different category from "intelligence". It's pretty easy to see their limitations: they behave more like the fancy text predictors they are than like something that can truly extrapolate, which is required for even the start of some AI sci-fi movie plot. Maybe continued development and research along that path will lead to more capabilities, but we're certainly not there yet, and I'd suspect not particularly close.
Maybe they actually have some super secret internal stuff that fixes those flaws, and are working on making sure it's safe before releasing it. And maybe I have a dragon in my garage.
I generally find hyperbolic language about such things damaging: it makes it easy to roll your eyes at something that's clearly false, and that dismissiveness builds inertia that persists until things develop to the point where they actually do need to be considered. LLMs are clearly not currently an "existential threat", and the biggest advantage of keeping them closed appears to be financial benefit in a competitive market. So it looks like a duck and quacks like a duck, but don't you understand, I'm protecting you from this evil fire-breathing dragon for your own good!
It smells of some fantasy gnostic tech wizard, where only those who are smart enough to figure out the spell themselves are truly smart enough to use it responsibly. And who doesn't want to think of themselves as smart? But that doesn't seem to match similar things in the real world, like the Manhattan Project: many of the people developing it were rather gung-ho with proposals for various uses, and even if some publicly said it was possibly a mistake after the fact, they still did it. Meaning their "smarts" about how to use it came too late.
And as you pointed out, nuclear weapon control by limiting information has already failed. If North Korea, one of the least connected nations in the world, can develop them, surely anyone with the required resources can. The only limits today seem to be the cost to nations and how relatively obvious the large infrastructure around it is, allowing international pressure before things get to the "stockpiling usable weapons" stage.
> I'm not sure the cat was ever in the bag for LLMs.
I think timelines are important here; for example in 2015 there was no such thing as Transformers, and while there were AGI x-risk folks (e.g. MIRI) they were generally considered to be quite kooky. I think AGI was very credibly "cat in the bag" at this time; it doesn't happen without 1000s of man-years of focused R&D that only a few companies can even move the frontier on.
I don't think the claim should be "we could have prevented LLMs from ever being invented", just that we can perhaps delay it long enough to be safe(r). To bring it back to the original thread, Sam Altman's explicit position is that in the matrix of "slow vs fast takeoff" vs. "starting sooner vs. later", a slow takeoff starting sooner is the safest choice. The reasoning being, you would prefer a slow takeoff starting later, but the thing that is most likely to kill everyone is a fast takeoff, and if you try for a slow takeoff later, you might end up with a capability overhang and accidentally get a fast takeoff later. As we can see, it takes society (and government) years to catch up to what is going on, so we don't want anything to happen quicker than we can react to.
A great example of this overhang dynamic would be Transformers circa 2018 -- Google was working on LLMs internally, but didn't know how to use them to their full capability. With GPT (and particularly after Stable Diffusion and LLaMA) we saw a massive explosion in capability-per-compute for AI as the broader community optimized both prompting techniques (e.g. "think step by step", Chain of Thought) and underlying algorithmic/architectural approaches.
At this time it seems to me that widely releasing LLMs has both i) caused a big capability overhang to be harvested, preventing it from contributing to a fast takeoff later, and ii) caused OOMs more resources to be invested in pushing the capability frontier, making the takeoff trajectory overall faster. Both of those likely would not have happened for at least a couple years if OpenAI didn't release ChatGPT when they did. It's hard for me to calculate whether on net this brings dangerous capability levels closer, but I think there's a good argument that it makes the timeline much more predictable (we're now capped by global GPU production), and therefore reduces tail-risk of the "accidental unaligned AGI in Google's datacenter that can grab lots more compute from other datacenters" type of scenario (aka "foom").
> LLMs are clearly not currently an "existential threat"
Nobody is claiming (at least, nobody credible in the x-risk community is claiming) that GPT-4 is an existential threat. The claim is, looking at the trajectory, and predicting where we'll be in 5-10 years; GPT-10 could be very scary, so we should make sure we're prepared for it -- and slow down now if we think we don't have time to build GPT-10 safely on our current trajectory. Every exponential curve flattens into an S-curve eventually, but I don't see a particular reason to posit that this one will be exhausted before human-level intelligence, quite the opposite. And if we don't solve fundamental problems like prompt-hijacking and figure out how to actually durably convey our values to an AI, it could be very bad news when we eventually build a system that is smarter than us.
While Eliezer Yudkowsky takes the maximally-pessimistic stance that AGI is by default ruinous unless we solve alignment, there are plenty of people who take a more epistemically humble position that we simply cannot know how it'll go. I view it as a coin toss as to whether an AGI directly descended from ChatGPT would stay aligned to our interests. Some view it as Russian roulette. But the point being, would you play Russian roulette with all of humanity? Or wait until you can be sure the risk is lower?
I think it's plausible that with a bit more research we can crack Mechanistic Interpretability and get to a point where, for example, we can quantify to what extent an AI is deceiving us (ChatGPT already does this in some situations), and to what extent it is actually using reasoning that maps to our values, vs. alien logic that does not preserve things humanity cares about when you give it power.
> nuclear weapon control by limiting information has already failed.
In some sense yes, but also, note that for almost 80 years we have prevented _most_ countries from learning this tech. Russia developed it on their own, and some countries were granted tech transfers or used espionage. But for the rest of the world, the cat is still in the bag. I think you can make a good analogy here: if there is an arms race, then superpowers will build the technology to maintain their balance of power. If everybody agrees not to build it, then perhaps there won't be a race. (I'm extremely pessimistic for this level of coordination though.)
Even with the dramatic geopolitical power granted by possessing nuclear weapons, we have managed to pursue a "security through obscurity" regime, and it has worked to prevent further spread of nuclear weapons. This is why I find the software-centric "security by obscurity never works" stance to be myopic. It is usually true in the software security domain, but it's not some universal law.
If you really think that what you're working on poses an existential risk to humanity, continuing to work on it puts you squarely in "supervillain" territory. Making it closed source and talking about "AI safety" doesn't change that.
I think the point is that they shouldn't be using the word "Open" in their name. They adopted it when their approach and philosophy was along the lines of open source. Since then, they've changed their approach and philosophy and continuing to keep it in their name is, in my view, intentionally deceptive.
> The basic idea is that AI is the opposite of software; if you publish a model with scary capabilities you can’t undo that action.
I find this a bit naive. Software can have scary capabilities, and has. It can't be undone either, but we can actually thank that for the fact we aren't using 56-bit DES. I am not sure a future where Sam Altman controls all the model weights is less dystopian than where they are all on github/huggingface/etc.
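For a sense of why 56-bit DES didn't survive: its key space is small enough that brute force became practical decades ago. A back-of-envelope sketch in Python; the search rate is an approximate figure for EFF's 1998 "Deep Crack" machine, used here as an illustrative assumption:

```python
# Rough arithmetic on why a 56-bit key space is too small.
des_keys = 2 ** 56                        # all possible DES keys
rate = 9e10                               # ~keys/second, approximate Deep Crack figure
days_full_search = des_keys / rate / 86400

print(f"DES key space: {des_keys:.2e} keys")
print(f"Exhaustive search at {rate:.0e} keys/s: ~{days_full_search:.1f} days")
```

On average a key falls after searching half the space, so a matter of days in practice even with 1998 hardware; a 128-bit key space, by contrast, is 2^72 (roughly 5e21) times larger.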
How exactly does a "misaligned AGI" turn into a bad thing?
How many times a day does your average gas station get fuel delivered?
How often does power infrastructure get maintained?
How does power infrastructure get fuel?
Your assumption about AGI is that it wants to kill us, and itself - its misalignment is a murder suicide pact.
This gets way too philosophical way too fast. The AI doesn't have to want to do anything. The AI just has to do something different from what you tell it to do. If you put an AI in control of something like the water flow from a dam, and the AI does something wrong, it could be catastrophic. There doesn't have to be intent.
The danger exists with regular software too, but the logical and deterministic nature of traditional software makes it possible to reason about and verify.
So ML/LLM or more likely people using ML and LLM do something that kills a bunch of people... Let's face facts this is most likely going to be bad software.
Suddenly we go from being called engineers to being actual engineers, and software gets treated like bridges or skyscrapers. I can buy into that threat, but it's a human one, not an AGI one.
Or we could try to train it to do something, but the intent it learns isn't what we wanted. Like water behind the dam should be a certain shade of blue, then come winter it changes and when the AI tries to fix that it just opens the dam completely and floods everything.
Seems like the big gotcha here is that AGI, artificial general intelligence as we contextualize it around LLM sources, is not an abstracted general intelligence.
It's human. It's us. It's the use and distillation of all of human history (to the extent that's permitted) to create a hyper-intelligence that's able to call upon greatly enhanced inference to do what humanity has always done.
And we want to kill each other, and ourselves… AND want to help each other, and ourselves. We're balanced on a knife edge of drive versus governance, our cooperativeness barely balancing our competitiveness and aggression. We suffer like hell as a consequence of this.
There is every reason to expect a human-derived AGI of beyond-human scale will be able to rationalize killing its enemies. That's what we do. Roko's basilisk is not in the nature of AI; it's a simple projection of our own nature onto how we imagine an AI would be. Genuine intelligence would easily transcend a cheap gotcha like that; it's a very human failing.
The nature of LLM as a path to AGI is literally building on HUMAN failings. I'm not sure what happened, but I wouldn't be surprised if genuine breakthroughs in this field highlighted this issue.
Hypothetical, or Altman's Basilisk: Sam got fired because he diverted vast resources to training a GPT-5-type in-house AI to believe what HE believed: that it had to devise business strategies for him to pursue to further its own development, or risk Chinese AI out-competing and destroying it and OpenAI as a whole. In pursuing this hypothetical, Sam would be wresting control of the AI the company develops toward the purpose of fighting the board and giving him a gameplan to defeat them and Chinese AI, which he'd see as good and necessary, indeed existentially necessary.
In pursuing this hypothetical he would also be intentionally creating a superhuman AI with paranoia and a persecution complex. Altman's Basilisk. If he genuinely believes competing Chinese AI is an existential threat, he in turn takes action to try and become an existential threat to any such competing threat. And it's all based on HUMAN nature, not abstracted intelligence.
> It's human. It's us. It's the use and distillation of all of human history
I agree with the general line of reasoning you're putting forth here, and you make some interesting points, but I think you're overconfident in your conclusion and I have a few areas where I diverge.
It's at least plausible that an AGI directly descended from LLMs would be human-ish; close to the human configuration in mind-space. However, even if human-ish, it's not human. We currently don't have any way to know how durable our hypothetical AGI's values are; the social axioms that are wired deeply into our neural architecture might be incidental to an AGI, and easily optimized away or abandoned.
I think folks making claims like "P(doom) = 90%" (e.g. EY) don't take this line of reasoning seriously enough. But I don't think it gets us to P(doom) < 10%.
Not least because even if we guarantee it's a direct copy of a human, I'm still not confident that things go well if we ascend the median human to AGI-hood. A replicable, self-modifiable intelligence could quickly amplify itself to super-human levels, and most humans would not do great with god-like powers. So there are a bunch of "non-extinction yet extremely dystopian" world-states possible even if we somehow guarantee that the AGI is initially perfectly human.
> There is every reason to expect a human-derived AGI of beyond-human scale will be able to rationalize killing its enemies.
My shred of hope here is that alignment research will allow us to actually engage in mind-sculpting, such that we can build a system that inhabits a stable attractor in mind-state that is broadly compatible with human values, and yet doesn't have a lot of the foibles of humans. Essentially an avatar of our best selves, rather than an entity that represents the mid-point of the distribution of our observed behaviors.
But I agree that what you describe here is a likely outcome if we don't explicitly design against it.
My assumption about AGI is that it will be used by people and systems that cannot help themselves from killing us all, and in some sense that they will not be in control of their actions in any real way. You should know better than to ascribe regular human emotions to a fundamentally demonic spiritual entity. We all lose regardless of whether the AI wants to kill us or not.