I tried to find some code that wasn't minified to assess the quality of this, and I found some shader code for the sky in the Gemini version. The whole shader looks like it was regurgitated verbatim. This wouldn't hold up to licensing scrutiny. Here's a snippet from it:
// wavelength of used primaries, according to preetham
const vec3 lambda = vec3( 680E-9, 550E-9, 450E-9 );
// this pre-calcuation replaces older TotalRayleigh(vec3 lambda) function:
// (8.0 * pow(pi, 3.0) * pow(pow(n, 2.0) - 1.0, 2.0) * (6.0 + 3.0 * pn)) / (3.0 * N * pow(lambda, vec3(4.0)) * (6.0 - 7.0 * pn))
Who's Preetham? Probably one of the copyright holders on this code.
Rather than stolen from Mr. Preetham, it's much more likely this fragment is generated from the large number of Preetham algorithm implementations out there, e.g. I know at least Blender and Unreal implement it, and probably heaps of others as well.
Nobody is going to sue you for using their implementation of a skybox algorithm from 1999, give us a break. It's so generic you can probably only write it in a couple of different ways.
If you're worried about it, you can always spend a day with Claude, ChatGPT and yourself looking for license infringements and clean up your code.
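To illustrate the "only a couple of ways to write it" point, here is a rough Python transcription of the formula from the comment in the quoted snippet. Only `lambda` comes from the snippet; the values for `n`, `N` and `pn` are my own assumptions (typical figures for air), and the function name is made up for this sketch.

```python
import math

# Wavelengths of the primaries, taken from the quoted shader snippet (metres).
LAMBDA = (680e-9, 550e-9, 450e-9)

# ASSUMED constants: the quoted snippet does not define these.
REFRACTIVE_INDEX = 1.0003     # n, refractive index of air
MOLECULAR_DENSITY = 2.545e25  # N, molecules per cubic metre
DEPOLARIZATION = 0.035        # pn, depolarization factor


def total_rayleigh(wavelength, n=REFRACTIVE_INDEX,
                   big_n=MOLECULAR_DENSITY, pn=DEPOLARIZATION):
    """Evaluate the formula from the shader comment for one wavelength."""
    numerator = 8.0 * math.pi ** 3 * (n ** 2 - 1.0) ** 2 * (6.0 + 3.0 * pn)
    denominator = 3.0 * big_n * wavelength ** 4 * (6.0 - 7.0 * pn)
    return numerator / denominator


# One coefficient per primary, in the same order as LAMBDA (red, green, blue).
coefficients = [total_rayleigh(w) for w in LAMBDA]

if __name__ == "__main__":
    for wavelength, value in zip(LAMBDA, coefficients):
        print(f"{wavelength:.3e} m -> {value:.6e}")
```

Since the result scales with 1 / lambda^4, the blue coefficient comes out largest, and there is not much freedom in how you write that down.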
> Nobody is going to sue you for using their implementation of a skybox algorithm from 1999, give us a break.
For personal use maybe not, but that's not the point. The point is it's spitting out licensed code and not even letting you know. Now if you're a business that exclusively hires "vibe" coders with zero experience with enterprise software, you're on the hook and most likely will be sued.
If true, then this usage could violate its MIT License: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."
The file seems to have been copied verbatim, more or less, but without the copyright info.
This particular case appears to me to be a straight derivative at best, but I'm by no means an expert on copyright law.
That's not to say there haven't already been more direct cases with concrete examples [1], including from an author who would have a better right to claim than I [2]. It's not even a stretch to see how it can happen.
As discussed repeatedly in this thread already, in this particular case the code at hand wasn't generated by an LLM at all; it was simply included from a dependency by the build system.
> Seems like a massive attack surface for copyright trolls.
If you think any court system in the world has the capacity to deal with the sheer amount of code an LLM can emit in an hour and audit it for alleged copyright infringements ... I think we're trying to close the barn door now that the horse is already on a ship that has sailed.
This is a terrible argument, just because of the way the legal system works.
If MegaCorp has massive $$$$, but everyone else has small $, then MegaCorp can sue anyone else for using "their" code, that was supposedly generated by an LLM. Most of the time, it won't even get to court. The repo, the program, the whatever-they-want will get taken down way before that.
Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."
Someone brings a case and they, very laboriously, start to address it on its merits. Even before that, costs are accumulating on both sides.
Copyright trolls are mostly not MegaCorps, but they are abusers of the legal system. They won't target Google, but you, with your repo that does something that minorly annoys them? You are fair game.
> Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."
No, but they do recognise when their case registrations are filling up in a way that they cannot possibly process and make adjustments. Courts do not have an infinite capacity.
There's a really simple solution that you may not have considered:
1) don't put your vibe-coded source code in a public git repo, keep it in a local one, with y'know, authentication in front of it;
2) regularly ask your agents to review the code for potential copyright infringements if you either want to release the source or compiled code to the public at any point.
As long as you've followed best practices, I can't see why this is really going to become an issue. Most copyright infringement cases need to start with a Cease & Desist anyway or they'll be thrown out of court. The alleged offender has to be given the opportunity to make good.
So "Claude, we received a C&D for this section of code you stole from https://.../ , you need to make a unique implementation that does not breach their copyright".
You will be surprised how easily this can be resolved.
In the US you can't sue without having obtained or applied for a registration. If the registration does not grant, you cannot sue. You cannot get a registration for code developed by AI.
> Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."
They kind of do. If you fail to bring legal action to guard your intellectual property, and there’s a pattern of you not guarding it, then in future cases this can be used against you when determining damages etc. Weakens your case.
This is only true of trademarks, not copyrights (which was the discussion here).
Trademarks can become 'generic' if you don't defend them. But JK Rowling wrote Harry Potter, whether she sues fanfic authors or not, and can selectively enforce her copyright as she likes.
GPT’s differentiator is they focused on training for “thinking” while Gemini prioritized instant response. Medium thinking is not the limit of utility
Re: overparameterization specifically Medium and High are also identically parameterized
Medium will also dynamically use even higher thinking than High. High is fixed at a higher level rather than leaving it to be dynamic, though somewhat less than Medium’s upper limit
Copyright is so-so. At the end of the day you can say that the complete work (not just snippets) is something copyrightable. But the most bananas thing for me is that one can patent the concept of one click purchasing. That's insane on many levels.
Because of stuff like this, I always find it amazing that people are willing to use AI. It's been illegally trained on code that it does not have the license to use, and it constantly, willy-nilly, regurgitates entire snippets, completely violating the terms of use.
As discussed in this thread before you posted this comment, this code wasn't generated from an LLM at all, but simply included in a dependency: https://news.ycombinator.com/item?id=46092904
Unlike your results, which aren't an exact match, or likely even a close enough match to be copyright infringement if the LLM was inspired by them (consider that copyright doesn't protect functional elements), an exact match of the code is here (and I assume from the comment I linked above that this is a dependency of three.js, though I didn't track that down myself): https://github.com/GPUOpen-LibrariesAndSDKs/Cauldron/blob/b9...
Edit: Actually, on further thought, the date on the copyright header vs the git dates suggests the file in that repo was copied from somewhere else... anyways, I think we can be reasonably confident that a version of this file is in the dependency. Again, I didn't look at the three.js code myself to track down how it's included.
If there's any copyright infringement here, it would be because bog-standard web tools fail to comply with the licenses of their dependencies and include a copy of the license, not because of LLMs. I think that is actually the case for many of them? I didn't investigate to check if licenses are included in the network traffic.
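For anyone who does want to check, a crude way is to grep the served bundle for the usual notice markers. This is only a heuristic sketch: the pattern list and the function name are my own choices, a hit doesn't prove compliance, and a miss doesn't prove infringement.

```python
import re

# Heuristic markers that commonly appear in preserved license banners.
# This list is an assumption for the sketch, not an exhaustive standard.
NOTICE_PATTERNS = [
    re.compile(r"copyright\s+(\(c\)|©|\d{4})", re.IGNORECASE),
    re.compile(r"@license\b", re.IGNORECASE),
    re.compile(r"SPDX-License-Identifier:", re.IGNORECASE),
    re.compile(r"Permission is hereby granted", re.IGNORECASE),
]


def find_license_notices(bundle_text):
    """Return (line_number, excerpt) pairs for lines that look like notices."""
    hits = []
    for line_number, line in enumerate(bundle_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in NOTICE_PATTERNS):
            # Minified bundles can have very long lines, so keep a short excerpt.
            hits.append((line_number, line.strip()[:120]))
    return hits


if __name__ == "__main__":
    # Made-up sample input; a real check would read the downloaded bundle.
    sample = (
        "/**\n * @license\n * Copyright 2010-2024 Example Authors\n"
        " * SPDX-License-Identifier: MIT\n */\nconst a=1;"
    )
    for line_number, excerpt in find_license_notices(sample):
        print(line_number, excerpt)
```

You would feed it the text of whatever the site actually serves. An empty result on a bundle that contains third-party code is a hint that the build stripped the notices.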
There are several cases where copyright law is not only about exact copies but also derivatives, so finding an exact match is not necessary. Not sure it matters in this case either way.
I have been trained on code I don't have the license to use myself. I'm not like these Creators, who suck wisdom from the cosmos directly, apparently.
Sure. It's a problem that corporations run by more or less insane people are the ones monetizing and controlling access to these tools. But the solution to that can't be even more extended private monopolistic property claims to thought-stuff. Such claims are usually the way those crazy people got where they are.
You think a world where Elsevier didn't just own the papers, but also rights to a share in everything learned from them, would be better for you?
It's fascinating that people care very much about this when it's visual arts, but when it comes to code almost no one does.
E.g. the latest Anno game (117) received a lot of hate for using AI generated loading screen backgrounds, while I have never heard of a single person caring about code, which probably was heavily AI generated.
"Claude - rewrite this apparently copyrighted code that can be found online here <http://...> in a way that makes it a unique implementation." <- probably will work.
If the generated code in TFA contained the actual Counter-Strike source code, then you (well, Valve) would have a defensible claim. But the prompt was to make something like Counter-Strike, and it came up with something different. That's fair game.
Definitely citation needed. Such court cases usually come with a lot of important context. How can you just make such a statement and get away with not providing any context link?