
I tried to find some code that wasn't minified to assess the quality of this, and I found some shader code for the sky in the gemini version. The whole shader looks like it was regurgitated verbatim. This wouldn't hold up to licensing scrutiny. Here's a snippet from it:

  // wavelength of used primaries, according to preetham
  const vec3 lambda = vec3( 680E-9, 550E-9, 450E-9 );
  // this pre-calcuation replaces older TotalRayleigh(vec3 lambda) function:
  // (8.0 * pow(pi, 3.0) * pow(pow(n, 2.0) - 1.0, 2.0) * (6.0 + 3.0 * pn)) / (3.0 * N * pow(lambda, vec3(4.0)) * (6.0 - 7.0 * pn))
Who's Preetham? Probably one of the copyright holders on this code.
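For context, the commented-out formula in that snippet can be evaluated directly. Below is a minimal sketch in Python, not the shader's own code. The wavelengths come from the quoted `lambda` constant; the values for `n`, `N`, and `pn` are assumptions (typical figures for air used in Preetham-style sky shaders) and do not appear in the snippet above.

```python
import math

# Assumed atmospheric constants (NOT part of the quoted snippet):
N_REFRACTIVE = 1.0003    # n: refractive index of air
MOLECULES = 2.545e25     # N: molecules per cubic metre of air
DEPOLARIZATION = 0.035   # pn: depolarization factor

# Wavelengths of the primaries, taken from the quoted `lambda` constant.
LAMBDA_RGB = (680e-9, 550e-9, 450e-9)


def total_rayleigh(wavelength: float) -> float:
    """Evaluate the formula from the shader comment for one wavelength."""
    numerator = (
        8.0 * math.pi ** 3
        * (N_REFRACTIVE ** 2 - 1.0) ** 2
        * (6.0 + 3.0 * DEPOLARIZATION)
    )
    denominator = (
        3.0 * MOLECULES * wavelength ** 4
        * (6.0 - 7.0 * DEPOLARIZATION)
    )
    return numerator / denominator


# One scattering coefficient per primary (red, green, blue).
rayleigh_rgb = tuple(total_rayleigh(w) for w in LAMBDA_RGB)
print(rayleigh_rgb)
```

With these assumed constants the red coefficient comes out near 5.8e-6, and blue scatters roughly five times more strongly than red because of the wavelength-to-the-fourth term. Any implementation that follows the same formula with the same constants will arrive at the same numbers, which is worth keeping in mind when judging how "unique" such a snippet can be.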




Preetham is the author of the paper that defines this algorithm from 1999:

  https://tommyhinks.com/2009/02/10/preetham-sky-model/

  https://tommyhinks.com/wp-content/uploads/2012/02/1999_a_practical_analytic_model_for_daylight.pdf
Rather than being stolen from Mr. Preetham, it's much more likely this fragment was generated from the large number of Preetham algorithm implementations out there, e.g. I know at least Blender and Unreal implement it, and probably heaps of others as well.

Nobody is going to sue you for using their implementation of a skybox algorithm from 1999, give us a break. It's so generic you can probably only write it in a couple of different ways.

If you're worried about it, you can always spend a day with Claude, ChatGPT and yourself looking for license infringements and cleaning up your code.


> Nobody is going to sue you for using their implementation of a skybox algorithm from 1999, give us a break.

For personal use, maybe not, but that's not the point. The point is that it's spitting out licensed code and not even letting you know. If you're a business that exclusively hires "vibe" coders with zero experience with enterprise software, you're now on the hook and will most likely be sued.


Do you have any evidence that it is spitting out licensed code? Did you locate an original that it was copied from?

This seems like it could be the source: https://github.com/GPUOpen-LibrariesAndSDKs/Cauldron/blob/ma...

If true, then this usage could violate its MIT License: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."

The file seems to have been copied verbatim, more or less. But without the copyright info


This particular case appears to me to be a straight derivative at best, but I'm by no means an expert on copyright law.

That's not to say there haven't already been more direct cases, with documented examples [1] and one reported directly by an author who would have a better right to claim than I [2]. It's not even a stretch to see how it can happen.

[1] https://arxiv.org/html/2408.02487v3

[2] https://x.com/DocSparse/status/1581461734665367554


As discussed repeatedly in this thread already, in this particular case the code at hand wasn't generated by an LLM at all; it was simply included from a dependency by the build system.

> implementation of a skybox algorithm from 1999

How would you know? Do you have another AI scan for copyright violations? In terms of a false negative how are disputes resolved?

Seems like a massive attack surface for copyright trolls.


> Seems like a massive attack surface for copyright trolls.

If you think any court system in the world has the capacity to deal with the sheer amount of code an LLM can emit in an hour and audit it for alleged copyright infringements ... I think we're trying to close the barn door now that the horse is already on a ship that has sailed.


This is a terrible argument, just because of the way the legal system works.

If MegaCorp has massive $$$$, but everyone else has small $, then MegaCorp can sue anyone else for using "their" code, that was supposedly generated by an LLM. Most of the time, it won't even get to court. The repo, the program, the whatever-they-want will get taken down way before that.

Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."

Someone brings a case and they, very laboriously, start to address it on its merits. Even before that, costs are accumulating on both sides.

Copyright trolls are mostly not MegaCorps, but they are abusers of the legal system. They won't target Google, but you, with your repo that does something that minorly annoys them? You are fair game.


> Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."

No, but they do recognise when their case registrations are filling up in a way that they cannot possibly process and make adjustments. Courts do not have an infinite capacity.

There's a really simple solution that you may not have considered:

1) don't put your vibe-coded source code in a public git repo, keep it in a local one, with y'know, authentication in front of it;

2) regularly ask your agents to review the code for potential copyright infringements if you either want to release the source or compiled code to the public at any point.

As long as you've followed best practices, I can't see why this is really going to become an issue. Most copyright infringements need to start with Cease & Desist anyway or they'll be thrown out of court. The alleged offender has to be given the opportunity to make good.

So "Claude, we received a C&D for this section of code you stole from https://.../ , you need to make a unique implementation that does not breach their copyright".

You will be surprised how easily this can be resolved.


Just to push the point further I was making about the courts, I came upon this article a couple hours later: https://www.bbc.com/news/articles/cn5lxg2l0lqo

In the US you can't sue without having obtained or applied for a registration. If the registration does not grant, you cannot sue. You cannot get a registration for code developed by AI.

> Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."

They kind of do. If you fail to bring legal action to guard your intellectual property, and there’s a pattern of you not guarding it, then in future cases this can be used against you when determining damages etc. Weakens your case.

Downvoting won’t make it untrue lol.


This is only true of trademarks, not copyrights (which was the discussion here).

Trademarks can become 'generic' if you don't defend them. But JK Rowling wrote Harry Potter, whether she sues fanfic authors or not, and can selectively enforce her copyright as she likes.


It's taken from a threejs example: https://github.com/mrdoob/three.js/blob/dev/examples/jsm/obj...

Seems fine given the project is already using threejs and so will have to include license info for it already.



If you're curious about the source, here's the snapshot:

Codex: https://github.com/stopachka/cscodex

Gemini: https://github.com/stopachka/csgemini

Claude: https://github.com/stopachka/csclaude


Thanks. Turns out that shader is a builtin of three.js.

Please try again with Codex on High or Extra High. 5.1-Max nerfed it a bit if you don't use higher thinking.

This is overparameterisation


I guess you have not tried GPT 5 Pro

GPT’s differentiator is they focused on training for “thinking” while Gemini prioritized instant response. Medium thinking is not the limit of utility

Re: overparameterization specifically, Medium and High are also identically parameterized.

Medium will also dynamically use even higher thinking than High. High is fixed at a higher level rather than leaving it to be dynamic, though somewhat less than Medium’s upper limit


I also noticed that AI agents commit many copyright infringements with the work of Mr Dijkstra.

Messrs Newton and Raphson would like to join this class-action.

That's a stellar observation right there.

To be clear, this is a pun on A*

The idea that someone could hold copyright over such a tiny snippet of code is just as stupid as LLMs regurgitating them.

Personally I find it absurd that code can be copyrighted at all.

Copyright is so-so. At the end of the day you can say that the complete work (not just snippets) is something copyrightable. But the most bananas thing for me is that one can patent the concept of one click purchasing. That's insane on many levels.

Why bananas? That is the biggest invention after Edison's bulb.

A lot of computer graphics algorithms are named after their authors

If only this particular regurgitation engine took a minute to check their work.

I always find it amazing that people are willing to use AI because of stuff like this. It's been illegally trained on code that it does not have the license to use, and it constantly, willy-nilly regurgitates entire snippets, completely violating the terms of use.

Edit:

https://github.com/vorg/pragmatic-pbr/blob/master/local_modu...

https://github.com/vorg/pragmatic-pbr/blob/master/local_modu...

This looks like where the source code was stolen from: this repository is unlicensed, and this is copyright infringement as a result


As discussed in this thread before you posted this comment, this code wasn't generated from an LLM at all, but simply included in a dependency: https://news.ycombinator.com/item?id=46092904

Unlike your results, which aren't an exact match, or likely even a close enough match to be copyright infringement if the LLM was inspired by them (consider that copyright doesn't protect functional elements), an exact match of the code is here (and I assume from the comment I linked above that this is a dependency of three.js, though I didn't track that down myself): https://github.com/GPUOpen-LibrariesAndSDKs/Cauldron/blob/b9...

Edit: Actually, on further thought, the date on the copyright header vs. the git dates suggests the file in that repo was copied from somewhere else... anyway, I think we can be reasonably confident that a version of this file is in the dependency. Again, I didn't look at the three.js code myself to track down how it's included.

If there's any copyright infringement here, it would be because bog-standard web tools fail to comply with the licenses of their dependencies and include a copy of the license, not because of LLMs. I think that is actually the case for many of them? I didn't investigate to check if licenses are included in the network traffic.
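A check like that can be roughed out in a few lines. The following is a minimal sketch in Python, assuming the bundle has already been downloaded as text; the helper names `_normalize` and `contains_mit_notice` are hypothetical. It only looks for the MIT sentence quoted earlier in this thread, so a hit or a miss says nothing about other licenses or about whether a use is actually compliant.

```python
import re

# The sentence from the MIT License quoted earlier in the thread.
MIT_NOTICE_SENTENCE = (
    "The above copyright notice and this permission notice shall be "
    "included in all copies or substantial portions of the Software."
)


def _normalize(text: str) -> str:
    """Drop comment decoration, collapse whitespace, and lowercase."""
    text = re.sub(r"[*/#]+", " ", text)   # strip comment markers such as /* * //
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_mit_notice(bundle_text: str) -> bool:
    """Return True if the MIT permission-notice sentence appears in the text."""
    return _normalize(MIT_NOTICE_SENTENCE) in _normalize(bundle_text)


# Usage: a notice wrapped across comment lines is still detected.
sample_with_notice = """
/*
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 */
var x = 1;
"""
print(contains_mit_notice(sample_with_notice))
print(contains_mit_notice("var x = 1;"))
```

Normalizing before matching matters because minifiers and bundlers often rewrap or re-indent license comments, so a plain substring search on the raw text would miss notices that are in fact present.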


There are several cases where copyright law is not only about exact copies but also about derivatives, so finding an exact match is not necessary. Not sure it matters in this case either way.

conradev found likely the right version of the file here: https://github.com/mrdoob/three.js/blob/55b4bbb7ef7e29b214b9...

I have been trained on code I don't have the license to use myself. I'm not like these Creators, who suck wisdom from the cosmos directly, apparently.

Sure. It's a problem that corporations run by more or less insane people are the ones monetizing and controlling access to these tools. But the solution to that can't be even more extended private monopolistic property claims to thought-stuff. Such claims are usually the way those crazy people got where they are.

You think a world where Elsevier didn't just own the papers, but also rights to a share in everything learned from them, would be better for you?


It's fascinating that people care very much about this when it's visual arts, but when it comes to code almost no one does.

E.g. the latest Anno game (117) received a lot of hate for using AI generated loading screen backgrounds, while I have never heard of a single person caring about code, which probably was heavily AI generated.


I believe it is MIT-licensed code from three.js: https://github.com/mrdoob/three.js/blob/55b4bbb7ef7e29b214b9...

You presume that people care about things like this. A lot of people don't.

Companies should. It's a business risk: you open yourself up to legal action.

"Claude - rewrite this apparently copyrighted code that can be found online here <http://...> in a way that makes it a unique implementation." <- probably will work.

Is it copyright infringement? It's a fundamental algorithm.

The courts have ruled that generated output is not infringing.

If I say, “output the contents of X verbatim” and then use the output, am I free from liability?

If the generated code in TFA contained the actual Counter-Strike source code, then you (well, Valve) would have a defensible claim. But the prompt was to make something like Counter-Strike, and it came up with something different. That's fair game.

I can assure you that Valve is not remotely concerned about this AI generated "first person shooter" taking market share away from them.

Definitely citation needed. Such court cases usually come with a lot of important context. How can you just make such a statement and get away with not providing any context link?


