> Courts (at least in the US) have already ruled that use of ingested data for training is transformative.
If you have code that happens to be identical to someone else's code or implements someone's proprietary algorithm, you're going to lose in court even if you claim an "AI" gave it to you.
AI is training on private Github repos and coughing them up. I've had it regurgitate a very well written piece of code to do a particular computational geometry algorithm. It presented perfect, idiomatic Python with perfect tests that caught all the degenerate cases. That was obviously proprietary code--no amount of searching came up with anything even remotely close (it's why I asked the AI, after all).
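To give a sense of the kind of output I mean, here is a hypothetical sketch of that shape of code (illustrative only, not the actual code it gave me; the routine and names are my own): a 2D segment-intersection check with tests for the degenerate cases.

```python
# Hypothetical illustration -- not the actual output described above.
# Idiomatic Python for a classic computational geometry routine,
# with tests that exercise the degenerate cases.

def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): >0 left turn, <0 right turn, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def on_segment(a, b, p):
    """True if a collinear point p lies within the bounding box of segment ab."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def segments_intersect(p1, p2, p3, p4):
    """True if segment p1p2 intersects segment p3p4, including touching/collinear cases."""
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0)):
        return True  # proper crossing
    # Degenerate cases: a collinear endpoint lying on the other segment.
    return ((d1 == 0 and on_segment(p3, p4, p1))
            or (d2 == 0 and on_segment(p3, p4, p2))
            or (d3 == 0 and on_segment(p1, p2, p3))
            or (d4 == 0 and on_segment(p1, p2, p4)))

# Degenerate-case tests of the kind described above.
assert segments_intersect((0, 0), (2, 2), (0, 2), (2, 0))      # proper crossing
assert segments_intersect((0, 0), (2, 0), (1, 0), (3, 0))      # collinear overlap
assert segments_intersect((0, 0), (2, 0), (2, 0), (2, 2))      # shared endpoint
assert not segments_intersect((0, 0), (1, 0), (2, 0), (3, 0))  # collinear, disjoint
```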
>If you have code that happens to be identical to someone else's code or implements someone's proprietary algorithm, you're going to lose in court even if you claim an "AI" gave it to you.
Not for a dozen lines here or there, even if it could be found and identified in a massive code base. That’s like quoting a paragraph from one book in another book: non-infringing.
For the second half of your comment, it sounds like you’re saying you got results that were too good to be AI. That’s a bit “no true Scotsman”, at least without more detail. But implementing an algorithm, even a complex one, is very much something an LLM can do. Algorithms are much better defined and scoped than ordinary natural language, and LLMs already do a reasonable job of translating natural language into programming languages; implementing an algorithm is a narrow subset of that task type, with better-defined context and syntax.
> Not for a dozen lines here or there, even if it could be found and identified in a massive code base. That’s like quoting a paragraph from one book in another book: non-infringing.
It's potentially non-infringing in a book if you quote it in a plausible way and attribute it properly.
If you copy&paste a paragraph from another book into yours, it's infringing, and a career-ending scandal. There's plenty of precedent on that.
Just like if you manually copied a function out of some GPL code and pasted it into your own.
What will happen when company A implements algorithm X based on AI output, company B does the same, and then company A claims the code is proprietary and takes company B to court?
It cannot do anything on its own; it's just a (very complex, probabilistic) mechanical transformation (including interpolation) of training data and a prompt.
Advertising autocomplete as AI was a genius move, because people start humanizing it and looking for human-centric patterns.
Thinking A"I" can do anything on its own is like seeing faces in rocks on Mars.
The idea that something that can't handle simple algorithms (e.g. counting the number of times a letter occurs in a word) could magically churn out far more advanced algorithms complete with tests is… well it's a bit of a stretch.
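For concreteness, the counting task itself is a one-liner when actually run as code, which is what makes the stumbles so telling:

```python
# Counting how many times a letter occurs in a word is trivial in code:
word = "strawberry"
print(word.count("r"))  # -> 3
```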
LLMs aren't good at rote memorization. They can't even get quotations of humans right.
It's easier for the LLM to rewrite an idiomatic computational geometry algorithm from scratch in a language it understands well like Python. Entire computational geometry textbooks and research papers are in its knowledge base. It doesn't have to copy some proprietary implementation.
That seems like a real stretch. GPT-5 just invented new math, for reference. What you are saying would be equivalent to claiming that this math was obviously in some paper the mathematician did not know about. Maybe true, but it's a far reach.
It invented "new math" about as much as I invented "new food" when I was cooking yesterday. It did a series of quite complicated calculations that would take a well-trained human several hours or even days to do. Still impressive, but no, it's not new maths.
Obviously not ChatGPT. But ChatGPT isn't the sharpest stick on the block by a significant margin. It is a mistake to judge what AIs can do based on what ChatGPT does.
This would be the first time ever that an LLM has discovered new knowledge, yet the far reach is supposed to be that the information appears somewhere in its training data?
They've been doing it for a while. Gemini has also discovered new math and new algorithms.
There is an entire research field of scientific discovery using LLMs, together with sub-disciplines for the various specializations. LLMs routinely discover new things.
I hadn't heard of that, so I did some searching and the single source for the claim I can find is a Google white paper. That doesn't automatically mean it's false, of course, but it is curious that the only people ostensibly showing LLMs discover new things are the companies offering the LLMs.
Citation needed, and I call bullshit. Unless you mean that they hallucinate useless algorithms that do not work, which they do.
LLMs do not have an internal model for manipulating mathematical objects. They cannot, by design, come up with new algorithms unless they are very nearly the same as some other algorithm. I'm a computer science researcher and have not heard of a single algorithm created by an LLM.
This article is about the same thing I mentioned in a sibling comment. I personally don't find an unreplicated Google white paper to be compelling evidence.
The AI coming up with it? When Google claimed their Wizard of Oz show at the Las Vegas Sphere was AI-generated, a ton of VFX artists spoke up to say they'd spent months of human labor working on it. Forgive me for not giving the benefit of the doubt to a company that has a vested interest in making their AI seem more powerful, and a track record of lying to do so.