
I have a question about this: isn’t it against the OpenAI Terms of Service to do this?


Yes, but I doubt anyone is going to get the Aaron Swartz treatment over it, especially when OpenAI's own models are no doubt generated by playing fast and loose with ToS. E.g. at least as early as 2018, StackOverflow's ToS said:

"Any other downloading, copying, or storing of any public Network Content (other than Subscriber Content or content made available via the Stack Overflow API) for other than personal, noncommercial use is expressly prohibited without prior written permission from Stack Overflow or from the copyright holder identified in the copyright notice per the Creative Commons License"


Ahhhh, yes. OpenAI's good old ToS... Where it's OK to break a ToS / copyright if you're OpenAI, for the input used to generate the output that you (the customer) don't own and can't cache. Because that would impact their revenue model, be more efficient (power and cost) and still leave them holding the bag after ingesting loads of content they never had a right to in the first place but have staked their claim that it's OK because there's a lot riding on their success.

And, oh by the way, they'll just change their ToS as it suits them for more revenue opportunities even when they stated they wouldn't do business with - oh you know nation state militaries. But - JK! Now we will because <enter some 1%er excuse here>.


It took me 30 seconds to read their TOS and confirm you're just making most of that up.

> As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output.

It follows that your claim about caching violating OAI's terms is nonsense.


I think you missed my point. "Caching" output by training a more efficient / cheaper model with that output is in fact against their ToS. In my simple brain that is a form of caching, and I stand by my original post.

I've not made anything up. Your claim that I have is nonsense.

OpenAI changing their ToS for the military on a whim: https://archive.is/GILKl - for your enjoyment.

OpenAI ToS: "What You Cannot Do. You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not:

* Use Output to develop models that compete with OpenAI."


> I think you missed my point. "Caching" output by training a more efficient / cheaper model with that output is in fact against their ToS. In my simple brain that is a form of caching.

If that was your point, I'm pretty sure everyone missed it. No one is training models as a form of caching their previous responses. They want to improve the quality of responses they haven't generated yet. That's not caching.

> I've not made anything up.

You said customers don't own the output; they do. I said you made most of it up, and you did. Including your apparent retconning of your original point.


> If that was your point, I'm pretty sure everyone missed it. No one is training models as a form of caching their previous responses. They want to improve the quality of responses they haven't generated yet. That's not caching.

So... You didn't read the article you're commenting on?

> You said customers don't own the output; they do. I said you made most of it up, and you did.

You don't own it. If I own something, I can do whatever I want with it. This is just like your iPhone. You don't actually own it, because you can only do with it what Apple allows you to do.

> Including your apparent retconning of your original point.

Wow, enjoy your day. Your misunderstanding is, apparently, my "retconning". Maybe read the original piece you're responding to within the thread.


Did I read the article? You mean the tweet? If you're saying it supports your claim that fine-tuning a model is equivalent to caching, you are mistaken.

> If I own something, I can do whatever I want with it

BRB digitizing my entire media collection and uploading it to the public internet.


Yes, it's explicitly against their TOS.

> What You Cannot Do. [...]

> Use Output to develop models that compete with OpenAI.


Which is ironic given the fact that scraping was likely against the ToS for many of the sites which ended up in OpenAI's training corpus.


It will be interesting if the same court cases proving their use of everyone else's data make it fair use to use their machine output as training data. They're definitely within their rights to ban whomever they like, but who knows if they have recourse beyond that?


But didn’t X do that with their ML model Grok?


Burning bridges and getting sued isn't uncharted territory for Elon Musk.


What's good for the goose...


If you're not selling / putting your model out there as a generic competitor to OpenAI then you're not competing with them.


That's a moving target :)


Have the same question - I mean, for training an open source model with no monetization attached, there's not much OpenAI can do besides ban the user, and they can just make another account. For a company doing this with the intent to sell it as a capability... seems risky.


Did you know that removing the tag from a mattress is illegal too? According to the tag.


If you read them they say it's illegal only if you don't own the mattress.



