> A reasonable interpretation, in my eyes, is that the training process is a black box which takes in copyrighted works and produces a training model. The training model is a derivative work of the inputs. It therefore violates the copyrights of a large number of rights holders. The outputs of the model are derivative works which also violate copyright.

Where is the black-box “limit” here? Does the mean value of all images in ImageNet (which contains many copyrighted images) violate copyright? Does the character count of Sarah Silverman’s books? What about a prime number that encodes them: https://en.wikipedia.org/wiki/Illegal_number?wprov=sfti1
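
To make that spectrum concrete, here is a rough Python sketch of my own (the sample sentence and the helper names are made up for illustration, and the integer encoding is only in the spirit of an illegal number, since it makes no attempt to be prime). A character count or a mean throws away almost everything about the work, while an integer built from the work's bytes can be turned straight back into the original text.

```python
def char_count(text: str) -> int:
    """Lossy summary: many different texts share the same count."""
    return len(text)


def mean_byte(text: str) -> float:
    """Lossy summary: average byte value, analogous to a mean image."""
    data = text.encode("utf-8")
    return sum(data) / len(data)


def as_integer(text: str) -> int:
    """Lossless encoding: the text's bytes read as one big integer."""
    return int.from_bytes(text.encode("utf-8"), "big")


def from_integer(n: int) -> str:
    """Invert as_integer (assumes the text does not start with a NUL byte)."""
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big").decode("utf-8")


# Hypothetical stand-in for a copyrighted work.
work = "The quick brown fox jumps over the lazy dog."
# A completely unrelated text with the same length.
other = "x" * len(work)

# The count cannot tell the two texts apart, so it cannot reproduce either.
same_count = char_count(work) == char_count(other)

# The integer round-trips to the exact original text.
recovered = from_integer(as_integer(work))
```

Here `same_count` is true even though the two strings have nothing in common, and `recovered` equals `work` exactly. Where a trained model sits between those two ends is the actual question.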

In my view, training is much more similar to a character count than to an illegal prime, and so it is almost certainly going to be found okay. If not, something like 90% of all models in use today would be in trouble, since they had some component trained on copyrighted data of some form.