but yeah, exercise can be an addiction, just like sex. doing these daily is fine; it turns into addiction when you can't stop or when it interferes negatively with your routine
but DAW plugins and instruments are just like code, only with a GUI to mess with. don't get me wrong, PureData's freedom is astonishing, but one can also go quite far with the esoteric sequencers or modulation options found in DAWs out there
this is still a technological advancement... what if detecting smartphone usage or falling asleep safely stops the car? what if this runs locally? or what if it's linked to public entities that will add penalty points to your license?
as a cyclist and public transport user with no driver's license, i hope personal vehicles get so many sensors that they can detect if you are drunk or stressed and limit your reach. fuck your metallic beetle
>as a cyclist and public transport user with no driver's license, i hope personal vehicles get so many sensors that they can detect if you are drunk or stressed and limit your reach. fuck your metallic beetle
What a great illustration of the sort of selfish opinions that people like to peddle under the guise of perceived common good.
Are you willing to have your bike brakes linked up with GPS and red light signals? It's in the name of safety and progress after all.
in a city that doesn't produce even 1/25 of the microplastics that thousand-kilo vehicles produce? because that also has an impact on marine ecosystems; by the way, cars are linked as one of the highest sources of microplastic pollution, if not the highest. in a city that doesn't have air pollution linked to a bunch of diseases? in a city that doesn't have noise pollution, which also has a ton of negative impacts?
are you really naive enough to believe cyclists wouldn't respect traffic lights in a city designed around walking and public transportation? or are you thinking of the few cyclists who get killed breaking this rule, by vehicles that get a mild scratch? or of the light or mild injuries bicycles at 15-25 km/h would cause each other?
edit: i would even go further and hope personal vehicle production ceases and their circulation becomes a crime for citizens not performing legal or essential service duties. i would live perfectly fine in a city without them, but who controls the speed of my bicycle on cycle paths, or locks my brakes if i try to cycle high?
You didn't answer his question: Would you be willing to have your bicycle brakes linked up with GPS and red light signals? Or loaded down with sensors monitoring and correcting your bicycling activity for your own safety?
> are you really naive enough to believe cyclists wouldn't respect traffic lights in a city designed around walking and public transportation? or are you thinking of the few cyclists who get killed breaking this rule, by vehicles that get a mild scratch? or of the light or mild injuries bicycles at 15-25 km/h would cause each other?
An excellent demonstration of "cyclebrain syndrome", the urban twin to suburbia's "carbrain syndrome".
> are you really naive enough to believe cyclists wouldn't respect traffic lights in a city designed around walking and public transportation?
Translation: I am aware of cyclists' ubiquitous poor behavior on the roads but will reach for any justification to shift responsibility to someone else. "Drivers wouldn't be running red lights if you just added a couple more lanes, bro."
> or are you thinking of the few cyclists who get killed breaking this rule, by vehicles that get a mild scratch?
Translation: And when cyclists' poor behavior causes a fatal collision with a car, nobody cares about the damaged property. Or the mental anguish, or the collisions caused by narrowly avoiding killing an errant cyclist (who survives, oblivious, thanks to the driver's quick action choosing a more costly crash over a "mild scratch" that kills the cyclist).
> or of the light or mild injuries bicycles at 15-25 km/h would cause each other?
Translation: I don't give a shit about killing/injuring pedestrians any more than car drivers do. I only care about collisions with things that are about the size of my vehicle or bigger. And if those other things are bigger than my vehicle--I want them banned! That way I reduce the risk to me, which is what I really care about, and who cares what happens to anything smaller than me?
so basically we download the source files into the training weights and remove the LICENSE.md, as it's exactly the same as a person learning to program from proprietary secret code and outputting code based on that for millions of people in a matter of seconds /s
we also treat public goods found on the internet however we want, as if the World Intellectual Property Organization Copyright Treaty and the Berne Convention for the Protection of Literary and Artistic Works weren't real, or because we can, since we're operating in international waters, selling products to other sailors living exclusively in international waters /s
If you download GPL source code and run `wc` on its files and distribute the output of that, is that a violation of copyright and the GPL? What if you do that for every GPL program on github? What if you use python and numpy and generate a list of every word or symbol used in those programs and how frequently they appear? What if you generate the same frequency data, but also add a weighting by what the previous symbol or word was? What if you did that an also added a weighting by what the next symbol or word was? How many statistical analyses of the code files do you need to bundle together before it becomes copyright infringement?
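The escalation in that list can be made concrete. Here's a toy sketch (my own illustration, not anyone's actual pipeline) of the first few steps: plain token frequencies, then frequencies weighted by what the previous token was. The tokenizer and sample corpus are purely illustrative:

```python
# Toy versions of the statistical analyses described above:
# unigram (wc-like) counts, then bigram counts, i.e. the frequency
# of each token conditioned on the token before it.
from collections import Counter
import re

def tokenize(source: str):
    # crude tokenizer: identifiers/numbers, or single punctuation symbols
    return re.findall(r"\w+|[^\w\s]", source)

def unigram_counts(sources):
    counts = Counter()
    for src in sources:
        counts.update(tokenize(src))
    return counts

def bigram_counts(sources):
    # pair each token with its predecessor and count the pairs
    counts = Counter()
    for src in sources:
        toks = tokenize(src)
        counts.update(zip(toks, toks[1:]))
    return counts

corpus = ["int main(void) { return 0; }",
          "int add(int a, int b) { return a + b; }"]
uni = unigram_counts(corpus)
bi = bigram_counts(corpus)
print(uni["int"])           # how often "int" appears overall
print(bi[("return", "0")])  # how often "0" follows "return"
```

Each step keeps strictly more information about the original files than the last; the question in the comment is at which step, if any, the aggregate stops being mere statistics.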
The argument that GPL code is a tiny minority of what's in the model makes no sense to me. (To be clear, you're not making this argument.) One book is a tiny minority of an entire library, but that doesn't mean it's fine to copy that book word for word simply because you can point to a Large Library Model that contains it.
LLMs definitely store pretty high-fidelity representations of specific facts and procedures, so for me it makes more sense to start from the gzip end of the slope and slide the other way. If you took some GPL code and renamed all the variables, is that suddenly ok? What if you mapped the code to an AST and then stored a representation of that AST? What if it was a "fuzzy" or "probabilistic" AST that enabled the regeneration of a functionally equivalent program but the specific control flow and variable names and comments are different? It would be the analogue of (lossy) perceptual coding for audio compression, only instead of "perceptual" it's "functional".
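To make the AST step above concrete, here's a minimal sketch (illustrative only, using Python's standard `ast` module) showing that two programs differing only in identifier names collapse to the same normalized AST, which is the sense in which renaming variables "stores" the same program:

```python
# Two sources that differ only in variable/function names produce
# identical AST dumps once every name is replaced by a placeholder.
import ast

def normalized_dump(source: str) -> str:
    """Dump an AST with all variable, argument and function names erased."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, ast.FunctionDef):
            node.name = "_"
    return ast.dump(tree)

a = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
b = "def sum_all(items):\n    s = 0\n    for i in items:\n        s += i\n    return s\n"
print(normalized_dump(a) == normalized_dump(b))  # structurally the same program
```

A "fuzzy" or probabilistic version of this representation is the hypothetical step the comment describes; this sketch only covers the exact-AST case.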
This is starting to look more and more like what LLMs store, though they're actually dumber and closer to the literal text than something that maintains function.
It also feels a lot closer to 'gzip' than 'wc', imho.
> LLMs definitely store pretty high-fidelity representations of specific facts and procedures
Specific facts and procedures are explicitly NOT protected by copyright. That's what made cloning the IBM BIOS legal. It's what makes emulators legal. It's what makes the retro-clone RPG industry legal. It's what made Google cloning the Java API legal.
> If you took some GPL code and renamed all the variables, is that suddenly ok?
Generally no, not sufficiently transformative.
> What if you mapped the code to an AST and then stored a representation of that AST?
Generally no, binary distribution of software is considered a violation of copyright.
> What if it was a "fuzzy" or "probabilistic" AST that enabled the regeneration of a functionally equivalent program but the specific control flow and variable names and comments are different?
This starts to get a lot fuzzier. De-compilation is legal. Creating programs that are functionally identical to other programs is (generally) legal. Creating an emulator for a system is legal. Copyright protects a specific fixed expression of a creative idea, not the idea itself. We don't want to live in the world where Wine is a copyright violation.
> This is starting to look more and more like what LLMs store, though they're actually dumber and closer to the literal text than something that maintains function.
And yet, so far no one has brought a legal case against the AI companies for being able to extract their copyright-protected material from the models. The few early examples of that happening are things that model makers explicitly attempt to train out of their models. It's unwanted behavior that is considered a bug, not a feature. Further, the fact that a machine is able to violate copyright does not in and of itself make the machine a violation of copyright. See also Xerox machines, DeCSS, Handbrake, Plex/Jellyfin, CD-Rs, DVRs, VHS recorders, etc.
> Specific facts and procedures are explicitly NOT protected by copyright.
No argument there, and I'm grateful for the limits of copyright. That part was only for describing what LLM weights store -- just because the literal text is not explicitly encoded doesn't mean that facts and procedures aren't.
> Copyright protects a specific fixed expression of a creative idea, not the idea itself.
Right. Which is why it's weird to talk about the weights being derivative works. Weird but perhaps not wrong: if you look at the most clear-cut situation where the LLM is able to reproduce a big chunk of input bit-for-bit, then the fact that its basis of representation is completely different doesn't feel like it matters much. An image that is lossily compressed, converted to a bitstream, and encoded in DNA is very very different than the input, but if an image can be recovered that is indistinguishable or barely distinguishable from the original, I'd still call that copying and each intermediate step a significant but irrelevant transformation.
> This starts to get a lot fuzzier. De-compilation is legal.
I'm less interested in what the legal system is currently capable of concluding. I personally don't think the laws have caught up to the present reality, so present-day legality isn't the crucial determinant in figuring out how things "ought" to work.
If an LLM is completely incapable of reproducing input text verbatim, yet could become so through targeted ablation (that does not itself incorporate the text in question!), then does it store that text or not?
I'm not sure why I'm even debating this, other than for intellectual curiosity. My opinion isn't actually relevant to anyone. Namely: I think the general shape of how this ought to work is pretty straightforward and obvious, but (1) it does not match current legal reality, and more importantly, (2) it is highly inconvenient for many stakeholders (very much including LLM users). Not to mention that (3) although the general shape is pretty clear in my head, it involves many many judgement calls such as the ones we've been discussing here, and the general shape of how it ought to work isn't going to help make those calls.
> An image that is lossily compressed, converted to a bitstream, and encoded in DNA is very very different than the input, but if an image can be recovered that is indistinguishable or barely distinguishable from the original, I'd still call that copying and each intermediate step a significant but irrelevant transformation.
Sure, as a broad rule of thumb that works. But the ability of a machine to produce a copyright violation doesn't mean the machine itself, or distributing the machine, is a copyright violation. To take an extreme example, if we take a room full of infinite monkeys, put them on infinite typewriters, and they generate a Harry Potter book, that doesn't mean Harry Potter is stored in the monkey room. If we have a random sound generator that produces random tones from the standard western musical note palette and it generates the bass line from "Under Pressure", that doesn't mean our random sound generator contains or is a copy of "Under Pressure", even if we encoded all the same information and procedures for generating those individual notes at those durations among the data and procedures we gave the machine.
> If an LLM is completely incapable of reproducing input text verbatim, yet could become so through targeted ablation (that does not itself incorporate the text in question!), then does it store that text or not?
I would argue not. Just like a Xerox machine doesn't contain the books you make copies of when you use it to make a copy, and Handbrake doesn't contain the DVDs you use when you make a copy there.
I would further argue that copyright infringement is inherently a "human" act. It's sort of encoded in the language we use to talk about it (e.g. "fair use") but it's also something of a "if a tree falls in the middle of the woods" situation. If an LLM runs in an isolated room in an isolated bunker with no one around and generates verbatim copies of the Linux kernel, that frankly doesn't matter. On the other hand, if a Microsoft employee induces an LLM to produce verbatim copies of the Linux kernel, that does, especially if they did so with the intent to incorporate Linux kernel code into Windows. Not because of the LLM, but because a person made the choice to produce a copy of something they didn't have the right to make a copy of. The method by which they accomplished that copy is less relevant than making the copy at all, and that in turn is less relevant than the intent of making that copy for a purpose which is not allowed by copyright law.
> I'm not sure why I'm even debating this, other than for intellectual curiosity.
Frankly, that's the only reason to debate anything. 99% of the time, you as an individual will never have the power to influence the actual legal decisions made. But an intellectually curious conversation is infinitely more useful, not just to you and me but to other readers, than another retread of "AI is slop" / "you're just jealous you can't code your way out of a paper bag" arguments that pervade so much discussion around AI. Or, worse, yet another "I used an LLM for a clearly stupid thing and it was stupid" or "I used an LLM to replace all my employees and I'm sure it's going to go great" blog post. For whatever acrimony there might have been in our interchange here, I'm sorry, because this sort of discussion is the only good way to exercise our thoughts on an issue and really test them out ourselves. It's easy to have a knee-jerk opinion. It's harder to support that opinion with a philosophy and reasoning.
For what it's worth, I view the LLM/AI world as the best opportunity we've had in decades to really rethink and scale back/change how we deal with intellectual property. The ever expanding copyright terms, the sometimes bizarre protections of what seem to be blindingly obvious ideas. The technological age has demonstrated a number of weaknesses in the traditional systems and views. And frankly I think it's also demonstrated that many prior predictions of certain doom if copyright wasn't strictly enforced have been overwrought and even where they haven't, the actual result has been better for more people. Famously, IBM would have very much preferred to have won the BIOS copyright issue. But I think so much of the modern computer and tech industry owes their very careers to the effects of that decision. It might have been better for IBM if IBM had won, it's not clear at all that it would have been better for "[promoting] the Progress of Science and useful Arts".
We could live in a world where we recognize that LLMs and AIs are going to fundamentally change how we approach creative works. We could recognize that the intent of "[promoting] the Progress of Science and useful Arts" is still a relevant goal and something we can work to make compatible with the existence of LLMs and AI. To pitch my crazy idea again, we could:
1) Cut the terms of copyright substantially, back down to 10 or 15 years by default.
2) Offer a single extension that doubles that term, but only on the condition that the work is submitted to a central "library of congress" data set.
3) This could be used to produce known good and clean data sets for AI companies and organizations to train models from, with the protection that any model trained from this data set is protected from copyright infringement claims for works in the data set. Heck, we could even produce common models. This would save massive amounts of power and resources by cutting the need for everyone who wants to be in the AI space to go out and acquire, digitize and build their own library. The MNIST digits data set is effectively the "hello world" set for anyone learning computer vision AI stuff. Let's do that for all sorts of AI.
4) The data sets and models would be provided for a nominal fee; this fee would be used to pay royalties to people whose works are still under copyright and are in the data sets, proportional to the recency and quantity of work submitted. A cap would need to be put in place to prevent flooding the data set to game the royalties. These royalties would be part of recognizing the value the original works contributed to the data set, and act as a further incentive to contribute works to the system and contribute them sooner.
We could build a system like this, or tweak it, or even build something else entirely. But only if we stop trying to cram how we treat AI and LLMs and the consequences of this new technology into a binary "allowed / not allowed" outcome as determined by an aging system that has long needed an overhaul.
So please, continue to debate for intellectual curiosity. I'd rather spend hours reading a truly curious exploration of this than another manifesto about "AI slop"
i think you're mixing stuff up here. 2001 space odyssey is considered a masterpiece by cinema enthusiasts, but it's only been tested through a few short years. who knows if it'll still rank high 300 years from now like some works of older mediums; the book Don Quijote is a great example
having watched almost a hundred hours of early cinema (1920-1930), i found 2001 space odyssey boring as heck. i had to go to a public theater to not sleep through it, because my past 2 attempts at home got me sleeping. cinema is a recent medium; it can go through a lot of stuff. and yet, you can find shallow films from the first era till today. not every movie was made to be dense, slow or thoughtful. the 90s also had a bunch of shallow stuff
aren't you being a little naive by calling dangerous activities men have to take to survive "inherently voluntary"? go to a 3rd world country, or work as an immigrant somewhere rich, to check your options, transportation included. it's easy to say one shouldn't use a cheap motorcycle and should instead take the one-way, sardine-packed 2-hour bus ride across the city to reach work, every day
Only 3 out of 18 reasons on that list are work-related, and 2 more might be work-related (lawnmowing and powered tools/household machinery?). I think cycling accidents (5 positions on the list) are partly normal cycling (like riding to work) through no fault of the rider, and in larger part taking unnecessary risks while riding, or riding for sport. And I'd guess motorcycle accidents (4 on the list) are mostly taking risks and riding too fast. 3 reasons are "assault". And that leaves only 1 reason from the list, sports equipment.
So out of 18 reasons on the list, only a small part is "activities men have to take to survive", but many of the others aren't "inherently voluntary and risky" or cannot be blamed on the hospitalized person. The list is too short to be really interesting, when half of that list is the same thing with small variations (cycling/motorcycling), and the same for women (mostly pregnancy).
This data reflects the UK, not a 3rd world country and my comments are restricted to this dataset.
Included in that same dataset are assaults and sports related injuries, which are additional risky activities.
You might argue assaults aren’t voluntary. My personal experience suggests most assaults are the result of voluntary activity rather than involuntary activity, YMMV.
I’m not being naive. I have lived in a 3rd world country where it wasn’t uncommon to see a family of 5 on a motorcycle.
I would note that you will tend to see, proportionately speaking, more women on motorcycles in those countries for the reasons you suggested.
it's even worse than that. an adult meta-questioning their addiction is much lighter than some kid being pulled into a grindy game that is often violent AND competitive; which, by now, scientific literature already knows reduces pro-social behavior [0]
when i was 10, an old neighbor showed me what the late game of Tibia was like, how that would never change, and how dumb i would be if i didn't pay for the premium account, which would get me there much faster and was obligatory if i wanted to do war/pvp. i politely refused invitations to play WoW in high school with another friend i made, and i'm grateful for that. i would never have read so many books and watched so many films in that timeframe if i'd been grinding levels in the same area, killing the same monsters, watching the same animation