Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
visarga
on Jan 21, 2025
|
parent
|
context
|
favorite
| on:
Authors seek Meta's torrent client logs and seedin...
More recently they train on a mix of synthetic and organic text, like the Phi-4 and o1 / o3 models. Original copyrighted text can be safely replaced with synthetic standins.
BonoboIO
on Jan 21, 2025
[–]
I think this works only to a certain degree, they will still use as much data as they can use to train the models.
Synthetic data will not replace original data like books. Synthetic data works very good for math.
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: