this post was submitted on 13 Mar 2024

cross-posted from: https://lemmy.ml/post/13088944

Akisamb@programming.dev 4 points 8 months ago

It does seem odd that scraping activity from just two accounts allegedly managed to cause such an extended server outage. The irony of this situation also hasn’t been lost on online creatives, who have extensively criticized both companies (and generative AI systems in general) for training their models on masses of online data scraped from their works without consent. Stable Diffusion and Midjourney have both been targeted with several copyright lawsuits, with the latter being accused of creating an artist database for training purposes in December.

As far as I know, they do not have copyright over the output of their models. Apart from banning the users, they pretty much have no way to stop this. Even if they had copyright, it's still legally unsettled whether training LLMs constitutes a copyright violation.

In a similar fashion, a lot of the recent chat LLMs have been trained on output from ChatGPT. After all, why pay humans to produce training data when your competitor has already done it for you?