this post was submitted on 20 Dec 2023
136 points (95.3% liked)

Technology


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below are allowed; to ask if your bot can be added, please contact us.
  9. Check for duplicates before posting; duplicates may be removed.

Approved Bots


Zarxrax@lemmy.world 17 points 11 months ago

While I get what you're saying, it's pretty clear that his point was this: if you actually populate the dataset by downloading the images behind the links (which anyone using the dataset to train a model would have to do), then you have inadvertently downloaded illegal images.

The article mentions repeatedly that the dataset itself is simply a list of URLs pointing to the images.
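To make the "list of URLs" vs. "populated dataset" distinction concrete, here is a minimal sketch. It assumes a hypothetical index stored as CSV rows of `url,caption`; the column names, the `example.com` links, and the helper names `load_index`, `default_fetch`, and `populate` are all made up for illustration and are not the actual layout or tooling of the dataset the article discusses.

```python
import csv
import io
from typing import Callable, Dict, List
from urllib.request import urlopen

# Hypothetical URL-list dataset: each row is only a link plus a caption.
# No image data is stored in the dataset file itself.
DATASET_CSV = """url,caption
https://example.com/a.jpg,a cat on a sofa
https://example.com/b.jpg,a red bicycle
"""


def load_index(text: str) -> List[Dict[str, str]]:
    """Parse the index: this yields text records, not images."""
    return list(csv.DictReader(io.StringIO(text)))


def default_fetch(url: str) -> bytes:
    """Download one linked file; this is the step that puts content on your disk."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def populate(rows: List[Dict[str, str]],
             fetch: Callable[[str], bytes] = default_fetch) -> Dict[str, bytes]:
    """'Populate' the dataset by fetching every linked file.

    You end up with whatever each URL serves at download time;
    the index alone says nothing reliable about that content.
    """
    images: Dict[str, bytes] = {}
    for row in rows:
        try:
            images[row["url"]] = fetch(row["url"])
        except OSError:
            # Dead links are common in URL-list datasets; skip them.
            continue
    return images
```

Reading the index with `load_index` touches only text. Only a call like `populate(load_index(DATASET_CSV))`, the step a training pipeline has to run first, actually copies the linked files onto the machine.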