this post was submitted on 01 Apr 2024
189 points (98.0% liked)

Futurology

[–] pennomi@lemmy.world 55 points 7 months ago (4 children)

There’s already more than enough training data out there. The important thing that remains is to filter it so it doesn’t also include humanity’s stupidest data.

That, and making the algorithms smarter so they're resistant to hallucination and misinformation - that's not a data problem, it's an architecture problem.
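
For illustration, here's a minimal sketch of that kind of filtering step. The `quality_score` heuristic is entirely made up for the example - real pipelines use trained classifiers or far more elaborate heuristics:

```python
# Hypothetical sketch: score-based filtering of a text corpus before training.
# quality_score is a stand-in heuristic, not any real pipeline's filter.

def quality_score(text: str) -> float:
    """Toy heuristic: penalize very short or all-caps documents."""
    if len(text) < 50:
        return 0.0
    upper_ratio = sum(c.isupper() for c in text) / len(text)
    return 1.0 - upper_ratio

def filter_corpus(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep only documents whose quality score clears the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]

if __name__ == "__main__":
    corpus = [
        "A long, reasonably written paragraph explaining something factual in plain language.",
        "ALL CAPS RANT!!!",
    ]
    print(filter_corpus(corpus))  # drops the short, shouty document
```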

[–] FaceDeer@fedia.io 19 points 7 months ago

Stupid data can be useful for training as a negative example. Image generators use negative prompts to good effect.
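
As a concrete example, here's roughly what a negative prompt looks like with the Hugging Face diffusers library. The model ID and prompt strings are just placeholders, and this assumes a GPU plus the torch and diffusers packages:

```python
# Sketch: steering a Stable Diffusion model away from unwanted output
# via a negative prompt. Model ID and prompts are illustrative only.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a detailed photograph of a mountain lake at dawn",
    negative_prompt="blurry, low quality, watermark, extra limbs",
).images[0]
image.save("lake.png")
```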

[–] MotoAsh@lemmy.world 9 points 7 months ago (1 children)

Butbutbut my ignorant racism is the truth!! That's why I hear it from everyone, including [insert nearby relatives here]!!

[–] Takumidesh@lemmy.world 3 points 7 months ago (1 children)

Well, is the goal truth? Or a simulacrum of a human?

[–] MotoAsh@lemmy.world 2 points 7 months ago* (last edited 7 months ago)

Considering not even all humans are hireable, I'd say only a fool aims for a simulacrum.

[–] Ultraviolet@lemmy.world 4 points 7 months ago

You also have to filter out the AI-generated garbage that is rapidly becoming a majority of content on the internet.

[–] CanadaPlus@lemmy.sdf.org 4 points 7 months ago* (last edited 7 months ago)

Well, it's established wisdom that the dataset size needs to scale with the number of model parameters - roughly linearly, IIRC, on the order of 20 training tokens per parameter per the Chinchilla results. If you don't have that much data, the training basically won't work; it will overfit or just not progress.
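
As a back-of-the-envelope sketch of that rule of thumb (the 20 tokens-per-parameter ratio is an approximation from the Chinchilla paper, not a hard law):

```python
# Rough compute-optimal data budget, assuming ~20 training tokens per parameter
# (the approximate ratio reported by Hoffmann et al., 2022, "Chinchilla").

TOKENS_PER_PARAM = 20

def optimal_tokens(num_params: float) -> float:
    """Approximate training-set size (in tokens) for a model of the given size."""
    return TOKENS_PER_PARAM * num_params

for params in (1e9, 7e9, 70e9):
    print(f"{params:.0e} params -> ~{optimal_tokens(params):.2e} tokens")
```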