this post was submitted on 27 May 2024
1102 points (98.0% liked)
Technology
Rip up the Reddit contract and don’t use that data to train the model. It’s the definition of a garbage in garbage out problem.
Jesus. I didn't even think of that. I could totally see that being a big part of why it is giving garbage answers.
Just imagine the average Reddit, Twitter, Facebook, and Instagram content. Then realize that half of that content is dumber than that. That's half of what these AI models use to learn. The "smarter" half is probably filled with sarcasm, inside jokes, and other kinds of innuendo that the AI at this stage has no chance of understanding correctly.
Reminds me of the time Microsoft unleashed their AI Twitter account and it turned into a Nazi after a couple of hours. Whatever straight-out-of-business-school idiot thought scraping the comments of the armpit of the internet was a good idea should be banned from any management position. At least it is a step up from scraping 4chan, I guess.
The Microsoft Tay one I can understand, though. Before it was released, they'd also had Microsoft Xiaoice, which had been in use for 2 years prior without this issue. Tay was just the English version of that.
The glue on pizza one literally came from a Reddit post.