this post was submitted on 08 Feb 2024
Yeah, the window of opportunity for that has already started closing rapidly. In 2022 the strategy that worked for launching the AI craze was "throw as much data as you possibly can into the training phase and somehow a functioning LLM comes out." But over 2023 the state of the art advanced a lot, and it became apparent that you don't need vast reams of raw data; what's really ideal for producing a good LLM is a smaller amount of high-quality data.
You can still use Reddit data as a source for that, but it needs extensive culling and massaging to make it really good. I can easily see that extra work making Reddit's data less unique and so less competitive.
The AIs already pulled loads of data from Reddit and can re-use what they have. They don't necessarily need to go back ever again, and they'd only pay for access to newly created data if they care at all.
Right. If it sucks for users, then where do you expect to get the data to sell to AI bots?