What we (people in general who use the internet, regardless of government or country) need, in large part, is literacy. Not "gen AI literacy" or "media literacy", but simply "literacy".
I'm saying this because a lot of the output of those text generators says a lot without conveying much: it connects completely unrelated concepts because they happen to use similar words, it makes self-contradictory claims, and so on. And often its statements are completely unrelated to the context at hand. People with good literacy detect those things right off the bat, but people who struggle with basic reading comprehension don't.
The thing that strikes me about LLMs is that they were created to chat. To converse. They're partly influenced by the Turing test, where the objective is to convince someone you're human by keeping up a conversation. They weren't designed to create meaningful or factual content.
People still seem to want to use ChatGPT to create something, and then fix the accuracy as a second step. I say go back to the drawing board and create a tool that analyses statements and tries to create information based on trusted linked open data sources.
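To make the "trusted linked open data" part concrete: here's a minimal sketch of the lookup half of such a tool, assuming Wikidata's public SPARQL endpoint. The statement-analysis half, which is the genuinely hard part, is left out entirely:

```python
# Minimal sketch: answer a factual question from Wikidata's knowledge
# graph instead of generating text and hoping it happens to be accurate.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def capital_of(country_qid: str) -> list[str]:
    """Ask Wikidata for the capital (property P36) of a country."""
    query = f"""
    SELECT ?capitalLabel WHERE {{
      wd:{country_qid} wdt:P36 ?capital .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    """
    response = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "fact-lookup-sketch/0.1 (example)"},
    )
    response.raise_for_status()
    bindings = response.json()["results"]["bindings"]
    return [row["capitalLabel"]["value"] for row in bindings]

# Q142 is Wikidata's identifier for France; prints ['Paris']
print(capital_of("Q142"))
```

The point isn't this particular query; it's that the answer comes from a curated knowledge graph with provenance, rather than from word statistics.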
Discuss :)
I also think that they should go back to the drawing board and add another abstraction layer: conceptualisation.
LLMs simply split words into tokens (similar-ish to morphemes) and, based on the tokens found in the input and preceding answer tokens, they throw a die to pick the next token.
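To make that concrete, here's a toy version of that "die throw" in Python. Everything in it is invented for illustration; a real LLM computes the probabilities with a neural network over its whole vocabulary, conditioned on the entire context, rather than looking them up in a tiny hand-written table:

```python
# Toy next-token sampler: given the last two tokens, pick the next one
# by weighted random choice. All probabilities here are made up.
import random

next_token_probs = {
    ("the", "cat"): {"sat": 0.5, "ran": 0.3, "purred": 0.2},
    ("cat", "sat"): {"on": 0.8, "quietly": 0.2},
}

def sample_next(context: tuple[str, str]) -> str | None:
    """The 'die throw': weighted random choice among candidate tokens."""
    dist = next_token_probs.get(context)
    if dist is None:
        return None  # no known continuation for this context
    return random.choices(list(dist), weights=list(dist.values()))[0]

tokens = ["the", "cat"]
while len(tokens) < 6:
    nxt = sample_next((tokens[-2], tokens[-1]))
    if nxt is None:
        break
    tokens.append(nxt)
print(" ".join(tokens))  # e.g. "the cat sat on"
```

Note what's missing: nothing in that table knows what a cat is, only which tokens tend to follow which.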
This sort of "automatic morpheme chaining" does happen in human Language¹, but it's fairly minor. More than that: we associate individual morphemes and sets of morphemes with abstract concepts². Then we weigh those concepts against our world knowledge³, assign them a truth value, a moral assessment, etc., and only then recode them back into words. LLMs do not do anything remotely similar.
Let me give you an example. Consider the following sentence:
A human being can easily see a thousand issues with this sentence. But more importantly, we do it based on the following:
In all those cases we need to refer to the concepts behind the words, not just the words.
I do believe that a good text generator could model some conceptualisation, and even world knowledge. If such a generator were created, it would easily surpass LLMs even with considerably less linguistic input.
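For what it's worth, here's a deliberately tiny sketch of what such a concept layer could look like. All the word-to-concept mappings and "facts" below are invented for the example; building and maintaining them at scale is exactly the hard part that LLMs sidestep:

```python
# Toy concept layer: map words to concepts, then assess a claim against
# world knowledge instead of against word co-occurrence statistics.
word_to_concept = {"penguin": "PENGUIN", "fly": "FLY", "swim": "SWIM"}

world_knowledge = {
    ("PENGUIN", "FLY"): False,   # penguins are birds, but cannot fly
    ("PENGUIN", "SWIM"): True,
}

def assess(subject_word: str, verb_word: str) -> bool | None:
    """Truth value of 'subject can verb', or None if unknown."""
    concepts = (word_to_concept.get(subject_word),
                word_to_concept.get(verb_word))
    return world_knowledge.get(concepts)

print(assess("penguin", "fly"))   # False: rejected at the concept level
print(assess("penguin", "swim"))  # True
```

A system like this fails loudly (it returns "unknown") instead of confidently chaining plausible-sounding words together.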
Notes:
Thank you for replying. This is the level of info I used to love on Reddit and now love on Lemmy.
You're welcome!
I've been mildly excited about machine text generators, mostly due to my interest in Linguistics. But I can't help but point out the flaws of LLMs, especially when people get overexcited about what I see as a rather primitive approach.