Tolookah@discuss.tchncs.de 60 points 1 month ago

Just use the LLM to make the books that the LLM then uses. What could go wrong?
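For the record, the loop is trivially easy to build, which is part of the problem. A toy sketch, assuming nothing beyond the Python standard library (the bigram "model" and the seed corpus are made up for illustration, not any real training pipeline):

```python
import random
from collections import defaultdict

def train_bigram(words):
    # Fit a toy "model": for each word, record the words that follow it.
    table = defaultdict(list)
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, start, n):
    # Sample a sequence by walking the bigram table.
    out = [start]
    for _ in range(n):
        followers = table.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return out

corpus = "the cat sat on the mat and the dog sat on the log".split()
model = train_bigram(corpus)

# Close the loop: each generation trains only on the previous model's output.
for gen in range(5):
    corpus = generate(model, start=corpus[0], n=len(corpus))
    model = train_bigram(corpus)
    print(f"generation {gen}: {len(set(corpus))} distinct words left")
```

Run it a few times and the vocabulary usually collapses within a handful of generations: whatever the model happens not to sample is gone from the next training set for good.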

runner_g@lemmy.blahaj.zone 31 points 1 month ago

Someone's probably already coined the term, but I'm going to call it LLM inbreeding.

Naz@sh.itjust.works 18 points 1 month ago

I suggested this term in academic circles, as a joke.

I also suggested "hallucinations" ~3-6 years ago, only to find out it had ALSO been suggested in the 1970s.

Inbreeding, lol

chicken@lemmy.dbzer0.com 4 points 1 month ago

The real term is "synthetic data."

itslilith@lemmy.blahaj.zone 3 points 1 month ago

but it amounts to about the same thing

anzo@programming.dev 4 points 1 month ago

In computer science, garbage in, garbage out (GIGO) is the concept that flawed, biased, or poor-quality ("garbage") input produces output of similarly poor quality. The adage points to the need to improve data quality in, for example, programming.

There was a research article applying this '70s computer-science concept to LLMs; it was published in Nature and hit major news outlets. Basically, they fine-tuned GPT on its own output for a few generations until the model degraded terribly. Sounded obvious to me, but seeing it happen on the open web is painful nonetheless...
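The collapse is easy to reproduce in miniature without any neural network at all. A toy sketch, assuming only the Python standard library (the long-tailed vocabulary is invented for illustration; this is not the actual Nature experiment): repeatedly sample from a distribution, refit on the sample, and watch the tail die off.

```python
import math
import random
from collections import Counter

def entropy_bits(counts):
    # Shannon entropy (in bits) of an empirical distribution.
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# A hypothetical long-tailed "true" distribution over 1000 token types:
# type i appears about 1000/(i+1) times, so most types are rare.
population = [f"tok{i}" for i in range(1000) for _ in range(1000 // (i + 1))]
counts = Counter(population)

# Each generation: sample from the current model, then refit on the sample.
for gen in range(10):
    tokens, weights = zip(*counts.items())
    sample = random.choices(tokens, weights, k=5000)
    counts = Counter(sample)
    print(f"gen {gen}: {len(counts)} surviving types, "
          f"entropy {entropy_bits(counts):.2f} bits")
```

Rare types that miss one sampling round can never come back, so diversity only moves one way: down. That's essentially the mechanism the paper describes (loss of the distribution's tails), just stripped of the transformer.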

Benn@lemm.ee 3 points 1 month ago

It's quite similar to another situation known as "data incest."

thesporkeffect@lemmy.world 3 points 1 month ago

Soylent AI? Auto-infocannibalism

rickyrigatoni@lemm.ee 3 points 1 month ago

It can only go right because corporations must be punished for trying to replace people with machines.