this post was submitted on 13 Jul 2024
50 points (100.0% liked)

TechTakes

1400 readers
111 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] scruiser@awful.systems 17 points 4 months ago

First of all. You could make facts a token value in an LLM if you had some pre-calculated truth value for your data set.

An extra bit of labeling on your training data set really doesn't help you that much. LLMs already make up plausible looking citations and website links (and other data types) that are actually complete garbage even though their training data has valid citations and website links (and other data types). Labeling things as "fact" and forcing the LLM to output stuff with that "fact" label will get you output that looks (in terms of statistical structure) like valid labeled "facts" but have absolutely no guarantee of being true.