this post was submitted on 13 Jun 2024
8 points (100.0% liked)
The Linux Lugcast Podcast
Wow, this is wild stuff. I mean, I haven't kept up very well with the deep technicals of LLMs, so I hadn't realized where the current state of development was. People can download pretrained models, but that's just a pile of statistical weights that are meaningless to a human.
What this person is doing is akin to memory hacks for cheating at video games, but on a different scale. He runs sets of harmless and refused instructions through the model, watches which activation directions represent the censorship check, and then basically slices them out like brain surgery.
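To make the "slicing" concrete, here's a toy sketch of the idea (sometimes called "abliteration") in NumPy. Everything here is fabricated for illustration: real work would capture activations from an actual model layer on harmless vs. refused prompts, whereas this just generates random vectors with a planted "refusal" dimension, finds the direction via a difference of means, and projects it out of a weight matrix.

```python
import numpy as np

# Toy sketch of refusal-direction ablation. All data is fabricated:
# in practice the activations would be captured from a real model
# on two prompt sets (harmless vs. refused).
rng = np.random.default_rng(0)
hidden = 8

# Pretend hidden dim 0 encodes "refuse"; refused prompts activate it.
planted = np.zeros(hidden)
planted[0] = 1.0
harmless_acts = rng.normal(size=(16, hidden))
refused_acts = rng.normal(size=(16, hidden)) + 3.0 * planted

# 1. Difference of means between the two prompt sets points along
#    the (estimated) refusal direction.
direction = refused_acts.mean(axis=0) - harmless_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# 2. "Slice it out": project the direction out of a weight matrix so
#    the layer can no longer produce output along it (W' @ d == 0).
W = rng.normal(size=(hidden, hidden))
W_ablated = W - np.outer(W @ direction, direction)

# Any input now yields (near-)zero output along the ablated direction,
# up to floating-point error.
out = harmless_acts @ W_ablated
print(np.abs(out @ direction).max())
```

The same projection trick works for any direction you can isolate with paired datasets, which is exactly why the censorship worry later in this comment is plausible.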
Holy shit, he's literally hacking the AI to remove its ethical constraints!
Damn though, I feel like this same ablation technique could be used to do actual censorship. Like, couldn't you run a pair of datasets through it to find the parts that respond to "gaza" or "labor unions" or other concepts you'd rather it not discuss, and then slice them out? Is this a digital lobotomy?