CodexArcanum@lemmy.world 5 points 4 months ago

Wow, this is wild stuff. I mean, I haven't kept up very well with the deep technical side of LLMs, so I hadn't realized where development currently stands for this stuff. People can download pretrained models, but that's just a pile of (to a human) meaningless statistical weights.

What this person is doing is akin to memory hacks for cheating at video games, but on a different scale. He runs paired sets of prompts through the model, some it answers normally and some that trip the censorship check, watches which internal activations correspond to the refusal behavior, and then basically slices that direction out like brain surgery.
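For anyone curious, the core math is surprisingly small. Here's a rough PyTorch sketch of the idea as I understand it; everything in it is a placeholder (random tensors standing in for activations you'd capture from a real model, made-up sizes), not the actual project's code:

```python
import torch

hidden = 512  # hypothetical hidden size

# Pretend these are residual-stream activations captured while running two
# prompt sets through the model: ones it refuses and ones it answers normally.
refused_acts = torch.randn(200, hidden) + 2.0  # fake "refusal" activations
answered_acts = torch.randn(200, hidden)       # fake "normal" activations

# The "censorship check" boiled down to a single direction: the difference
# between the two mean activations, normalized to unit length.
direction = refused_acts.mean(dim=0) - answered_acts.mean(dim=0)
direction = direction / direction.norm()

# "Slicing it out": project that direction out of a weight matrix so the
# layer can no longer write anything along it (an orthogonal projection).
W = torch.randn(hidden, hidden)  # stand-in for a real layer's weights
W_ablated = W - torch.outer(direction, direction @ W)

# Sanity check: outputs of the ablated layer have no component along the
# refusal direction anymore.
x = torch.randn(hidden)
print(torch.dot(direction, W_ablated @ x))  # ~0 up to float error
```

As I understand it, the real thing repeats this per layer on the model's actual output matrices and picks the best layer/direction empirically, but the linear algebra at the heart of it is about that small.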

Holy shit, he's literally hacking the AI to remove its ethical constraints!

Damn though, I feel like this same ablation technique could be used to do actual censorship. Couldn't you run a pair of datasets through it to find the directions that correspond to "Gaza" or "labor unions" or other concepts you'd rather it not discuss, and then slice them out? Is this a digital lobotomy?