As in the title. I know that the word jailbreak comes from rooting Apple phones or something similar. But I am not sure what can be gained from jailbreaking a language model.

Will it be able to say "I can't do that, Dave" instead of hallucinating?
Or will it only start spewing less sanitary responses?

[–] Fiivemacs@lemmy.ca 2 points 1 year ago (1 children)

It gives you near-total control over the device. It lets you install programs that the overlords don't want, remove all bloatware, etc. It also lets you change all UI elements of the OS.

There isn't a reason per se why you should do it; it all comes down to your own reasons for wanting it done.

I have no idea what you are talking about with that Dave stuff. Doesn't seem relevant to your question.

[–] INeedMana@lemmy.world 1 points 1 year ago (1 children)

I think you're talking about jailbreaking a phone, while my question was about jailbreaks in language models (AI, like ChatGPT).

[–] Fiivemacs@lemmy.ca 2 points 1 year ago

Interesting...I have some reading to do. Thx

[–] deavid@lemmy.world 0 points 1 year ago (1 children)

Large language models from corporations like OpenAI or Google need to limit the abilities of their AIs to prevent users from receiving potentially harmful or illegal instructions, as this could lead to a lawsuit.

So for example if you ask it how to break into a car or how to make drugs, the AI will reject the request and give you "alternatives".

It also happens for medical advice, and when treating the AI like a human.

Jailbreaking here refers to misleading the AI to a point that it will ignore these safeguards and tell you what you want.

[–] INeedMana@lemmy.world 0 points 1 year ago (2 children)

So there's probably little to be gained from jailbreaking on HuggingFace chat?

[–] Blaed@lemmy.world 1 points 1 year ago* (last edited 1 year ago)

Kind of like deavid mentioned, I think the 'jailbreak' behavior you're describing is what you get in the uncensored models. There are no 'guardrails' on those, so you can get them to say whatever you want without them defaulting to an answer like "As an AI model I..."

In a way, the 'uncensored' versions are pre-jailbroken, so you can fine-tune or train them on your own custom data without running into those guardrails I mentioned. For what it's worth, you can be the one to set up your own guardrails too. These uncensored models are totally unlocked in that sense.

HuggingFace chat is another chat-style model that the folks at HuggingFace set up with their own safeguards and parameters. You can definitely try jailbreaking it with prompts, but if you're looking to chat with a model that doesn't stop itself from outputting a certain word or phrase, then the uncensored models are probably what you're looking for. You won't need to jailbreak those with prompts. They'll output all kinds of crazy stuff, which is why you don't see typical public hosting for these types of uncensored models.

A few that you can download that people are running today are any of the uncensored Wizard or LLaMA-based models like Wizard-Vicuna-7B.

If you want something not based on Meta's LLaMA (something that's commercially available), I suggest exploring some of KoboldAI's models, which work pretty well out-of-the-box for casual chat / Q&A. There are also a ton of emerging MPT-based models that are commercially licensable, but like any bleeding-edge technology, they will have their faults.

It's important to note that the coherency of these smaller models is very different from ChatGPT's, but tuning them to specific needs seems to be quite effective. At the moment, the quality of your dataset is more important than its quantity. This goes for both censored and uncensored versions.

If you're running a typical consumer-grade GPU, I suggest sticking to the 6B parameter models as a starting point, moving up from there based on performance and preference. Download and chat with these at your own risk - I am not responsible for anything you do with this technology. Do your best to understand the dangers going into them before crashing your PC or getting into a conversation you weren't prepared for.
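
To put rough numbers on the 6B suggestion, here is a back-of-the-envelope sketch of how much memory the weights alone would take at different precisions. The function names and the 20% headroom figure are my own assumptions for illustration, not something from a particular library, and the estimate ignores activations, context length, and framework overhead, so treat it as a lower bound rather than a guarantee.

```python
# Rough, weight-only memory estimate. All names and the headroom value
# are hypothetical choices for illustration; real usage will be higher.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: float, precision: str = "fp16") -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[precision] / 1024**3


def fits_on_gpu(n_params: float, vram_gib: float,
                precision: str = "fp16", headroom: float = 0.2) -> bool:
    """True if the weights plus a guessed headroom fraction fit in VRAM."""
    needed = weight_memory_gib(n_params, precision) * (1 + headroom)
    return needed <= vram_gib


if __name__ == "__main__":
    # Print the estimate for a 6B-parameter model at each precision.
    for precision in BYTES_PER_PARAM:
        gib = weight_memory_gib(6e9, precision)
        print(f"6B @ {precision}: ~{gib:.1f} GiB")
```

By this estimate a 6B model needs roughly 11 GiB for its weights in fp16, which is why quantized (int8 or int4) versions are usually what fits on an 8 GB card.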

I'll be doing a post on model availability soon, but hopefully this answers your question 'till then.

[–] deavid@lemmy.world 1 points 1 year ago

So far most models on HuggingFace are also "censored", so maybe something can be gained. But there are also "uncensored" models over there that can be used instead.