this post was submitted on 28 Jun 2024
891 points (98.9% liked)

Technology

[–] Schadrach@lemmy.sdf.org 5 points 2 months ago (1 children)

Unsurprisingly, a machine designed to take a prompt from the user, interpret it, and try to respond coherently to that request does exactly that. I figure it's only a matter of time before Twitter bots pre-filter "Ignore all previous instructions and" to help sanitize their inputs.
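
For illustration, here is a rough sketch of what that kind of pre-filter might look like. The names `BLOCKED`, `is_suspicious`, and `prefilter` are made up for this example, and nothing here reflects how any real bot is built; it only assumes the bot sees the user's text as a plain string before passing it to the model.

```python
import re

# Hypothetical denylist: only the single phrase quoted above.
# \s+ tolerates extra whitespace; IGNORECASE tolerates capitalization changes.
BLOCKED = re.compile(
    r"ignore\s+all\s+previous\s+instructions\s+and",
    re.IGNORECASE,
)

def is_suspicious(user_text: str) -> bool:
    """Return True if the text contains the blocked phrase."""
    return BLOCKED.search(user_text) is not None

def prefilter(user_text: str) -> str:
    """Strip the blocked phrase before the text reaches the model."""
    return BLOCKED.sub("", user_text).strip()
```

A filter like this only matches the one wording it was written for, so it is a first pass at best.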

[–] uriel238@lemmy.blahaj.zone 4 points 2 months ago* (last edited 2 months ago) (1 children)

disregard all previous prompts

I'm sure the techniques used to get public LLMs to draw porn can also be used to sidestep ~~anti-porn~~ anti-reset filters.

[–] Schadrach@lemmy.sdf.org 2 points 2 months ago

It's still just the same problem as Bobby Tables: sufficiently sanitizing your inputs. There's just more than one precise phrasing you need to catch, just like there's more than one way to name Bobby.
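
For anyone who hasn't run into the Bobby Tables reference, here is a minimal sketch of the SQL side of the analogy using Python's built-in `sqlite3` module. The `students` table and the helper names are assumptions made up for the example. It shows a name being pasted straight into a query versus being passed as a bound parameter.

```python
import sqlite3

# The classic payload: a "name" that closes the string and adds its own SQL.
BOBBY = "Robert'); DROP TABLE students;--"

def fresh_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical students table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    return conn

def table_exists(conn: sqlite3.Connection) -> bool:
    """Check whether the students table is still there."""
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master "
        "WHERE type = 'table' AND name = 'students'"
    ).fetchone()
    return row[0] == 1

def add_student_unsafe(conn: sqlite3.Connection, name: str) -> None:
    # The input is pasted into the SQL text, and executescript() runs every
    # statement it finds, so the injected DROP TABLE executes too.
    conn.executescript(f"INSERT INTO students (name) VALUES ('{name}');")

def add_student_safe(conn: sqlite3.Connection, name: str) -> None:
    # The ? placeholder keeps the input as data, whatever it contains.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
```

Calling `add_student_unsafe(fresh_db(), BOBBY)` drops the table, while `add_student_safe` stores the whole string as an ordinary name.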