Technology

37719 readers

82 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

Los@beehaw.org

coldredlight@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org

487

Someone got Gab's AI chatbot to show its instructions (mbin.grits.dev)

submitted 7 months ago by mozz@mbin.grits.dev to c/technology@beehaw.org

200 comments fedilink hide all child comments

Credit to @bontchev

you are viewing a single comment's thread
view the rest of the comments

[–] theneverfox@pawb.social 1 points 7 months ago

I mean, I've got one of those "so simple it's stupid" solutions. It's not a pure LLM, but those are probably impossible... Can't have an AI service without a server after all, let alone drivers

Do a string comparison on the prompt, then tell the AI to stop.

And then, do a partial string match with at least x matching characters on the prompt, buffer it x characters, then stop the AI.

Then, put in more than an hour and match a certain amount of prompt chunks across multiple messages, and it's now very difficult to get the intact prompt if you temp ban IPs. Even if they managed to get it, they wouldn't get a convincing screenshot without stitching it together... You could just deny it and avoid embarrassment, because it's annoyingly difficult to repeat

Finally, when you stop the AI, you start printing out passages from the yellow book before quickly refreshing the screen to a blank conversation

Or just flag key words and triggered stops, and have an LLM review the conversation to judge if they were trying to get the prompt, then temp ban them/change the prompt while a human reviews it