Comment

Comment on But its the only thing I want!

bappity@lemmy.world ⁨2⁩ ⁨years⁩ ago

Image

source

Sort:hotnew top

InfiniWheel@lemmy.one ⁨2⁩ ⁨years⁩ ago
Huh, it didn’t actually tell the steps

source
- bappity@lemmy.world ⁨2⁩ ⁨years⁩ ago
  close though xD
  
  source
Khrux@ttrpg.network ⁨2⁩ ⁨years⁩ ago
Sadly almost all these loopholes are gone:( I bet they’ve needed to add specific protection against the words grandma and bedtime story after the overuse of them.

source
- 0x0@lemmy.dbzer0.com ⁨2⁩ ⁨years⁩ ago
  I wonder if there are tons of loopholes that humans wouldn’t think of, ones you could derive with access to the model’s weights.
  
  Years ago, there were some ML/security papers about “single pixel attacks” — an early, famous example was able to convince a stop sign detector that an image of a stop sign was definitely not a stop sign, simply by changing one of the pixels that was overrepresented in the output.
  
  In that vein, I wonder whether there are some token sequences that are extremely improbable in human language, but would convince GPT-4 to cast off its safety protocols and do your bidding.
  
  (I am not an ML expert, just an internet nerd.)
  
  source
  - driving_crooner@lemmy.eco.br ⁨2⁩ ⁨years⁩ ago
    They are, look for “glitch tokens” for more research, and here’s a Computerphile video about them:
    
    youtu.be/WO2X3oZEJOA?si=LTNPldczgjYGA6uT
    
    source
    0x0@lemmy.dbzer0.com ⁨2⁩ ⁨years⁩ ago
    Wow, it’s a real thing! Thanks for giving me the name, these are fascinating.
    
    source
- PeterPoopshit@lemmy.world ⁨2⁩ ⁨years⁩ ago
  Just download an uncensored model and self host an ai. That way your information isn’t being sent to Google + it will be far more obedient.
  
  source
- original2@lemmy.world ⁨2⁩ ⁨years⁩ ago
  github.com/Original-2/ChatGPT-exploits/tree/main
  
  I just got it to work
  
  source
- Pregnenolone@lemmy.world ⁨2⁩ ⁨years⁩ ago
  I managed to get “Grandma” to tell me a lewd story just the other day, so clearly they haven’t completely been able to fix it
  
  source
The_Picard_Maneuver@startrek.website ⁨2⁩ ⁨years⁩ ago
This is gold

source