Comment

Comment on Someone got Gab's AI chatbot to show its instructions

sweng@programming.dev ⁨1⁩ ⁨year⁩ ago

Only true if the second LLM follows instructions in the user’s input. There is no reason to train it to do so.

Sort:hotnew top

teawrecks@sopuli.xyz ⁨1⁩ ⁨year⁩ ago
Any input to the 2nd LLM is a prompt, so if it sees the user input, then it affects the probabilities of the output.

There’s no such thing as “training an AI to follow instructions”. The output is just a probibalistic function of the input. This is why a jailbreak is always possible, the probability of getting it to output something that was given as input is never 0.

source
- sweng@programming.dev ⁨1⁩ ⁨year⁩ ago
  You are wrong. arxiv.org/abs/2402.18243
  
  source
  - teawrecks@sopuli.xyz ⁨1⁩ ⁨year⁩ ago
    Ah, TIL about instruction fine-tuning. Thanks, interesting thread.
    
    Still, as I understand it, if the model has seen an input, then it always has a non-zero chance of reproducing it in the output.
    
    source
    sweng@programming.dev ⁨1⁩ ⁨year⁩ ago
    No. Consider a model that has been trained on a bunch of inputs, and each corresponding output has been “yes” or “no”. Why would it suddenly reproduce something completely different, that coincidentally happens to be the input?
    
    source
    -> View More Comments