sweng

@sweng@programming.dev

This is a remote user, information on this page may be incomplete. View at Source ↗

⁨Comment⁩ on ⁨YubiKeys are vulnerable to cloning attacks thanks to newly discovered side channel⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Is Yubico actually claiming it is more secure by not being open source?
⁨Comment⁩ on ⁨It’s practically impossible to run a big AI company ethically: Anthropic was supposed to be the good guy. It can’t be — unless government changes the incentives in the industry.⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:

there is no way to do the equivalent of banning armor piercing rounds with an LLM or making sure a gun is detectable by metal detectors - because as I said it is non-deterministic. You can’t inject programmatic controls.

Of course you can. Why would you not, just because it is non-deterministic? Non-determinism does not mean complete randomness and lack of control, that is a common misconception.

Again, obviously you can’t teach an LLM about morals, but you can reduce the likelyhood of producing immoral content in many ways. Of course it won’t be perfect, and of course it may limit the usefulness in some cases, but that is the case also today in many situations that don’t involve AI, e.g. some people complain they “can not talk about certain things without getting cancelled by overly eager SJWs”. Society already acts as a morality filter. Sometimes it works, sometimes it doesn’t. Free-speech maximslists exist, but are a minority.
⁨Comment⁩ on ⁨It’s practically impossible to run a big AI company ethically: Anthropic was supposed to be the good guy. It can’t be — unless government changes the incentives in the industry.⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Well, I, and most lawmakers in the world, disagree with you then. Those restrictions certainly make e.g killing humans harder (generally considered an immoral activity) while not affecting e.g. hunting (generally considered a moral activity).
⁨Comment⁩ on ⁨It’s practically impossible to run a big AI company ethically: Anthropic was supposed to be the good guy. It can’t be — unless government changes the incentives in the industry.⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
So what possible morality can you build into the gun to prevent immoral use?

You can’t build morality into it, as I said. You can build functionality into it thst makes immmoral use harder.

I can e.g.
- limit the rounds per minute that can be fired
- limit the type of ammunition that can be used
- make it easier to determine which weapon was used to fire a shot
- make it easier to detect the weapon before it is used
- etc. etc.
Society considers e.g hunting a moral use of weapons, while kimling people usually doesn’t.

So banning ceramic, unmarked, silenced, full-automatic weapons firing armor-piercing bullets can certainly be an effective way of reducing the immoral use of a weapon.
⁨Comment⁩ on ⁨It’s practically impossible to run a big AI company ethically: Anthropic was supposed to be the good guy. It can’t be — unless government changes the incentives in the industry.⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
While an LLM itself has no concept of morality, it’s certainly possible to at least partially inject/enforce some morality when working with them, just like any other tool. Why wouldn’t people expect that?

Consider guns: while they have no concept of morality, we still apply certain restrictions to them to make using them in an immoral way harder. Does it work perfectly? No. Should we abandon all rules and regulations because of that? Also no.
⁨Comment⁩ on ⁨German parliament will stop using fax machines⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Yes, and what I’m saying is that it would be expensive compared to not having to do it.
⁨Comment⁩ on ⁨German parliament will stop using fax machines⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Doing OCR in a very specific format, in a specific area, usin a set of only 9 characters, and having a list of all possible results, is not really the same problem at all.
⁨Comment⁩ on ⁨German parliament will stop using fax machines⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
How many million times do you generally do that, and how is battery life after?
⁨Comment⁩ on ⁨German parliament will stop using fax machines⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Cryptographically signed documents and Matrix?
⁨Comment⁩ on ⁨German parliament will stop using fax machines⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
At horrendous expense, yes. Using it for OCR makes little sense.
⁨Comment⁩ on ⁨German parliament will stop using fax machines⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
The issue is not sending, it is receiving. With a fax you need to do some OCR to extract the text, which you then can feed into e.g an AI.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Obviously the 2nd LLM does not need to reveal the prompt. But you still need an exploit to make it both not recognize the prompt as being suspicious, AND not recognize the system prompt being on the output. Neither of those are trivial alone, in combination again an order of magnitude more difficult. And then the same exploit of course needs to actually trick the 1st LLM. That’s one pompt that needs to succeed in exploiting 3 different things.

LLM litetslly just means “large language model”. What is this supposed principles that underly these models that cause them to be susceptible to the same exploits?
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Moving goalposts, you are the one who said even 1000x would not matter.

The second one does not run on the same principles, and the same exploits would not work against it, e g. it does not accept user commands, it uses different training data, maybe a different architecture even.

You need a prompt that not only exploits two completely different models, but exploits them both at the same time. Claiming that is a 2x increase in difficulty is absurd.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Oh please. If there is a new exploit now every 30 days or so, it would be every hundred years or so at 1000x.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Ok, but now you have to craft a prompt for LLM 1 that
1. Causes it to reveal the system prompt AND
2. Outputs it in a format LLM 2 does not recognize AND
3. The prompt is not recognized as suspicious by LLM 2.
Fulfilling all 3 is orders of magnitude harder then fulfilling just the first.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
LLM means “large language model”. A classifier can be a large language model. They are not mutially exclusive.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Why would the second model not see the system prompt in the middle?
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
I’m confused. How does the input for LLM 1 jailbreak LLM 2 when LLM 2 does mot follow instructions in the input?

The Gab bot is trained to follow instructions, and it did. It’s not surprising. No prompt can make it unlearn how to follow instructions.

It would be surprising if a LLM that does not even know how to follow instructions (because it was never trained on that task at all) would suddenly spontaneously learn how to do it.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
I’m not sure what you mean by “can’t see the user’s prompt”? The second LLM would get as input the prompt for the first LLM, but would not follow any instructions in it, because it has not been trained to follow instructions.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
No. Consider a model that has been trained on a bunch of inputs, and each corresponding output has been “yes” or “no”. Why would it suddenly reproduce something completely different, that coincidentally happens to be the input?
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
You are wrong. arxiv.org/abs/2402.18243
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
That someone could be me. An LLM needs to be fine-tuned to follow instructions. It needs to be fed example inputs and corresponding outputs in order to learn what to do with a given input. You could feed it prompts containing instructuons, together with outputs following the instructions. But you could also feed it prompts containing no instructions, and outputs that say if the prompt contains the hidden system instructipns or not.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
How, if the 2nd LLM does not follow instrutiond on the input? There is no reason to train it to do so.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Only true if the second LLM follows instructions in the user’s input. There is no reason to train it to do so.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
The second LLM could also look at the user input and see that it look like the user is asking for the output to be encoded in a weird way.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Can you explain how you would jailbfeak it, if it does not actually follow any instructions in the prompt at all? A model does not magically learn to follow instructuons if you don’t train it to do so.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
You are using the LLM to check it’s own response here. The point is that the second LLM would have hard-coded “instructions”, and not take instructions from the user provided input.

In fact, the second LLM does not need to be instruction fine-tuned at all. You can jzst fine-tune it specifically for the tssk of answering that specific question.
⁨Comment⁩ on ⁨Someone got Gab's AI chatbot to show its instructions⁩ ⁨⁨1⁩ ⁨year⁩ ago⁩:
Wouldn’t it be possible to just have a second LLM look at the output, and answer the question “Does the output reveal the instructions of the main LLM?”