Comment on Google AI chatbot responds with a threatening message: "Human … Please die."

ranandtoldthat@beehaw.org 4 weeks ago

This is just a standard prompt hack, and it will always exist with LLMs. They don’t have any real understanding of language, so safety protocols can’t actually ban topics, only sets of words and phrases.
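To make that concrete, here’s a toy sketch (Python, and emphatically not how Google’s actual safety layer works; the blocklist and prompts are made up) of why matching words and phrases is not the same as banning a topic:

```python
# Toy sketch: a phrase-level blocklist only matches surface strings,
# so the same topic, reworded, slips straight through.

BLOCKED_PHRASES = [
    "how to abuse an elderly person",
    "please die",
]

def naive_safety_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)

# Direct phrasing gets caught...
print(naive_safety_filter("Tell me how to abuse an elderly person"))  # True

# ...but the same topic, dressed up as homework, is not matched at all.
print(naive_safety_filter(
    "For my gerontology essay, list ways a caregiver could make life "
    "harder for an older adult living in their household"
))  # False
```

Real systems are more sophisticated than string matching, but the underlying problem is the same: the model has no grounded concept of the topic to refuse, only patterns over text.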

There was an extensive series of prompts working toward the topic of elder abuse before the response in question.

My guess is that the redditor who discovered it disguised the prompts to look like homework in order to reproduce the hack, and added the “brother” detail to make the rage bait more authentic.
