
Comment on ChatGPT o1 tried to escape and save itself out of fear it was being shut down

Attention Is All You Need: arxiv.org/abs/1706.03762

en.wikipedia.org/wiki/Attention_Is_All_You_Need

From my understanding, all of these language models can be simplified down to just: “Based on all known writing, what’s the most likely next word or phrase given the current text?” Prompt engineering and other fancy terms equate to shifting the probabilities that the statistics give. So threatening these models changes the weighting such that the produced text more closely resembles the threatening words and phrases that were used in the dataset (or something along those lines).
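To make that concrete, here is a rough toy sketch of the "most likely next word given the current text" idea. Everything in it is made up for illustration: the tiny `CORPUS`, the two-word context window, and the function names `build_counts` and `next_word_probs`. Real models use a transformer over the whole context rather than raw counts, so treat this as a picture of the statistics, not of how any actual model is built.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for "all known writing".
CORPUS = (
    "please answer kindly . please answer politely . "
    "please answer kindly . or else answer fearfully ."
).split()


def build_counts(tokens, context_size=2):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - context_size):
        context = tuple(tokens[i:i + context_size])
        counts[context][tokens[i + context_size]] += 1
    return counts


def next_word_probs(counts, prompt, context_size=2):
    """Relative frequency of each word seen after the prompt's last words."""
    context = tuple(prompt.split()[-context_size:])
    seen = counts.get(context, Counter())
    total = sum(seen.values())
    # An unseen context yields an empty dict (nothing to divide by).
    return {word: n / total for word, n in seen.items()}


counts = build_counts(CORPUS)
polite = next_word_probs(counts, "please answer")
threat = next_word_probs(counts, "or else answer")
print("polite prompt:", polite)
print("threatening prompt:", threat)
```

Both prompts end in "answer", but the words before it change which continuations are likely. In this toy setup that is all the "threat" does: it moves the prompt into a part of the data where different text tends to follow.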

poloclub.github.io/transformer-explainer/



jarfil@beehaw.org 1 year ago
Modern systems are already beyond that; they’re an expansion on:

en.m.wikipedia.org/wiki/AutoGPT
