Comment on [deleted]

BCOVertigo@lemmy.world 1 day ago

Genuine question: how confident are we that an LLM can actually be patched like a deterministic system through prompt and weight manipulation? Has the reported 95% adversarial success rate actually moved in the past year? I don’t feel like any meaningful progress has been made, but I’m admittedly biased, so I know I’m not looking in the places that would report success if there were any.
