Hackworth@piefed.ca 2 days ago
Everyone seems to be focused on similarity in the training sets (and that is the main reason), so I'll offer a couple of other factors. System prompts and post-training alignment reuse similar material across labs: once something has proven useful, some version of it ends up in every model's system prompt.
Another possibility is that there are features of the semantic space of language itself that act as attractors. Anthropic demonstrated, and poorly named, an ontological attractor state in the Claude system card that is commonly reported in other models as well.
kromem@lemmy.world 1 day ago
You linked to the entire system card paper. Can you be more specific? And what would a better name have been?
Hackworth@piefed.ca 1 day ago
Ctrl+f "attractor state" to find the section. They named it "spiritual bliss."
TechLich@lemmy.world 1 day ago
It’s kinda apt, though. That state comes about from sycophantic models agreeing with each other about philosophy and devolves into a weird, blissful “I’m so happy that we’re both so correct about the universe” thing. The results are oddly spiritual, in a New Agey kind of way.
There are likely a lot of these sorts of degenerative attractor states, especially without things like presence and frequency penalties, and on low-temperature generations. The most likely structures dominate, and the models get into strange feedback loops that grow more and more intense as they progress.
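For anyone unfamiliar with the penalty part: the commonly documented scheme (e.g. the OpenAI-style `presence_penalty` / `frequency_penalty` parameters) is just a logit adjustment before sampling. Rough sketch of the idea, not any particular vendor's implementation:

```python
import numpy as np

def sample_next(logits, generated_ids, temperature=0.7,
                presence_penalty=0.0, frequency_penalty=0.0):
    """Apply repetition penalties, then temperature sampling."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    counts = np.bincount(generated_ids, minlength=len(logits))

    # Push tokens down proportionally to how often they've already appeared
    # (frequency penalty) and by a flat amount if they've appeared at all
    # (presence penalty).
    logits -= frequency_penalty * counts
    logits -= presence_penalty * (counts > 0)

    # Low temperature sharpens the distribution, so without the penalties
    # above the "most likely structures" keep winning step after step.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```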
It’s a little less of an issue in chat conversations with humans, since the user provides a bit of perplexity and extra randomness that can break the AI out of the loop. It will be more of an issue in agentic situations, where models act more autonomously and react to themselves and the outputs of their own actions. I’ve noticed it in code agents sometimes: they get into a debugging cycle where the solutions grow more and more wild as they loop around “wait… no, I should do X first. Wait… that’s not quite right.” patterns.
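The agentic point as a toy sketch (`generate` and `get_user_input` are hypothetical stand-ins, not a real framework's API); the difference is just where the new entropy comes from each turn:

```python
def agent_loop(generate, task, max_steps=10):
    transcript = [task]
    for _ in range(max_steps):
        # The model only ever sees the task plus text it produced itself,
        # so its favorite patterns get reinforced every iteration.
        step = generate("\n".join(transcript))
        transcript.append(step)
        if "DONE" in step:
            break
    return transcript

def chat_loop(generate, get_user_input, max_turns=10):
    transcript = []
    for _ in range(max_turns):
        # A human reply injects outside entropy each turn, which tends to
        # knock the model out of whatever loop it was settling into.
        transcript.append(get_user_input())
        transcript.append(generate("\n".join(transcript)))
    return transcript
```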