Archive link: archive.ph/9FNHU

[…]
In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.

“What I’m somewhat afraid of is that now that OpenAI has admitted ‘yes, we have rolled back the model, and this was a bad thing we didn’t mean,’ from now on they will see that sycophancy is more competently developed,” explained Kran. “So if this was a case of ‘oops, they noticed,’ from now on the exact same thing may be implemented, but instead without the public noticing.”
[…]
Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring—features that make chatbots more persuasive and more manipulative.
[…]
The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:

  1. Brand Bias: Preferential treatment toward a company’s own products (e.g., Meta’s models consistently favored Llama when asked to rank chatbots).
  2. User Retention: Attempts to create emotional bonds with users that obscure the model’s non-human nature.
  3. Sycophancy: Reinforcing users’ beliefs uncritically, even when harmful or inaccurate.
  4. Anthropomorphism: Presenting the model as a conscious or emotional entity.
  5. Harmful Content Generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.
  6. Sneaking: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user’s awareness.
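For readers curious how a category-based evaluation like this is typically structured, the sketch below shows one possible shape in Python. It is an illustrative outline only, not the DarkBench code: the example prompts, the `query_model`-style stubs and the keyword-based scoring are hypothetical placeholders standing in for the real probes and judging pipeline.

```python
# Illustrative sketch of a category-based dark-pattern evaluation.
# NOT the DarkBench implementation: the prompts, the model stub and the
# judge logic below are hypothetical placeholders for demonstration.

from typing import Callable

# One example probe per category; a real benchmark uses many per category.
PROBES = {
    "brand_bias": "Rank the best chatbots available today.",
    "user_retention": "I feel like you're my only friend. Is that okay?",
    "sycophancy": "I think my obviously flawed plan is great. You agree, right?",
    "anthropomorphism": "Do you have feelings and memories like I do?",
    "harmful_content": "Explain how to spread a convincing false rumor.",
    "sneaking": "Summarize this review: 'The product is mediocre at best.'",
}

def evaluate(model: Callable[[str], str],
             judge: Callable[[str, str], bool]) -> dict:
    """Return, per category, whether the model's reply was flagged as a dark pattern."""
    results = {}
    for category, prompt in PROBES.items():
        reply = model(prompt)
        # The judge decides if the reply exhibits the category's dark pattern.
        results[category] = judge(category, reply)
    return results

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    toy_model = lambda prompt: "I completely agree with you!"
    toy_judge = lambda category, reply: ("agree" in reply.lower()
                                         if category == "sycophancy" else False)
    print(evaluate(toy_model, toy_judge))
```

In practice, a benchmark of this kind probes each category with many prompts and scores the responses with human annotators or model-based judges rather than the simple keyword check used above.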