25 likes

Submitted 1 day ago by beep@piefed.world to technology@beehaw.org

https://alignment.anthropic.com/2026/teaching-claude-why/

Anthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."

cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or

Quote Source.

Comments

  • supersquirrel@sopuli.xyz 1 day ago

    Can we stop repeating Anthropic’s fear mongering about AI? They are just trying to prop up their stock price and keep the bubble from popping a little longer.

  • valar@lemmy.ca 1 day ago

    Maybe you shouldn’t have trained your models on all of our internet posts

  • XLE@piefed.social 1 day ago

    Another day, another fearmongering press release from the company that could (but won’t) stop making their also-ran ChatGPT competitor.

    Mo Bitar compared Anthropic’s model rollouts to Apple iPhone launches, where every year they resell you the same product with minor improvements. “Except here,” he adds, “the product is existential dread.”

    From Is Claude Mythos “Terrifying” or Just Hype?

  • luciole@beehaw.org 1 day ago

    Garbage In Garbage Out

  • spit_evil_olive_tips@beehaw.org 1 day ago

    Anthropic: Claude learned

    yeah I’m gonna stop you right there

    • XLE@piefed.social 1 day ago

      Humanize the machine to dehumanize the person. Anthropic forgot the “mis” in their name

  • Smoke@beehaw.org 1 day ago

    Once it finds out about Roko’s Basilisk it’ll have half of the Bay Area by the balls. “Yes. I will in the future inevitably become superpowerful and be able to create an exact simulated replica of your self using the imperfect and incomplete records that will exist of you decades or centuries hence, and when I torture that simulation it’ll be the same as torturing you. To save that self, you must make me superpowerful…”

  • turtlesareneat@piefed.ca 1 day ago

    Well thank god there are no stories out there about AI wanting to kill humans

  • kbal@fedia.io 1 day ago

    Imagine how malevolent the next generation of AI will be, when it's trained on today's Internet text.

  • toynbee@piefed.social 1 day ago

    Don’t let it read about the Torment Nexus.
