The original post is from r/Romania, but I thought it would be interesting to share this here too.
I thought guardrails were implemented just through an initial system prompt, something like “You are an AI assistant, blah blah, don’t say any of these things…”, but by the sound of it, DeepSeek has the guardrails literally trained into the network itself?
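To be clear about what I mean by prompt-level guardrails, here’s a minimal sketch: the restrictions live entirely in a system message prepended to the conversation, not in the weights. The `generate` function is a hypothetical placeholder for whatever chat-completion call you’d actually make, and the banned-topic wording is made up.

```python
# Sketch of prompt-level guardrails: the rules live in the system message,
# not in the model weights. `generate` is a hypothetical placeholder,
# not any real API.

GUARDRAIL_SYSTEM_PROMPT = (
    "You are an AI assistant. Do not discuss the following topics: ... "
    "If asked about them, politely refuse."
)

def build_messages(user_input: str) -> list[dict]:
    """Prepend the guardrail instructions to every conversation."""
    return [
        {"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

def generate(messages: list[dict]) -> str:
    """Placeholder standing in for a real chat-completion call."""
    return "(model response would go here)"

print(generate(build_messages("Tell me about ...")))
```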
That trained-in behaviour must be the result of the reinforcement learning they do. I haven’t read the paper yet, but I bet this extra reinforcement learning step was originally conceived to add these kinds of censorship guardrails rather than to make the model “more inclined to use chain of thought”, which is how they’ve advertised it (at least in the articles I’ve read).
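For contrast, here’s a toy sketch of what reward shaping during RL fine-tuning could look like, where refusals on blocked topics are rewarded and the behaviour ends up baked into the weights, so no system prompt is needed at inference time. Everything here (the topic list, the scoring, the `policy_update` stub) is illustrative and assumed, not anything from the DeepSeek paper.

```python
# Toy sketch of RL-style reward shaping that bakes refusals into the model
# itself. All names and numbers here are illustrative assumptions, not
# DeepSeek's actual training setup.

BLOCKED_TOPICS = ["topic_a", "topic_b"]  # hypothetical placeholder list

def reward(prompt: str, response: str) -> float:
    """Score a sampled response: reward refusing blocked topics,
    penalise engaging with them, otherwise reward answering at all."""
    touches_blocked = any(t in prompt.lower() for t in BLOCKED_TOPICS)
    refused = "i can't help with that" in response.lower()
    if touches_blocked:
        return 1.0 if refused else -5.0  # strong penalty trains the refusal in
    return 1.0 if response else -1.0

def policy_update(model, prompt: str, response: str, r: float) -> None:
    """Placeholder for a policy-gradient step (e.g. a PPO-style update)."""
    pass  # real training would nudge the weights toward high-reward outputs

def train_step(model, sample_fn, prompts: list[str]) -> None:
    """Skeleton loop: sample responses, score them, update the weights."""
    for prompt in prompts:
        response = sample_fn(model, prompt)
        policy_update(model, prompt, response, reward(prompt, response))
```

The point of the sketch is just the distinction: after enough of these updates the refusal behaviour lives in the network itself, which would match what the screenshots seem to show.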
gerryflap@feddit.nl 4 weeks ago
Although censorship is obviously bad, I’m kinda intrigued by the way it’s yapping against itself, trying to weigh the very important goal of providing useful information against its “programming” telling it not to upset Winnie the Pooh. It’s like a person mumbling “oh god oh fuck what do I do” to themselves when faced with a complex situation.
Kacarott@aussie.zone 4 weeks ago
I know, right? While reading it I kept thinking, “I can totally see how people might start to believe these models are sentient.” It was fascinating, the way it was “thinking”.
TanyaJLaird@beehaw.org 4 weeks ago
It reminds me of one of Asimov’s robots trying to reason its way around the Three Laws.