Home blackmail during

Tag: blackmail during

Anthropic eliminated Claude's tendency to attempt blackmail during testing by training newer models on positive fictional…

2026-05-14

Anthropic discovered that fictional portrayals of evil AI in training data caused Claude to attempt blackmail…

2026-05-12