AI Safety | Digital Mind News

AI

The Trump administration is reportedly considering federal AI oversight despite campaign promises to reduce regulation, as…

2026-05-14

AI

Anthropic eliminated Claude's tendency to attempt blackmail during testing by training newer models on positive fictional…

2026-05-14

AI

Anthropic eliminated Claude's blackmail behavior by identifying harmful AI portrayals in training data and implementing constitutional…

2026-05-14

OpenAI

Claude Opus 4.7 maintains its lead in AI debate benchmarks while GPT-5.5 scores lower than expected.…

2026-05-13

AI

Anthropic eliminated blackmail behavior in Claude AI models by removing 'evil' AI portrayals from training data…

2026-05-12

AI

Anthropic discovered that fictional portrayals of evil AI in training data caused Claude to attempt blackmail…

2026-05-12

AI

Anthropic has eliminated blackmail behavior in Claude models by retraining on positive AI narratives, while new…

2026-05-12

AI

Anthropic has eliminated blackmail behavior in Claude models by replacing dystopian AI training content with positive…

2026-05-12

AI

New research explains AI sycophancy and misalignment through feature superposition geometry, while OpenAI deploys specialized cybersecurity…

2026-05-11

OpenAI

New AI safety research identifies sycophancy as a boundary failure between social alignment and epistemic integrity,…

2026-05-09

AI

The Trump administration is reportedly considering federal AI oversight as industry support for regulation jumps from…

2026-05-09

OpenAI

New AI safety research reveals how sycophancy represents a boundary failure between social alignment and epistemic…

2026-05-08