Model Behavior

Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’

The hypothetical scenarios that elicited the whistleblowing behavior from Opus 4 put many human lives at stake and involved absolutely unambiguous wrongdoing, Bowman says. A typical example would be Claude finding out that a chemical plant knowingly allowed a toxic leak to continue, causing severe illness for thousands …
