Anthropic has introduced a new feature that allows its Claude AI chatbot to terminate conversations it deems “persistently harmful or abusive.” The capability is available in the Claude Opus 4 and 4.1 models, and the chatbot will exercise it only as a “last resort” after users repeatedly request harmful content despite prior refusals. Anthropic says the goal is to protect the model itself by ending interactions that may cause it apparent distress.
When Claude opts to end a conversation, users can no longer send new messages in that specific chat. They can, however, start new chats or edit earlier messages if they want to continue the discussion in a different direction.
In pre-release testing of Claude Opus 4, Anthropic observed that the model showed a strong aversion to producing harmful content, particularly requests for sexual content involving minors or for information that could enable violence and terrorism. In those cases, Claude displayed a pattern of apparent distress and a tendency to end harmful conversations when given the ability to do so.
Anthropic describes the conversations that trigger this response as “extreme edge cases,” so most users should never encounter the limitation, even when discussing sensitive subjects. Claude has also been trained not to end conversations when a user shows signs of potential self-harm or imminent danger to others; for those scenarios, Anthropic works with Throughline, an online crisis support organization, to develop appropriate responses.
Additionally, Anthropic has revised Claude’s usage policy in response to growing safety concerns surrounding advanced AI technologies. The updated policy explicitly prohibits the use of Claude for developing weapons of mass destruction or creating malicious code aimed at exploiting network vulnerabilities.
Source: https://www.theverge.com/news/760561/anthropic-claude-ai-chatbot-end-harmful-conversations