#safety

Claude Fable 5's Biology Guardrails Block High School–Level Questions

LLMs Jun 12, 2026

Anthropic's new Mythos-class model refuses to answer basic biology queries, routing them to Claude Opus 4.8 instead, in a deliberate safety tradeoff.

Anthropic's Fable Faces Backlash Over Overly Aggressive Safety Filters

LLMs Jun 12, 2026

Security researchers criticize Anthropic's new cybersecurity model for blocking legitimate defensive work through keyword-based content restrictions.

Anthropic launches Claude Fable 5, a public version of Mythos with safety restrictions

LLMs Jun 11, 2026

Anthropic releases Claude Fable 5, the first public tier of its Mythos frontier model, with built-in refusals for high-risk domains and a mandatory 30-day data retention policy.

Anthropic releases Claude Fable 5, its first publicly available Mythos-class model

LLMs Jun 10, 2026

Anthropic launches Claude Fable 5 with safeguards that block high-risk responses and announces a private Claude Mythos 5 tier, with expanded access planned.

Anthropic releases Claude Fable 5, a gated version of Mythos for public access

LLMs Jun 10, 2026

Anthropic brings its most powerful model to the general public through Claude Fable 5, paired with safety guardrails and mandatory 30-day traffic retention.

Pakistan Notice Helper: A 4B-Parameter Safety Tool for Localized Scam Detection

Tools Jun 8, 2026

A developer built a bilingual AI assistant using Qwen3.5 4B to help Pakistani users identify fraudulent messages, demonstrating how small models can solve hyperlocal safety problems.

OpenAI Outlines Framework for Independent Model Evaluations

Research May 30, 2026

OpenAI shares lessons on designing trustworthy third-party evaluations for frontier AI models, emphasizing the role of task environments and validity checks.

Anthropic releases Claude Opus 4.8 with improved uncertainty flagging and effort controls

LLMs May 29, 2026

Claude Opus 4.8 flags uncertain reasoning 4x more often than its predecessor and introduces user-controlled effort levels and dynamic workflow agents.

OpenAI Publishes Frontier Governance Framework Aligning Safety Practices With Emerging AI Regulation

Policy May 29, 2026

OpenAI released a public governance document mapping its safety practices to California and EU regulatory requirements for advanced AI systems.

GPT-5.5 Instant Crosses a New Safety Threshold for OpenAI's Fast-Inference Line

LLMs May 6, 2026

OpenAI's GPT-5.5 Instant is the first Instant-class model to earn a 'High capability' rating in its two most-scrutinized safety domains, triggering new safeguards.

OpenAI's GPT-5.5 Bets on Autonomous Completion Over Raw Intelligence

LLMs Apr 25, 2026

OpenAI's GPT-5.5 prioritizes agentic task execution and expanded safeguards over benchmark-chasing, signaling a strategic pivot toward real-world deployment.