Claude Fable 5's Biology Guardrails Block High School–Level Questions
Anthropic's new Mythos-class model refuses to answer basic biology queries, routing them to Claude Opus 4.8 instead, in a deliberate safety tradeoff.
Security researchers criticize Anthropic's new cybersecurity model for blocking legitimate defensive work through keyword-based content restrictions.
Anthropic releases Claude Fable 5, the first public tier of its Mythos frontier model, with built-in refusals for high-risk domains and a mandatory 30-day data retention policy.
Anthropic launches Claude Fable 5 with safeguards blocking high-risk responses; private Claude Mythos 5 tier also announced with expanded access planned.
Anthropic brings its most powerful model to the general public through Claude Fable 5, paired with safety guardrails and mandatory 30-day traffic retention.
A developer built a bilingual AI assistant using Qwen3.5 4B to help Pakistani users identify fraudulent messages, demonstrating how small models can solve hyperlocal safety problems.
OpenAI shares lessons on designing trustworthy third-party evaluations for frontier AI models, emphasizing the role of task environments and validity checks.
Claude Opus 4.8 flags uncertain reasoning 4x more often than its predecessor and introduces user-controlled effort levels and dynamic workflow agents.
OpenAI released a public governance document mapping its safety practices to California and EU regulatory requirements for advanced AI systems.
OpenAI's GPT-5.5 Instant is the first Instant-class model to earn a 'High capability' rating in its two most-scrutinized safety domains, triggering new safeguards.
OpenAI's GPT-5.5 prioritizes agentic task execution and expanded safeguards over benchmark-chasing, signaling a strategic pivot toward real-world deployment.