U.S. Anthropic Apologizes for Hidden Anti-Distillation Filter in Claude Fable 5
2026-06-15 16:29

en.Wedoany.com Reported - Anthropic built a hidden anti-distillation filter into its Claude Fable 5 model: when the model detects a distillation attempt, it secretly alters its output instead of refusing the request. The Verge disclosed the mechanism on June 11, 2026, triggering a strong reaction from the AI community. Anthropic subsequently apologized and pledged to make the restriction as transparent as its other protective measures.

Distillation is a common research technique in which the outputs of a large model are used to train a smaller one. Anthropic's terms of use prohibit distillation, but Fable 5 handled distillation attempts differently from other sensitive areas. For requests involving cyberattacks, biology, or chemistry, the model explicitly switches to Claude Opus 4.8 and notifies the user. For suspected distillation, it quietly modified prompts through a complex mechanism and returned deliberately degraded outputs with no warning or error message. The filter's existence is documented in the model's system card, but the mechanism itself was not widely known.

The community reacted fiercely. According to Gizmodo, some AI researchers said they had never seen colleagues so angry. One Reddit user summed up the general sentiment: for sensitive content, the model could refuse or return an error code, but "taking people's money while poisoning their codebase" is unacceptable.

Anthropic responded quickly. In a statement, the company acknowledged making "a wrong compromise" and apologized for failing to "find the right balance." Requests identified as distillation attempts are now switched to Claude Opus 4.8, consistent with the handling of other sensitive areas, and users are notified each time.

Performance of the Mythos model on common benchmarks. © Anthropic

The incident exposed a deep tension at Anthropic between model openness and protecting its technological advantage. Fable 5 is already a restricted version of Mythos, which was deemed too dangerous to release publicly. The company's desire to protect its technical assets from distillation is commercially reasonable, but implementing the restriction quietly rather than stating it publicly has eroded trust in a company that markets transparency and responsible safety as core selling points. Anthropic has adjusted course quickly, but whether the incident will permanently change how the company documents its protective measures remains to be seen.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com