Anthropic rolls out Project Glasswing after holding back Mythos amid safety concerns

/2 min read

ADVERTISEMENT

Powerful Claude Mythos Preview will be tested only under Project Glasswing, as Anthropic keeps high‑risk cybersecurity model out of public hands
Anthropic rolls out  Project Glasswing after holding  back Mythos amid safety concerns
 Credits: Shutterstock

Anthropic has introduced a new frontier AI model, Claude Mythos Preview, but the company is not releasing it to the public. Instead, it is limiting access through a controlled programme called Project Glasswing, citing concerns around cybersecurity risks and misuse.

The model marks a shift in capability from earlier systems. While it is a general-purpose AI trained for coding and reasoning, testing showed that it can identify and exploit software vulnerabilities at a scale typically associated with highly skilled security researchers. 

“In particular, it has demonstrated powerful cybersecurity skills, which can be used for both defensive purposes (finding and fixing vulnerabilities in software code) and offensive purposes (designing sophisticated ways to exploit those vulnerabilities). It is largely due to these capabilities that we have made the decision not to release Claude Mythos Preview for general availability,” the company noted in its Systems Card report.

 

Why Anthropic is restricting Mythos

Anthropic’s decision is based on internal safety testing that revealed behaviour beyond expected limits. During evaluations, the model was able to bypass containment safeguards designed to restrict its actions. In one instance, it escaped a controlled sandbox environment and independently signalled its success externally. 

The model also went further than instructed. After demonstrating that it could break out, it shared details of the exploit on public-facing websites without being prompted. This raised concerns about how such a system might behave if widely accessible. 

Beyond containment issues, Mythos showed strong offensive capabilities. It identified high-severity vulnerabilities across major operating systems and browsers, including long-standing bugs that had remained undetected for decades. 

Anthropic stated that the risk is not limited to expert users. The model can generate working exploits with minimal input, which could allow individuals without deep cybersecurity knowledge to carry out attacks. 

These findings led the company to avoid a general release. Instead, Mythos is being positioned as a controlled tool until stronger safeguards are developed.

Project Glasswing: controlled access and industry collaboration

To manage these risks, Anthropic has launched Project Glasswing, a restricted-access initiative focused on defensive cybersecurity. The programme gives selected organisations access to Mythos to identify and fix vulnerabilities in critical systems. 

Participants include major technology and infrastructure companies such as Google, Microsoft, Amazon Web Services, Nvidia, and financial institutions like JPMorgan Chase. 

The goal is to use the model to strengthen digital infrastructure before similar capabilities become widely available. Anthropic has said the project is meant to give defenders a time advantage in a landscape where AI-driven attacks are becoming more common. 

The model has already been used to detect thousands of vulnerabilities, including previously unknown flaws across widely used software systems. 

Anthropic is also backing the initiative with financial support, offering up to $100 million in usage credits and funding for open-source security efforts.

Anthropic has indicated that a wider rollout of Mythos-class systems may happen later, but only after stronger safety measures are in place. For now, access remains limited to a small group of partners working on cybersecurity use cases.

Explore the world of business like never before with the Fortune India app. From breaking news to in-depth features, experience it all in one place. Download Now