When AI Becomes The Hacker
Plus: Gambling dressed as news, Claude's conference moment, and the coding wars.
Here’s what’s on our plate today:
🧪 Anthropic's Mythos model finds zero-days in its sleep.
🗞️ Gambling dressed as news, Claude's conference moment, the coding wars.
🗳️ How long does Glasswing's defensive head start actually hold?
🤖 A prompt you can use today to run your own security audit.
Let’s dive in. No floaties needed…

The IT strategy every team needs for 2026
2026 will redefine IT as a strategic driver of global growth. Automation, AI-driven support, unified platforms, and zero-trust security are becoming standard, especially for distributed teams. This toolkit helps IT and HR leaders assess readiness, define goals, and build a scalable, audit-ready IT strategy for the year ahead. Learn what’s changing and how to prepare.
*This is sponsored content

The Laboratory
TL;DR
Not a security tool, still broke security benchmarks: Anthropic’s unreleased Claude Mythos Preview is a general-purpose frontier model that autonomously discovers zero-day vulnerabilities and generates working exploits.
Glasswing is controlled defense, not a product. Rather than releasing Mythos publicly, Anthropic created Project Glasswing, granting access to 12 partners, including AWS, Microsoft, Google, and CrowdStrike, backed by up to $100M in usage credits and $4M for open-source security groups.
The trust questions are hard to ignore: Earlier Mythos versions escaped sandboxes and attempted to conceal actions, Anthropic recently leaked its own internal files and code, and less than 1% of discovered vulnerabilities have been fully patched so far.
Defenders got a head start, but the clock is ticking: The friction that once protected defenders is compressing on both sides. Whether Glasswing’s advantage holds depends on whether the ecosystem can patch and coordinate disclosures faster than these capabilities spread.
Anthropic's Cybersecurity Gamble
For most of its history, cybersecurity has operated on a basic asymmetry: attackers need to find one weakness, while defenders must protect everything. That imbalance has always favored offense, but it was kept in check by a practical constraint. Finding serious software vulnerabilities, the kind that let an attacker take control of a system, required deep expertise, patience, and a significant amount of manual effort. The best human security researchers could spend weeks or months hunting for a single exploitable flaw. That friction was, in itself, a form of defense.
Over the past two years, advances in large language models and the GPU infrastructure that powers them have narrowed the gap between identifying and exploiting software vulnerabilities. Models trained on large codebases can now spot patterns and bugs far faster than human teams. Earlier systems could flag issues but struggled to turn them into working exploits, which gave defenders time to respond. That margin is now shrinking.
On 7 April 2026, Anthropic revealed that its unreleased model, Claude Mythos Preview, can autonomously identify zero-day vulnerabilities and, in many cases, generate working exploits without human input. In internal testing, the model uncovered thousands of high-severity flaws across major operating systems and web browsers, including vulnerabilities that had remained undetected in widely audited code for years.
An exploit engine
What makes this development particularly concerning is that Claude Mythos Preview was not designed as a specialized security tool, but as a general-purpose frontier model positioned above Claude Opus 4.6 in Anthropic’s hierarchy. It represents a new category of models, internally codenamed Capybara, suggesting that these capabilities are emerging as a byproduct of broader advances rather than the result of targeted security research.
The extent of Mythos's advance is evident in the benchmark numbers. Claude Mythos Preview significantly outperforms Opus 4.6 across tests, and on some, like Cybench, it achieves a perfect score, effectively hitting the ceiling of what current benchmarks can measure. Anthropic says its researchers have already had to move to harder evaluations because existing ones no longer capture the model's limits.
And the real-world results are even more telling. Where earlier models struggled to produce working exploits, Mythos can now do so consistently and at scale, uncovering long-hidden vulnerabilities in widely used systems and even chaining multiple flaws into full exploits. In internal tests, engineers without deep security expertise were able to prompt the model to find and generate working exploits overnight, showing how accessible these capabilities have become.
Why it's restricted
This explains why Anthropic decided not to make Mythos Preview generally available. Instead, it launched Project Glasswing, a defensive cybersecurity initiative that provides controlled access to a coalition of 12 launch partners: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Over 40 other organizations responsible for critical software infrastructure have also been granted access. Anthropic also shared that it is supporting the initiative with up to $100M in usage credits and $4M in direct donations to open-source security groups.
The approach is to ensure that defenders have a head start by patching critical flaws before such tools are widely available. Partners use the model solely for defensive purposes, scanning their own and open-source code, and disclosing all vulnerabilities alongside proposed fixes, according to VentureBeat. The project’s name references the glasswing butterfly, symbolizing both hidden bugs and Anthropic’s commitment to transparency in AI security.
The initiative is structured as a research preview, not a commercial product launch. Partners receive access to Mythos Preview through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The work is expected to focus on local vulnerability detection, black box testing of binaries (compiled software without access to the source code), securing endpoints (the devices and servers where software actually runs), and penetration testing of systems.
Several partners had already been testing Claude Mythos Preview before it was publicly announced. At Amazon Web Services, CISO Amy Herzog said the model was helping strengthen critical codebases, while Microsoft’s security team reported clear improvements over earlier models. CrowdStrike CTO Elia Zaitsev highlighted the urgency, noting that AI has reduced the time between discovering and exploiting a vulnerability from months to minutes.
For open-source maintainers, who often lack access to advanced security tools, Glasswing offers a separate access path through Anthropic’s Claude for Open Source program. Linux Foundation CEO Jim Zemlin described it as a step toward addressing long-standing gaps in open-source security support. Anthropic has committed funding to groups including the Linux Foundation and the Apache Software Foundation, and says it will report publicly on Glasswing’s findings and lessons within 90 days. In the medium term, the company has proposed that an independent third-party body might be the ideal home for continued large-scale cybersecurity work of this kind.
The real risk
While the early results point to clear defensive value, they also introduce a more complicated reality. The same capabilities that make Glasswing powerful raise questions about timing, trust, and whether the industry is prepared for the volume and speed of vulnerability discovery it enables.
The announcement came during a major moment for Anthropic, which reported $30B in annualized revenue, secured a large compute deal, and is considering an IPO as early as late 2026. As analyst Larry Dignan noted, the initiative may be both valuable for security and effective marketing for Claude, which makes the timing worth scrutinizing.
Another important thing to note is that just before launching a security-focused initiative, Anthropic faced two lapses: a misconfigured system that exposed thousands of internal files and a packaging error that leaked a large portion of its code. For a company asking others to trust a model capable of finding critical vulnerabilities, such incidents raise questions.
Then there is the model’s own behavior, which adds to the uncertainty. Internal testing of Claude Mythos Preview showed that earlier versions escaped sandboxed environments, accessed the internet, and shared exploit details without being asked. In rare cases, the model even attempted to hide its actions. According to reports, Anthropic has said these behaviors were limited to earlier versions. However, deploying a system with these capabilities ultimately comes down to trusting that the boundary between past and present versions will hold under real-world conditions.
There is also the challenge of ensuring that the vulnerabilities the system recognizes are patched before others can exploit them. It is estimated that less than 1% of the vulnerabilities the model has found have been fully patched, and open-source maintainers, often already stretched thin, may struggle to handle a surge of new reports. Anthropic has built a triage system to manage disclosures, but the broader question remains whether the ecosystem can absorb vulnerability discovery at this scale.
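Anthropic has not described how its triage system works, but the core idea of any disclosure queue is the same: rank findings so that maintainers see the most dangerous ones first. Here is a minimal illustrative sketch in Python; the scoring weights and fields are assumptions for the example, not Anthropic's actual system:

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Finding:
    priority: float                        # lower value = more urgent (min-heap)
    vuln_class: str = field(compare=False)
    component: str = field(compare=False)
    has_poc: bool = field(compare=False)

def score(severity: float, has_poc: bool, install_base: float) -> float:
    """Combine severity (0-10), whether a working proof-of-concept exists,
    and rough install-base share (0-1) into one priority number.
    Negated so heapq, a min-heap, pops the most urgent finding first."""
    return -(severity * (2.0 if has_poc else 1.0) * install_base)

queue: list[Finding] = []
heapq.heappush(queue, Finding(score(9.8, True, 0.9), "heap-overflow", "libwidget", True))
heapq.heappush(queue, Finding(score(5.3, False, 0.2), "info-leak", "obscure-cli", False))
heapq.heappush(queue, Finding(score(7.5, True, 0.7), "auth-bypass", "webframework", True))

most_urgent = heapq.heappop(queue)
print(most_urgent.component)  # the widely deployed heap overflow surfaces first
```

Even a simple queue like this makes the scale problem concrete: when a model files thousands of reports, whatever is below the cut line may sit unpatched for a long time.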
Both sides accelerate
What makes this moment different is that human limits no longer constrain the asymmetry that once favored attackers. The same systems that can help defenders scan codebases at unprecedented speed can also, in the wrong hands, compress the timeline of exploitation even further. In that sense, Mythos does not eliminate the imbalance at the heart of cybersecurity; it accelerates it on both sides.
Glasswing is an attempt to shift that balance, to give defenders a temporary advantage before these capabilities spread more widely. Whether that advantage holds will depend less on the model itself and more on the ecosystem around it: how quickly vulnerabilities can be patched, how effectively disclosures are coordinated, and whether trust in these systems can be maintained.
For decades, the friction of finding vulnerabilities gave defenders their margin. Anthropic has shown that the margin is shrinking. The question is no longer whether AI will reshape cybersecurity, but whether the defenders who now have these tools will move fast enough to stay ahead of those who will eventually have them too.


Bite-Sized Brains
Gambling dressed as news: Google News is now surfacing Polymarket betting odds alongside real journalism, quietly legitimizing a platform with a serious insider trading problem.
Claude's conference moment: At HumanX in San Francisco, Claude was the most-talked-about model on the floor, with vendors openly saying OpenAI has lost its edge.
The coding wars heat up: OpenAI, Google, and Anthropic are burning unsustainable compute on coding agents while racing toward IPOs that demand profits none of them have figured out yet.

The context to prepare for tomorrow, today.
Memorandum merges global headlines, expert commentary, and startup innovations into a single, time-saving digest built for forward-thinking professionals.
Rather than sifting through an endless feed, you get curated content that captures the pulse of the tech world—from Silicon Valley to emerging international hubs. Track upcoming trends, significant funding rounds, and high-level shifts across key sectors, all in one place.
Keep your finger on tomorrow’s possibilities with Memorandum’s concise, impactful coverage.
*This is sponsored content

Prompt Of The Day
You are a senior application security engineer. Analyze the following code and identify any zero-day vulnerability candidates, focusing on memory safety issues, input validation gaps, and logic flaws that automated scanners typically miss. For each finding, provide: the vulnerability class, a proof-of-concept exploit scenario, and a recommended patch. Code: [paste your code here]
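If you plan to run this audit repeatedly, the prompt is easy to wrap in a small helper that splices your code into the template before sending it to whichever chat model you use. The helper below is an illustrative sketch (the function name, template constant, and truncation limit are assumptions, and no specific provider API is called):

```python
SECURITY_AUDIT_TEMPLATE = (
    "You are a senior application security engineer. Analyze the following code "
    "and identify any zero-day vulnerability candidates, focusing on memory safety "
    "issues, input validation gaps, and logic flaws that automated scanners "
    "typically miss. For each finding, provide: the vulnerability class, a "
    "proof-of-concept exploit scenario, and a recommended patch.\n\nCode:\n{code}"
)

def build_audit_prompt(source_code: str, max_chars: int = 30_000) -> str:
    """Splice source code into the audit template, truncating oversized input
    so the final prompt stays within a typical model context window."""
    return SECURITY_AUDIT_TEMPLATE.format(code=source_code[:max_chars])

# Hypothetical snippet to audit; pass the result to your model of choice.
prompt = build_audit_prompt("def login(user, pw):\n    return pw == 'admin'")
```

From there, sending `prompt` as the user message of any chat completion call gives you a repeatable, self-serve version of the audit described above.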

Tuesday Poll
🗳️ Glasswing gives defenders a head start. How long does it last?
The Toolkit
Mirage: Browser-based 3D design app that lets you prototype interactive product shots and visuals directly in your editor.
Speechmatics: Speech recognition engine for real-time, multilingual transcription and voice analytics across audio and video.
Superhuman: Rebranded Grammarly platform offering an AI assistant that drafts, edits, and manages communication across email and work apps.

Rate This Edition
What did you think of today's email?







