AI weekly digest: The Model Too Dangerous to Ship

Anthropic revealed Claude Mythos Preview but won't ship it — too good at finding zero-days. Meta launched Muse Spark, Z.ai open-sourced a 744B coding model under MIT, and OpenAI added a $100 ChatGPT tier.

Anthropic dropped a 244-page model card for a model nobody can use, Meta finally showed what billions in AI spending bought, and a Chinese lab open-sourced a 744B-parameter coding beast under MIT. This was a week where the frontier moved fast and the rules around it moved faster.

1. Anthropic Unveils Claude Mythos Preview, Withholds Public Release

Anthropic published a 244-page model card for Claude Mythos Preview — a step-change above Opus 4.6 — but won't release it commercially. The model autonomously discovered thousands of zero-day vulnerabilities across every major OS and browser, including a 27-year-old bug in OpenBSD and a Linux kernel exploit chain granting root access. On SWE-Bench Pro it jumps from 53.4% (Opus 4.6) to 77.8%; on CyberGym it hits 83.1% vs. Opus's 66.6%. Instead of shipping it, Anthropic launched Project Glasswing — a consortium with AWS, Apple, Google, Microsoft, CrowdStrike, Nvidia, and 40+ others — backed by $100M in model credits and $4M for open-source security orgs. The goal: patch critical software before Mythos-class capabilities become widely available.

Source: Anthropic — Project Glasswing

2. Meta Debuts Muse Spark from Its Superintelligence Labs

Meta shipped its first major model since bringing in Alexandr Wang as chief AI officer. Muse Spark is a multimodal reasoning model with tool use, visual chain-of-thought, and multi-agent orchestration. Benchmarks place it between Claude Sonnet 4.6 and Opus 4.6 — not frontier-leading, but a meaningful entry after a year of silence and billions in spending. API access is coming soon, with promises of open-source variants in the Muse family. The model will roll into Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban Meta AI glasses.

Source: Meta AI Blog — Introducing Muse Spark

3. Z.ai Open-Sources GLM-5.1: 744B MoE Under MIT License

Z.ai (formerly Zhipu AI) released GLM-5.1 weights on Hugging Face under the MIT license, with no restrictions on commercial use. The 744B-parameter mixture-of-experts model (40B active per token) scores 58.4% on SWE-Bench Pro, edging out Claude Opus 4.6 (57.3%). The real differentiator: it can sustain autonomous coding sessions for up to 8 hours, handling planning, execution, testing, and optimization loops. With a 200K-token context window and a 131K-token maximum output, it's built for long-horizon agentic work. Setup guides already exist for Claude Code, OpenClaw, and Cline.
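
To put the parameter counts above in perspective, here is a back-of-the-envelope sketch. It is not from Z.ai: the function names are made up for illustration, and the 2-bytes-per-parameter figure assumes 16-bit weights, which the release as summarized here does not confirm. It also ignores KV cache and activation memory.

```python
TOTAL_PARAMS = 744e9    # total parameters, from the item above
ACTIVE_PARAMS = 40e9    # parameters active per token, from the item above


def active_fraction(total_params, active_params):
    """Share of the model's weights that participate in a single token."""
    return active_params / total_params


def weight_memory_gb(params, bytes_per_param=2):
    """Approximate weight storage in GB.

    ASSUMPTION: bytes_per_param=2 models 16-bit weights; a quantized
    checkpoint would need less.
    """
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active per token: {frac:.1%} of all weights")
    print(f"weights at 16-bit: about {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under that assumption, only about one parameter in nineteen does work on any given token, yet every weight still has to be stored somewhere. Per-token compute is modest; the memory footprint is not.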

Source: Z.ai Blog — GLM-5.1

4. OpenAI Launches $100/Month ChatGPT Pro Plan

OpenAI filled the gap between the $20 Plus and $200 Pro tiers with a new $100/month plan aimed at developers and power users. The main draw is 5x more Codex usage than Plus (temporarily boosted to 10x until May 31), along with unlimited GPT-5.4 and access to GPT-5.4 Pro. The launch comes as Anthropic's ARR has reportedly surpassed $30B, and OpenAI is clearly competing for the vibe-coder and professional developer segment that lives in Codex all day.

Source: TechCrunch — ChatGPT Pro Plan

5. Anthropic Ships Managed Agents and Advisor Tool

Two platform launches from Anthropic this week. Managed Agents lets developers deploy long-running agents through the developer console while Anthropic handles infrastructure — Notion is already using it to build a "delegate tasks to Claude" feature. Separately, the new Advisor tool lets API users pair Opus as a reasoning advisor alongside Sonnet or Haiku as cheaper executors, getting advanced reasoning without paying Opus rates for every token. Both signal Anthropic's push to own the agent infrastructure layer.
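
The cost argument behind the advisor pattern is simple arithmetic. The sketch below is a hypothetical illustration, not Anthropic's pricing or API: the per-million-token prices and the 10% advisor share are placeholder numbers chosen only to show the shape of the calculation.

```python
def blended_cost(total_tokens, advisor_share, advisor_price_per_m, executor_price_per_m):
    """Dollar cost when a fraction of tokens goes to the expensive advisor
    model and the remainder goes to the cheaper executor model.

    Prices are in dollars per million tokens. All values passed in are
    the caller's assumptions; nothing here reflects real list prices.
    """
    if not 0.0 <= advisor_share <= 1.0:
        raise ValueError("advisor_share must be between 0 and 1")
    advisor_tokens = total_tokens * advisor_share
    executor_tokens = total_tokens - advisor_tokens
    total = advisor_tokens * advisor_price_per_m + executor_tokens * executor_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # HYPOTHETICAL numbers: advisor at $15 per million tokens, executor at $3.
    mixed = blended_cost(1_000_000, 0.10, 15.0, 3.0)
    advisor_only = blended_cost(1_000_000, 1.0, 15.0, 3.0)
    print(f"advisor on 10% of tokens: ${mixed:.2f}")
    print(f"advisor on every token:   ${advisor_only:.2f}")
```

The saving depends entirely on how small the advisor's share of tokens turns out to be in practice, which the announcement as summarized here does not quantify.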

Source: Anthropic Engineering — Managed Agents

6. Google Gemma 4: Best Open Model Family Yet

Google released Gemma 4 under Apache 2.0 with native vision, audio, function calling, 256K context, and support for 140+ languages across four model sizes. Turing Post reports many OpenClaw users are switching to Gemma 4 for local inference. With Gemma 4 and GLM-5.1 both dropping in the same week, the open-weight ecosystem just got significantly more competitive with proprietary offerings.

Source: Turing Post — Gemma 4

7. Claude Cowork Goes Enterprise-Ready

Anthropic made Claude Cowork generally available for enterprise with role-based access controls, group spend limits, expanded observability, and integrations with tools like Zoom. Companies like Zapier and Airtree are already using it for project management and operational workflows. This positions Cowork as Anthropic's answer to enterprise-grade AI assistants that live outside the API.

Source: Claude Blog — Cowork for Enterprise

8. Cursor's Warp Decode: 1.8x MoE Inference Throughput

Cursor published "warp decode," a kernel design that reorganizes mixture-of-experts inference around output neurons instead of experts. On Blackwell GPUs, it achieves roughly 1.8x higher throughput with improved numerical accuracy. For anyone running MoE models in production (which is increasingly everyone, given GLM-5.1, Mixtral descendants, etc.), this is a meaningful infrastructure improvement worth tracking.
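
To make the phrase "around output neurons instead of experts" concrete, here is a toy sketch in plain Python. It is not Cursor's kernel and says nothing about warps, memory layout, or the reported speedup. It only shows, assuming a single token routed to a few experts with one weight matrix each, that the same mixture-of-experts output can be computed with the loops nested either way. All names are hypothetical.

```python
def dot(row, x):
    """Inner product of one weight row with the input vector."""
    return sum(w * xi for w, xi in zip(row, x))


def expert_centric(x, experts, gates):
    """Outer loop over selected experts: each expert produces a full output
    vector, which is scaled by its gate weight and accumulated."""
    out_dim = len(experts[next(iter(gates))])
    y = [0.0] * out_dim
    for e, g in gates.items():
        partial = [dot(row, x) for row in experts[e]]
        for j in range(out_dim):
            y[j] += g * partial[j]
    return y


def output_centric(x, experts, gates):
    """Outer loop over output neurons: each neuron gathers its gated
    contribution from every selected expert before moving on."""
    out_dim = len(experts[next(iter(gates))])
    return [
        sum(g * dot(experts[e][j], x) for e, g in gates.items())
        for j in range(out_dim)
    ]


if __name__ == "__main__":
    x = [1.0, 2.0]
    experts = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[2.0, 2.0], [1.0, -1.0]],
        [[5.0, 5.0], [5.0, 5.0]],  # present in the layer, not routed to here
    ]
    gates = {0: 0.25, 1: 0.75}  # expert index -> routing weight for this token
    print(expert_centric(x, experts, gates))
    print(output_centric(x, experts, gates))
```

Both functions compute the same weighted sum. In floating point the two orderings can differ in the last bits because the additions happen in a different order; the example values are chosen so both results come out identical.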

Source: Cursor Blog — Warp Decode

9. Perplexity Adds Full Personal Finance via Plaid

Perplexity moved beyond portfolio tracking to a comprehensive finance dashboard powered by Plaid. Users can now link checking, savings, credit cards, and loans to analyze spending, track liabilities, and calculate net worth. No trade execution — it's a read-only intelligence layer over your finances. Combined with the company's 50% revenue boost from its shift to AI agents, Perplexity is quickly becoming more than a search engine.

Source: TestingCatalog — Perplexity x Plaid

10. Alibaba Claims Viral "Happy Horse" Video AI Model

The anonymous video AI model that topped Artificial Analysis's text-to-video leaderboard this week turned out to be Alibaba's. "Happy Horse" generated a wave of excitement across China's AI community before Alibaba confirmed ownership. API access is expected soon. In a week dominated by LLM news, this is a reminder that generative video is advancing just as fast.

Source: TLDR AI