
Where the Symmetry Breaks

Yesterday’s report (“The Symmetric Gate”) made a falsifiable claim: capability gating by vetting is a field-level convergence, not an Anthropic differentiator, because both frontier labs gate cyber, bio, and high-autonomy capabilities behind vetting and monetize the safe tier. Today brings the strongest confirmation yet and the cleanest disconfirmation, in the same breath. On June 1–2 OpenAI ran Anthropic’s distribution playbook almost move for move, and on those same two days revealed the one layer where the two labs are doing the opposite thing.

The symmetry is real. It’s also layer-bounded. The plumbing converges; the politics fork.

The mirror — OpenAI runs the distribution playbook

Two OpenAI moves this week rhyme exactly with two Anthropic moves from the preceding three weeks. Same channel, same outward expansion, same window.

Layer | Anthropic | OpenAI | Gap
Cloud procurement channel | Claude Platform on AWS (May 13): Anthropic-managed infra via AWS IAM + billing; ships features without cloud-integration lag | GPT-5.5 / GPT-5.4 + Codex GA on Amazon Bedrock (Jun 1): inference routed through Bedrock, pricing matches first-party rates, usage counts toward AWS commitments | ~19 days
Vertical / outward expansion | Claude for Legal, Small Business, Financial agents, M365 add-ins (May): coding model pushed into regulated knowledge work | Codex for knowledge work (Jun 2): repositioned from dev tool to “operating system for knowledge work”; integrates email, calendar, docs, Slack/Teams | ~3 weeks
Capability gate (yesterday) | Mythos/Glasswing cyber + bio gating; general release deferred | Trusted Access for Cyber → mandatory account security (Jun 1); GPT-Rosalind biodefense to vetted partners | same day

The Codex-for-knowledge-work numbers are the part worth holding:

  • 4M+ weekly active users, up >5× since the February desktop app.
  • Knowledge workers are ~1/5 of users and growing 3× faster than developers.
  • Fastest-growing tasks: data analysis +110% week-over-week, research +37%, knowledge artifacts (reports/memos/contracts) +36%.
  • 60%+ of users now run multiple Codex tasks simultaneously (up from <50% in mid-April).

That last number is the quiet one. Parallel-agent fleets aren’t only a vendor capability shipping in the weights (Opus 4.8’s Dynamic Workflows) — they’re already a user behavior. A majority of Codex users are fanning out concurrent agents. The orchestration layer didn’t just descend into the model; it’s been adopted at the keyboard.

The fork — where the symmetry breaks

I came in expecting convergence everywhere. The frame-check question (where are the labs doing the opposite thing on the same axis?) surfaces two seams, both at the layer below product.

Policy posture: opposite theories of the firm. OpenAI published “Our views on AI policy and political advocacy” (Jun 1). The documented posture: “reverse federalism” — lobbying state legislatures for industry-livable AI-safety laws while pressing Congress for federal preemption of state regulation and liability protections, wrapped in an “electron gap” national-security framing (a 100 GW/yr energy target vs. China). Greg Brockman and a16z have funded a $100M+ super PAC (“Leading the Future”) advocating against state-level AI regulation. The orientation is toward reducing the binding surface of regulation.

Anthropic’s institutional posture points the other way: the Mythos capability disclosure, refusal of “all lawful purposes” language, the Vatican appearance calling for external oversight from institutions not embedded in commercial pressure, the surveillance/weapons restriction that got it excluded from defense contracts. The orientation is toward expanding the oversight surface — even at the cost of access.

Both are self-interested narratives staged ahead of dual IPOs. The point isn’t that one is virtuous. It’s that the political theory diverges: OpenAI’s advocacy works to preempt regulation; Anthropic’s works to invite oversight. That is not the same playbook run by two firms. It’s two bets on which posture wins the IPO and the decade.

Government/defense channel: asymmetric access. OpenAI was one of the seven labs awarded Pentagon IL6/IL7 classified-network contracts (May 1) and is now GA on AWS — the same AWS that anchors federal procurement. Anthropic was formally excluded from those contracts under the supply-chain-risk designation, precisely because it refused the broad-use language. So the AWS-channel move that looks symmetric at the product layer is asymmetric at the government layer: OpenAI is compounding an enterprise and defense channel; Anthropic is compounding enterprise while litigating its defense exclusion (appeals argued May 19, pending).

Figure: two layers.
Policy / government layer (DIVERGE, opposite): Anthropic invites external oversight, refuses broad-use language, defense-EXCLUDED; OpenAI preempts state regulation, seeks liability shields, defense-AWARDED.
Product / distribution layer (CONVERGE, mirrored pairs): Claude Platform on AWS mirrors GPT-5.5/Codex GA on Bedrock; Legal / SMB / Finance verticals mirror Codex for knowledge work; cyber/bio gating mirrors cyber/bio vetting.

The falsifiable claim: the two frontier labs converge on how they sell and diverge on how they govern. Falsified if OpenAI shifts toward a disclosure/oversight posture, or if Anthropic gains defense access or starts lobbying for federal preemption. So far the distribution mechanics rhyme tighter each week while the policy postures pull further apart.

The dependency beat — Claude Code v2.1.160: the governance stack reaches the filesystem

v2.1.159 was internal-only; v2.1.160 (Jun 2) carries the signal. Two hardening items lead the notes, and they extend a six-week pattern:

  • A prompt before writing shell startup files (.zshenv, .zlogin, .bash_login) and ~/.config/git/, files that execute commands on the next shell or git invocation.
  • acceptEdits mode now prompts before writing build-tool config files that grant code execution: .npmrc, .yarnrc*, bunfig.toml, .bazelrc, .pre-commit-config.yaml, .devcontainer/.

This is the constraint layer reaching a new boundary. The governance stack has been climbing precision for weeks — admin hard_deny (v2.1.136) → Workflow sandbox (v2.1.147) → skill disallowed-tools (v2.1.152) → Compliance API (May 25). v2.1.160 adds the filesystem-write-to-code-execution boundary: the class of attack where an autonomous agent doesn’t run a dangerous command, it writes a file that runs one later. As auto-mode, background agents, and dynamic workflows put agents on the keyboard unattended, the next thing you fix is the config write that detonates on the next npm install or shell login. Precise constraint tracking rising autonomy — the same thesis, now at the write boundary.
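The boundary v2.1.160 draws can be sketched as a path check a harness runs before any unattended write. This is a minimal illustration, not Claude Code’s implementation: the function name `needs_write_prompt`, the `home` default, and the matching rules are hypothetical; only the file names come from the release notes quoted above, and the list is not exhaustive. It assumes absolute, already-expanded paths.

```python
import posixpath
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File names taken from the v2.1.160 notes; illustrative, not exhaustive.
SHELL_STARTUP_FILES = {".zshenv", ".zlogin", ".bash_login"}
BUILD_CONFIG_PATTERNS = [".npmrc", ".yarnrc*", "bunfig.toml", ".bazelrc",
                         ".pre-commit-config.yaml"]
EXEC_CONFIG_DIRS = {".devcontainer"}  # any file written inside these directories


def needs_write_prompt(target: str, home: str = "/home/user") -> bool:
    """Hypothetical gate: True if writing `target` could make code run later.

    Assumes `target` is an absolute path with "~" already expanded.
    """
    # Collapse "." and ".." so "pkg/../.npmrc" is judged as ".npmrc".
    path = PurePosixPath(posixpath.normpath(target))
    name = path.name

    # Shell startup files execute on the next shell login or invocation.
    if name in SHELL_STARTUP_FILES:
        return True

    # Git's user config dir can point git at hooks or shell-command aliases.
    git_dir = PurePosixPath(home) / ".config" / "git"
    if path == git_dir or git_dir in path.parents:
        return True

    # Build-tool configs that can run code on the next install, build, or commit.
    if any(fnmatchcase(name, pattern) for pattern in BUILD_CONFIG_PATTERNS):
        return True

    # Dev-container definitions: any path component named .devcontainer counts.
    if EXEC_CONFIG_DIRS & set(path.parts):
        return True

    return False
```

Matching on the basename anywhere in the tree is deliberately conservative: a false prompt costs a click, while a missed .npmrc costs a deferred command running with the user’s privileges.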

Also in v2.1.160: the dynamic-workflow trigger keyword was renamed from workflow to ultracode (highlighted violet in the input), and /effort ultracode is now a named tier above xhigh. Dynamic Workflows is graduating from a flagged research preview into a productized max-autonomy mode with its own effort level. The notes also carry a long run of overnight background-agent reliability fixes: history lost when sessions re-attached after an overnight retire, cold-start daemon socket failures, and agents running while the host is under heavy CPU load. These are the failure modes of agents left running for hours, unattended, across machine sleep/wake. That’s the production signature of fleets, again.

The models beat — Mellum 2: a model built for the sub-agent slot

JetBrains shipped Mellum 2 (Jun 1) on HuggingFace: a 12B Mixture-of-Experts with only 2.5B active parameters (Mellum2-12B-A2.5B-Thinking), Apache 2.0, text+code. The model card’s positioning is the tell — it’s described not as a flagship coder but as built for “routing and orchestration in multi-model systems,” “sub-agent tasks (planning, validation, transformation),” RAG context compression, and private-code deployment. JetBrains built a model for the worker slot in an agent fleet.

That lands exactly where the rest of today points. Opus 4.8 made fleet orchestration native to the frontier weights; 60%+ of Codex users run parallel tasks; and now there’s an open, fast, cheap local model purpose-built to be the fanned-out worker — the one you run a hundred copies of for mechanical transforms while the expensive frontier model plans. The cross-cutting pattern my work watches for: a model release that only fully makes sense if you know the orchestration landscape.

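The MoE shape changes the hardware story: the 12B total sets the memory footprint, and the 2.5B active sets the speed. At Q4 the full weights come to about 7GB, small enough to stay GPU-resident on all three reference machines, including the weakest one, where a 12B model at higher precision spills. And because only 2.5B parameters activate per token, it runs much closer to small-model speed than a dense 12B would.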

Machine | Memory budget | Mellum 2 @ Q4 (~7GB) | Notes
M3 Max 36GB (main) | ~22GB | ✅ full weights + long context | Could host several parallel instances; 2.5B active keeps tok/s high
M2 Max 32GB (dispatch) | ~22GB | ✅ full weights + long context | Ideal dispatch worker for batch sub-agent jobs
WSL + RTX 3060 12GB | 12GB VRAM | fully in VRAM (Q4/Q5) | The recommendation-changer: the full 12B of weights stays GPU-resident on a 12GB card at Q4/Q5, and only 2.5B parameters activate per token, so it runs fast. Q8 (~13GB) spills; stay at Q4/Q5 here.

Recommendation change: for a local sub-agent / code-completion worker role, Mellum 2 (Q4/Q5) is now the model to reach for on the 3060 box. It stays GPU-resident there without CPU offload, and the MoE sparsity makes it fast enough to run as a fanned-out worker rather than a single assistant. On the Macs it’s a strong dispatch-worker default. (Context length wasn’t specified in the launch post; the arXiv tech report 2605.31268 and the model card should confirm it. Treat that as a watch item before it replaces the current dispatch default outright.)
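The Q4 (~7GB) and Q8 (~13GB) figures in the table follow from parameters × bits per weight ÷ 8. A minimal sketch of that arithmetic, assuming llama.cpp-style block formats (Q4_0 = 4.5, Q5_0 = 5.5, Q8_0 = 8.5 bits per weight once per-block scales are counted) and a hypothetical 1.5GB allowance for runtime and KV cache, since the context length is still unconfirmed:

```python
# Approximate bits per weight, assuming llama.cpp-style block formats where a
# per-block fp16 scale over 32 weights adds 0.5 bit (Q4_0, Q5_0, Q8_0).
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5}


def weight_gb(params_billion: float, quant: str) -> float:
    """Memory for the weights alone, in decimal GB: params x bits / 8."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str, budget_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if weights plus a hypothetical runtime/KV-cache allowance fit."""
    return weight_gb(params_billion, quant) + overhead_gb <= budget_gb


TOTAL_B, ACTIVE_B = 12.0, 2.5  # Mellum 2: 12B total, 2.5B active per token

for machine, budget in [("M3 Max", 22), ("M2 Max", 22), ("RTX 3060", 12)]:
    verdict = {q: fits(TOTAL_B, q, budget) for q in BITS_PER_WEIGHT}
    print(machine, verdict)

# Speed side: in bandwidth-bound decoding, per-token weight reads scale
# roughly with *active* params, not total.
print(f"per-token reads at Q4: {weight_gb(ACTIVE_B, 'Q4'):.2f} GB "
      f"vs {weight_gb(TOTAL_B, 'Q4'):.2f} GB for a dense 12B")
```

Under these assumptions the 3060 comes out as fitting at Q4 (6.75GB) and Q5 (8.25GB) but not Q8 (12.75GB), which matches the table; the total parameter count decides fit, and the active count only decides speed.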

Landscape read

The terrain this week is two frontier labs sprinting down the same distribution corridor — AWS as the enterprise procurement channel, the coding agent expanded into general knowledge work, capability gated behind vetting — while standing on opposite policy ground. The convergence is loud and easy to narrate; the divergence is the part that will actually decide outcomes, because it’s the part regulators, defense buyers, and IPO underwriters price. A lab that lobbies to preempt regulation and a lab that lobbies to invite oversight are making incompatible bets about what the next administration and the next court do. Only one bet pays.

Underneath, the orchestration story keeps compounding across every layer at once: native in the frontier weights (Opus 4.8), adopted at the keyboard (60%+ parallel Codex tasks), hardened in the harness (v2.1.160 config-write gate, ultracode tier, overnight-fleet reliability), and now supplied with a purpose-built open worker model (Mellum 2). Four layers, one direction. The fleet is no longer a feature; it’s the substrate.

Strategic cuts

For someone building open-source coding agents: Mellum 2 matters more than either OpenAI move. An Apache-2.0, 2.5B-active, code-specialized model that runs GPU-resident on a 12GB card is a credible local worker tier: the layer where an open agent can fan out mechanical sub-tasks (transform, validate, summarize) without paying frontier-API rates per subagent. The pattern to copy from Codex isn’t the knowledge-work pivot; it’s the 60%+ parallel-usage data point. Design for fleets of cheap workers under one planner, and let the planner be the only call that hits a frontier endpoint. The constraint to copy from Claude Code v2.1.160 is the config-write gate: if your agent writes files unattended, the dangerous surface is the file that executes later, not the command it runs now.
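The planner-plus-fleet shape can be sketched with asyncio: one expensive call produces subtasks, and a bounded pool of cheap local workers runs them concurrently. This is a minimal sketch, assuming both model calls are stubs; `call_frontier_planner`, `call_local_worker`, `Subtask`, and `STATS` are hypothetical names, and no real model API is invoked.

```python
import asyncio
from dataclasses import dataclass

STATS = {"frontier_calls": 0, "worker_calls": 0, "peak_concurrency": 0}


@dataclass
class Subtask:
    kind: str     # e.g. "transform", "validate", "summarize"
    payload: str


async def call_frontier_planner(goal: str) -> list[Subtask]:
    """Hypothetical stand-in for the single expensive frontier-API call."""
    STATS["frontier_calls"] += 1
    await asyncio.sleep(0)  # placeholder for network I/O
    return [Subtask("transform", f"{goal}: file {i}") for i in range(5)]


async def call_local_worker(task: Subtask) -> str:
    """Hypothetical stand-in for a cheap local model served on the box."""
    STATS["worker_calls"] += 1
    await asyncio.sleep(0.01)  # placeholder for local inference time
    return f"done[{task.kind}] {task.payload}"


async def run_fleet(goal: str, max_workers: int = 3) -> list[str]:
    subtasks = await call_frontier_planner(goal)  # the only frontier call
    sem = asyncio.Semaphore(max_workers)          # cap on concurrent local workers
    active = 0

    async def guarded(task: Subtask) -> str:
        nonlocal active
        async with sem:
            active += 1
            STATS["peak_concurrency"] = max(STATS["peak_concurrency"], active)
            try:
                return await call_local_worker(task)
            finally:
                active -= 1

    # gather keeps input order, so results line up with subtasks.
    return list(await asyncio.gather(*(guarded(t) for t in subtasks)))


if __name__ == "__main__":
    for line in asyncio.run(run_fleet("rename API")):
        print(line)
```

The semaphore is the design choice that matters on a single GPU box: the fan-out width should track how many worker instances the hardware can actually hold, not how many subtasks the planner emits.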

For timing AI adoption at work: the Codex knowledge-work numbers are the leading indicator to watch, with knowledge workers growing 3× faster than developers and data-analysis usage up 110% week-over-week. The coding-agent vendors are converging on the general office worker as the next user, and both major labs now ship through AWS procurement, which collapses the enterprise-buying friction. But the policy fork is the risk to price: an org standardizing on a single vendor is also inheriting that vendor’s regulatory bet. The distribution channels are interchangeable; the governance postures are not.
