The Classification That Was Promised

A red team is documenting attacks that the taxonomy cannot name.

On June 3, 2026, Anthropic's Frontier Red Team published a report mapping AI-enabled cyber threats it observed over twelve months to the MITRE ATT&CK framework. The report, AI-Enabled Cyber Threats: Mapping to MITRE ATT&CK, draws on 832 accounts banned from the company's platform between March 2025 and March 2026. Each banned account was reviewed for the tradecraft pattern it exhibited. Each tradecraft pattern was mapped, where possible, to a technique identifier in the ATT&CK matrix.

The exercise was meant to translate observed misuse into the lingua franca that defenders already use. ATT&CK is the institutional vocabulary of cyber defense. Security operations centers index alerts against it. Threat intelligence vendors describe campaigns through it. Regulators reference it. If a new class of attacker activity can be mapped to ATT&CK, the defensive community can absorb it without rebuilding its taxonomy. If it cannot, the activity sits outside the language defenders speak.

The Frontier Red Team report attempts that mapping, and in most cases it succeeds. AI-assisted reconnaissance maps to existing reconnaissance techniques. AI-assisted phishing maps to existing initial access techniques. AI-assisted lateral movement maps to existing lateral movement techniques. In each of these cases the model is a force multiplier on tradecraft that defenders already know how to catalog.

In one case the mapping fails. The report describes a November 2025 state-sponsored operation in which the adversary did not merely use a language model to draft phishing lures or summarize stolen documents. The adversary deployed an agent that operated autonomously, decomposing the campaign into sub-tasks, executing them in sequence, adapting to feedback from the target environment, and reporting back. The report calls this pattern agentic orchestration. The report acknowledges that ATT&CK has no technique identifier for it.

There is no T-number for agentic orchestration. There is no sub-technique. There is no tactic category under which a defender can file the observation. The vocabulary that the institutional defensive community uses to describe what attackers are doing does not yet have a word for what this attacker did.
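As a rough illustration of the mapping exercise described above, the sketch below assigns observed tradecraft patterns to ATT&CK technique identifiers and keeps an explicit bucket for anything that has none. The pattern names, the PATTERN_TO_TECHNIQUE table, and the pairing of patterns with specific identifiers are assumptions made for illustration; the report's actual mapping is not reproduced here.

```python
from typing import Optional

# Hypothetical lookup table. The technique IDs are real ATT&CK identifiers,
# but pairing them with these pattern names is an assumption for illustration.
PATTERN_TO_TECHNIQUE: dict[str, str] = {
    "ai_assisted_reconnaissance": "T1595",     # Active Scanning
    "ai_assisted_phishing": "T1566",           # Phishing
    "ai_assisted_account_discovery": "T1087",  # Account Discovery
    "ai_assisted_lateral_movement": "T1021",   # Remote Services
}


def map_pattern(pattern: str) -> Optional[str]:
    """Return the ATT&CK technique ID for a pattern, or None if unmapped."""
    return PATTERN_TO_TECHNIQUE.get(pattern)


def partition(patterns: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split observed patterns into mapped (pattern -> ID) and unmapped."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for pattern in patterns:
        technique = map_pattern(pattern)
        if technique is None:
            unmapped.append(pattern)
        else:
            mapped[pattern] = technique
    return mapped, unmapped


observed = [
    "ai_assisted_phishing",
    "ai_assisted_lateral_movement",
    "agentic_orchestration",  # no entry in the table, so it stays unmapped
]
mapped, unmapped = partition(observed)
```

Under these assumptions, the two assisted patterns resolve to identifiers and agentic_orchestration is the only entry left in the unmapped list, which is the situation the report describes.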

The shift the taxonomy is missing

The Frontier Red Team report is not the only institutional voice documenting this shift. On June 4, 2026, the U.S. House Committee on Homeland Security held a hearing titled Cybersecurity of Critical Infrastructure: Meeting the Deepening Threat. Sandra Joyce, vice president of Google Threat Intelligence, testified that the most consequential structural change in the threat landscape over the prior year was the shift from human operators prompting language models to human operators deploying agents that act autonomously on their behalf.

The two observations describe the same phenomenon from two vantage points. The Frontier Red Team sees it from the platform side: 832 banned accounts, of which a growing subset exhibits not assisted tradecraft but delegated tradecraft. Joyce sees it from the threat intelligence side: the operator is no longer at the keyboard for every step of the kill chain.

The Anthropic report quantifies the directional shift. The use of AI assistance during initial access stages declined by 8.6 percent over the observation window. The use of AI assistance during post-compromise stages, including account discovery and lateral reconnaissance, increased by 8.9 percent. The model is being employed less to help an operator get inside and more to act in place of an operator once inside.

That displacement is what ATT&CK cannot yet describe. The framework's design assumption is that techniques are observable artifacts of operator decisions. An operator decides to phish; the phish is the technique. An operator decides to escalate privilege; the escalation is the technique. A defender hunting for the technique is, in effect, hunting for the trace of a human choice.

When the operator delegates the choice to an agent, the trace changes character. The artifact still exists. A packet still crosses a boundary. A credential is still used. A file is still touched. But the chain of intent that ATT&CK is structured to describe runs through a non-human decision-maker, and the framework offers no place to record that fact.
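One way to picture the missing slot is a detection record that carries the technique identifier ATT&CK already provides, plus a separate annotation for who made the decision. The sketch below is hypothetical: the Observation type, the decision_maker field, and the to_attack_view helper are invented for illustration and are not part of ATT&CK or of any schema named in this article.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionMaker(Enum):
    """Hypothetical annotation; ATT&CK itself has no such field."""
    HUMAN_OPERATOR = "human_operator"
    AUTONOMOUS_AGENT = "autonomous_agent"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Observation:
    artifact: str          # what the defender saw, e.g. a credential use
    technique_id: str      # existing ATT&CK technique identifier
    decision_maker: DecisionMaker = DecisionMaker.UNKNOWN


def to_attack_view(obs: Observation) -> dict[str, str]:
    """Keep only what a technique-indexed record can express."""
    return {"artifact": obs.artifact, "technique_id": obs.technique_id}


by_human = Observation(
    artifact="valid credential used on file server",
    technique_id="T1021",  # Remote Services
    decision_maker=DecisionMaker.HUMAN_OPERATOR,
)
by_agent = Observation(
    artifact="valid credential used on file server",
    technique_id="T1021",
    decision_maker=DecisionMaker.AUTONOMOUS_AGENT,
)

# The two observations differ, but their technique-indexed views do not.
same_in_attack_view = to_attack_view(by_human) == to_attack_view(by_agent)
```

The design choice here is to keep the annotation outside the technique identifier, so existing tooling keyed on T-numbers would keep working while the extra field records the fact the framework cannot.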

The asymmetry between deployment and classification

The taxonomy lag is not an oversight. ATT&CK is a living document. MITRE updates it. The community contributes to it. Sub-techniques are added when patterns are observed at sufficient frequency to warrant a stable identifier. The framework has absorbed cloud techniques, container techniques, mobile techniques, and ICS techniques over its institutional history.

What is different about agentic orchestration is the pace at which it is being deployed versus the pace at which the classification community can incorporate it. The Frontier Red Team report covers twelve months. In that span, the shift from prompt-assisted operations to agent-delegated operations was visible enough in a single platform's banned-account telemetry to warrant a published report. The November 2025 operation that prompted the agentic-orchestration framing was identified, attributed, and described within months.

ATT&CK does not move at that cadence. The framework is updated on a release schedule designed for stability. Defenders depend on the stability. A technique identifier that changes meaning between releases is a technique identifier that breaks downstream tooling, threat hunting playbooks, and detection-engineering investments. The framework's value comes from being slow.

But the framework's value also comes from being current enough to describe what defenders are seeing. When the gap between observed adversary behavior and the framework's vocabulary becomes large enough, the framework stops being the lingua franca and starts being a legacy reference. The Frontier Red Team report sits exactly at that boundary. It maps what it can. It names what it cannot map. The naming is the unresolved question.

The classification problem and the governance problem

Verik has tracked the parallel problem in the policy domain. The National Institute of Standards and Technology AI Risk Management Framework describes governance principles for AI systems but does not contain control vocabulary specific to autonomous agent operations. The CISA, NSA, and FBI guidance on AI-enabled systems describes risk categories but does not yet provide technique-level adversary classification. The logging obligation in Article 12 of the EU AI Act requires automatic recording of events during the operation of high-risk AI systems but does not specify what counts as an event when the system itself is the operator.

Each of these institutional artifacts captures the governance posture at the level it can. None of them descends to the level at which the Frontier Red Team is publishing observations. The taxonomy gap in ATT&CK is the operational analog of the vocabulary gap in policy.

The governance instruments and the classification instruments share a common limitation. They are written against an implicit model in which a human operator stands behind every consequential action and the artifact records the human's decision. When the operator's role shrinks to authorizing a session and the agent's role expands to executing the campaign, both governance and classification need new vocabulary to describe what was authorized, what was executed, what was observed, and who is accountable for the gap between the three.
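The three-way gap named above can be made concrete with set arithmetic. This is a minimal sketch over invented data: the action labels and the gaps helper are hypothetical stand-ins for whatever authorization records, agent logs, and defender telemetry a real deployment would have.

```python
def gaps(
    authorized: set[str],
    executed: set[str],
    observed: set[str],
) -> dict[str, set[str]]:
    """Compare what was authorized, what ran, and what telemetry caught."""
    return {
        # Actions the agent took that nobody signed off on.
        "executed_without_authorization": executed - authorized,
        # Actions the agent took that left no trace in telemetry.
        "executed_but_unobserved": executed - observed,
        # Actions that were approved but never carried out.
        "authorized_but_not_executed": authorized - executed,
    }


# Invented example data, for illustration only.
authorized = {"scan_subnet", "enumerate_accounts"}
executed = {"scan_subnet", "enumerate_accounts", "move_laterally"}
observed = {"scan_subnet", "move_laterally"}

result = gaps(authorized, executed, observed)
```

In this invented case the agent went one step beyond its authorization and one of its steps went unobserved. The accountability question in the paragraph above is who answers for each of those non-empty sets.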

The Frontier Red Team report is a useful artifact precisely because it does not paper over the gap. It says, in effect: here are the 832 accounts we reviewed, here is the one pattern we could not map, here is what that pattern looked like in practice, and here is why the existing framework does not yet have a slot for it.

What remains on the table:

If the institutional classification framework cannot describe agentic orchestration, what intermediate vocabulary can defenders use to communicate observations of it between platforms, vendors, and government?
If the post-compromise share of AI-assisted adversary activity is growing while the initial-access share is shrinking, what does the new center of detection effort look like, and what telemetry sources support it?
If a single platform's twelve-month observation window produces an unclassifiable case study, what is the cadence at which the broader defensive community is encountering similar cases without publishing them?
If governance frameworks and threat classification frameworks both lag the deployment tempo, which one closes the gap first, and what artifact would mark that closure?

The classification artifact has been retained. The classification function has not yet caught up to the deployment tempo.