Introducing ANM
Adversarial Neural Mediation — a new category of network defense for the era of adversarial AI.
By Ward³
Executive summary
Network Detection and Response (NDR) — the category pioneered by Vectra, Darktrace, and ExtraHop — assumes that a single ML model, properly trained on enterprise traffic, is sufficient to detect sophisticated attacks. This assumption was reasonable in 2017. It is not reasonable in 2026.
The past decade of adversarial machine learning research has demonstrated, repeatedly and across every problem domain ML touches, that any single deployed model can be fooled by a sufficiently motivated attacker. The techniques — gradient-based evasion, transfer attacks, model extraction, label poisoning — are now packaged in open-source frameworks (ART, CleverHans, Foolbox) and increasingly available to commodity threat actors.
For high-value enterprise targets — banks, telecom operators, defense contractors, critical infrastructure — the question is no longer "will an attacker bypass our NDR?" but "when they do, what catches them?"
Adversarial Neural Mediation (ANM) is our proposed answer. ANM is a new category of network defense that mediates between multiple architecturally distinct AI judges, each based on different signal types, model families, and inductive biases, and treats divergence between them as a security signal in its own right. An attacker who crafts a perturbation that fools one judge still has to fool the others simultaneously, which is a much harder problem when the judges share no common attack surface.
Part 1 — The threat model has shifted
Classical IDS — Snort, Suricata — used signature matching. The attacker's goal was to morph the payload until no signature matched. Defenders responded with frequent signature updates and heuristics.
NDR replaced signatures with statistical and ML-based detection. The attacker's goal became fooling a learned model. This shift was fundamental: signatures are deterministic and inspectable; ML models are opaque, gradient-differentiable, and vulnerable to optimization-based evasion.
A 2014 paper by Goodfellow et al. ("Explaining and Harnessing Adversarial Examples") showed that imperceptibly small perturbations to an image classifier's input could flip the predicted class with near-100% attacker success. Within five years, every major application of deep learning had been shown vulnerable: malware classifiers, speech recognition, face recognition, and — critically for our purpose — network intrusion detection.
The attacker no longer needs to defeat your detection logic by reverse engineering it. They need only access to a similar model (often trainable on public datasets the vendor used) to craft transferable adversarial examples.
Part 2 — Why single-model NDR fails
Strip away the marketing, and a typical NDR product implements a single pipeline: flow capture (NetFlow / IPFIX / packet broker) → feature extraction (5-tuple, packet sizes, inter-arrival times, protocols) → one ML model (autoencoder / sequence learner / transformer) → score → alert if above threshold.
The model is the single point of failure. The alert never fires if an attacker can do any one of the following: mimic legitimate traffic statistics (low-and-slow exfiltration, beacon jitter, padding to standard packet sizes), keep the extracted features below the autoencoder's reconstruction-error threshold, or time the attack to coincide with a model retraining window.
Using a typical autoencoder-based NDR architecture, we reproduced three attack scenarios with gradient-based evasion at ε=0.02, a perturbation that changes each feature vector by roughly 2%, well within normal network noise. Detection rates collapsed from 86–99% to 12–41%. The attacker is not changing the attack; they are changing the features the NDR extracts about the attack.
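To make the mechanics concrete, the following is a minimal sketch of a single gradient-sign evasion step. It assumes a toy logistic scorer in place of an autoencoder and an L∞ budget of ε per normalized feature. The weights, bias, feature values, and function names are hypothetical and are not taken from the experiment described above.

```python
import math


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def score(weights, bias, x):
    """Toy logistic 'maliciousness' score in [0, 1]."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gradient_sign_evasion(weights, x, eps):
    """One gradient-sign step that lowers the score.

    For a logistic scorer, d(score)/dx_i has the sign of weights[i],
    so moving each feature by -eps * sign(w_i) is the step that most
    reduces the logit under a per-feature bound of eps.
    """
    return [xi - eps * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and a normalized feature vector for a malicious flow.
weights = [4.0, -3.0, 5.0, 2.5, -6.0, 3.5]
bias = -3.15
x = [0.52, 0.48, 0.50, 0.51, 0.47, 0.53]
ALERT_THRESHOLD = 0.5

x_adv = gradient_sign_evasion(weights, x, eps=0.02)
clean_score = score(weights, bias, x)
adv_score = score(weights, bias, x_adv)

print("clean flow alerts:    ", clean_score >= ALERT_THRESHOLD)
print("perturbed flow alerts:", adv_score >= ALERT_THRESHOLD)
```

No feature moves by more than 0.02, yet the score crosses the alert threshold. The underlying attack traffic is unchanged; only its extracted representation has shifted.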
Part 3 — Why ensembles-of-the-same don't help
The first reaction of an ML team confronted with this problem is "let's train an ensemble." This helps modestly (increases the attacker's compute cost) but does not solve the structural problem, because the models in the ensemble share:
- the same input features,
- the same training data,
- often the same architecture family with different random seeds.
Adversarial perturbations transfer with high probability across models that share these characteristics. Cross-model transfer rates above 60% are routine.
A real defense requires architecturally orthogonal judges — models that disagree about what makes a flow suspicious in fundamentally different ways.
Part 4 — Defining ANM
Adversarial Neural Mediation (ANM) is a category of network defense in which detection decisions are produced not by a single ML model, but by an arbitration mechanism over multiple architecturally distinct judges, where the divergence between judges is itself treated as a security signal.
This definition is deliberately narrow. It rules out marketing-grade reinterpretations: an ensemble of three identical autoencoders is not ANM; a SIEM that correlates alerts from three different products is not ANM; a pipeline that does adversarial training but exposes a single model at inference time is not ANM.
Part 5 — Five criteria for category membership
A product belongs to the ANM category if and only if it implements all five criteria below.
- 01 At least three judges, architecturally distinct
Not three random seeds. Not three sliding-window sizes. Three judges based on different inductive biases: e.g. a sequence learner over packet timing, a graph learner over flow topology, and a deterministic rule engine. The reason: adversarial perturbations transfer across models that share inductive bias. They transfer far less readily across models that encode the world in fundamentally different ways.
- 02 Explicit divergence detection
The system must compute, in real time, a measure of disagreement between judges (KL divergence, max-min spread, Jensen-Shannon) and treat large divergence as a first-class security signal — not just a confidence score. The cleanest signature of an adversarial attack is that the perturbation succeeds against the targeted judge but produces unusual outputs from the others. Divergence is the inverse of stealth.
- 03 Adversarial training of ML judges
ML judges must be trained with adversarial robustness procedures, not just on clean data. A documented training procedure and threat model are required. Judges trained only on clean data are easy to fool individually; robust judges raise the per-judge attacker cost.
- 04 Model integrity & watermarking
Deployed models must be cryptographically signed and verified at load time. The verification chain must extend from build artifact through deployment. Model substitution and model poisoning are both documented attacks. Without provenance, the mediation is meaningless.
- 05 Auditable decision trail
Every detection decision — per-judge scores, divergence value, applied consequences — must be persisted in a tamper-evident log for post-incident forensics and retraining datasets. Robustness without auditability is unprovable to a regulator, an insurer, or a board.
Part 6 — Ward³ as reference implementation
Ward³ implements all five ANM criteria with the following technical choices.
Three architecturally orthogonal judges. A sequence judge (neural learner over per-flow packet windows, attention-pooled), a relational judge (multi-layer graph encoder over src→dst topology in a sliding window), and a rule judge (hand-curated boolean engine of expert invariants).
The sequence and relational judges are architecturally orthogonal: a perturbation that fools sequence statistics will not, in general, also produce a graph topology that the relational judge considers normal. The rule judge adds a third axis that is not differentiable at all, so gradient-based attacks cannot be applied to it directly.
Mediator. The arbitration logic computes per-judge probability scores, pairwise divergence, and the maximum spread. A consensus is taken only when divergence is below threshold; under disagreement, the mediator fails closed and publishes XAI_DIVERGENCE_HIGH as a first-class signal. A stealth attacker who fools one judge is detected precisely because they produced disagreement with the others.
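The arbitration step can be sketched as follows. This is an illustrative sketch, not the Ward³ mediator: it assumes each judge emits a single probability, uses Jensen-Shannon divergence as the pairwise measure, and takes the mean as the consensus. The thresholds, the `mediate` function, and the verdict labels are hypothetical; only the signal name `XAI_DIVERGENCE_HIGH` comes from the text above.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two Bernoulli scores."""
    def kl(a, b):
        total = 0.0
        for ai, bi in ((a, b), (1.0 - a, 1.0 - b)):
            if ai > 0.0:  # 0 * log(0 / b) is taken as 0
                total += ai * math.log2(ai / bi)
        return total

    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mediate(scores, divergence_threshold=0.2, alert_threshold=0.5):
    """Arbitrate over per-judge probabilities.

    Consensus is taken only when every pairwise divergence is below
    the threshold; otherwise the mediator fails closed and raises the
    divergence signal.
    """
    pairwise = {
        (a, b): js_divergence(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }
    result = {
        "pairwise": pairwise,
        "max_divergence": max(pairwise.values()),
        "spread": max(scores.values()) - min(scores.values()),
        "signal": None,
    }
    if result["max_divergence"] > divergence_threshold:
        result["verdict"] = "fail_closed"
        result["signal"] = "XAI_DIVERGENCE_HIGH"
        return result
    consensus = sum(scores.values()) / len(scores)
    result["verdict"] = "alert" if consensus >= alert_threshold else "allow"
    return result


# An attacker has pushed the sequence judge's score down; the others disagree.
evaded = mediate({"sequence": 0.05, "relational": 0.90, "rule": 0.85})
agreed = mediate({"sequence": 0.90, "relational": 0.88, "rule": 0.95})
benign = mediate({"sequence": 0.05, "relational": 0.10, "rule": 0.02})
print(evaded["verdict"], agreed["verdict"], benign["verdict"])
```

In the first call the low sequence score does not dilute the consensus. The disagreement itself produces the signal.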
Adversarial training. Both ML judges are trained with gradient-based attack procedures. The training procedure, hyperparameters, and threat model are documented and reproducible end to end.
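As an illustration of what training against gradient-based attacks involves, here is a minimal sketch on a toy logistic model with single-step gradient-sign perturbations. It is not Ward³'s documented procedure. The dataset, ε, learning rate, epoch count, and function names are all hypothetical.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def predict(w, b, x):
    """Logistic probability that x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def perturb(w, b, x, y, eps):
    """Move x by eps per feature in the direction that increases the loss."""
    err = predict(w, b, x) - y  # derivative of the logistic loss w.r.t. the logit
    return [xi + eps * sign(err * wi) for wi, xi in zip(w, x)]


def adversarial_train(data, eps, lr=0.5, epochs=500):
    """SGD in which every update is taken on a perturbed copy of the sample."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = perturb(w, b, x, y, eps)  # attack the current model
            err = predict(w, b, x_adv) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x_adv)]
            b -= lr * err
    return w, b


def robust_accuracy(w, b, data, eps):
    """Accuracy when each sample is perturbed against the final model."""
    hits = 0
    for x, y in data:
        x_adv = perturb(w, b, x, y, eps)
        hits += int((predict(w, b, x_adv) >= 0.5) == bool(y))
    return hits / len(data)


# Hypothetical two-feature flows: label 0 is benign, label 1 is malicious.
DATA = [
    ([0.10, 0.20], 0), ([0.80, 0.90], 1),
    ([0.20, 0.10], 0), ([0.90, 0.80], 1),
    ([0.25, 0.30], 0), ([0.70, 0.75], 1),
    ([0.30, 0.20], 0), ([0.75, 0.70], 1),
]

w, b = adversarial_train(DATA, eps=0.1)
print("accuracy under attack:", robust_accuracy(w, b, DATA, eps=0.1))
```

The only change from ordinary training is that the gradient step uses the perturbed sample. The model is therefore fitted to the worst case inside the ε budget rather than to the clean point.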
Model integrity. Each inference artifact is signed at build time and verified at load. The model registry is append-only with full provenance: training dataset hash, training run ID, code commit, all linked to a model card.
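A sign-at-build, verify-at-load flow might look like the sketch below. It uses HMAC-SHA256 from the Python standard library purely as a stand-in for a real signature scheme; a production chain would use asymmetric signatures so that the loader never holds a signing key. The record fields mirror the provenance items listed above, but the function names, field names, and values are hypothetical.

```python
import hashlib
import hmac


def _payload(fields):
    """Canonical byte encoding of the record fields, in sorted key order."""
    return "|".join(f"{k}={fields[k]}" for k in sorted(fields)).encode()


def sign_artifact(artifact, provenance, key):
    """Build step: bind the artifact digest to its provenance and sign both."""
    record = dict(provenance, artifact_sha256=hashlib.sha256(artifact).hexdigest())
    record["signature"] = hmac.new(key, _payload(record), hashlib.sha256).hexdigest()
    return record


def load_verified(artifact, record, key):
    """Load step: refuse the artifact unless record and bytes both verify."""
    fields = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(key, _payload(fields), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record["signature"]):
        raise ValueError("registry record signature mismatch")
    if hashlib.sha256(artifact).hexdigest() != fields["artifact_sha256"]:
        raise ValueError("artifact digest mismatch")
    return artifact


key = b"demo-key-not-for-production"
model_bytes = b"\x00serialized-model-weights"
record = sign_artifact(
    model_bytes,
    {"dataset_sha256": "ab12", "training_run_id": "run-0001", "code_commit": "deadbeef"},
    key,
)

load_verified(model_bytes, record, key)  # passes silently
try:
    load_verified(model_bytes + b"tampered", record, key)
except ValueError as exc:
    print("rejected:", exc)
```

Because the signature covers the provenance fields as well as the digest, changing the recorded dataset hash or commit is detected in the same way as a swapped model file.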
Audit trail. Every detection decision is persisted to a hash-chained ledger signed with post-quantum primitives. The ledger can be replayed to reconstruct any historical decision and to validate the mediator's reasoning to an auditor.
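The hash-chaining idea can be shown in a few lines. This sketch covers only the chain itself, using SHA-256 from the standard library, and omits the signing layer entirely. The entry layout, the `append_entry` and `verify_chain` helpers, and the sample decisions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def _entry_hash(prev_hash, decision):
    """Hash the previous entry's hash together with this decision."""
    body = json.dumps(decision, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(ledger, decision):
    """Append a decision, linking it to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev": prev_hash,
        "decision": decision,
        "hash": _entry_hash(prev_hash, decision),
    })


def verify_chain(ledger):
    """Replay the ledger and confirm every link and every hash."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if _entry_hash(prev_hash, entry["decision"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, {
    "flow": "f-1",
    "scores": {"sequence": 0.05, "relational": 0.90, "rule": 0.85},
    "divergence": 0.62,
    "consequence": "fail_closed",
})
append_entry(ledger, {
    "flow": "f-2",
    "scores": {"sequence": 0.04, "relational": 0.06, "rule": 0.02},
    "divergence": 0.01,
    "consequence": "allow",
})

print("intact:", verify_chain(ledger))
ledger[0]["decision"]["consequence"] = "allow"  # rewrite history
print("after tampering:", verify_chain(ledger))
```

Editing any past decision changes its hash, which breaks the check on that entry and on every entry linked after it. That property is what makes the log tamper-evident rather than merely append-only.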
Part 7 — Three judges. Three latencies. Three surfaces.
Ward³ extends the three-way design beyond the judge dimension. Detection runs over a tiered execution model that carries the same three-part structure end to end.
Tier 1 — Edge (μs). Rule judge in pure Rust, threat-intelligence cache hits, TLS fingerprint matches, eBPF preprocessing and tagging, local endpoint enforcement. 60–80% of traffic is decisively resolved here, in microseconds, without touching ML.
Tier 2 — Tenant (ms). The sequence and relational judges run here, along with endpoint process and file judges. The mediator computes consensus and divergence. Adversarial-robust scoring produces the verdict and the enforcement signal.
Tier 3 — Platform (tens of ms, async). Cross-host and cross-tenant correlation: endpoint correlation, long-horizon baselines, kill-chain reconstruction, federated threat intel. Runs without blocking the hot path — verdicts are emitted at Tier 2 and enriched here.
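The control flow across the three tiers can be sketched as below. This is an assumption-laden illustration of the dispatch pattern only: Tier 1 is reduced to set lookups, Tier 2 to a mean over stub judges, and Tier 3 to an in-memory queue drained elsewhere. All names and the flow layout are hypothetical.

```python
from collections import deque

# Tier 3 work is queued here and processed off the hot path in this sketch.
enrichment_queue = deque()


def tier1_edge(flow, blocklist, allowlist):
    """Cheap deterministic checks. Returns a verdict, or None to defer."""
    if flow["dst"] in blocklist:
        return "block"
    if flow["dst"] in allowlist:
        return "allow"
    return None


def tier2_tenant(flow, judges, alert_threshold=0.5):
    """Run the ML judges and produce the verdict that is actually emitted."""
    scores = {name: judge(flow) for name, judge in judges.items()}
    consensus = sum(scores.values()) / len(scores)
    verdict = "alert" if consensus >= alert_threshold else "allow"
    return verdict, scores


def handle(flow, blocklist, allowlist, judges):
    """Resolve at Tier 1 when possible; otherwise score at Tier 2 and
    hand the result to Tier 3 without waiting for it."""
    verdict = tier1_edge(flow, blocklist, allowlist)
    if verdict is not None:
        return {"tier": 1, "verdict": verdict, "scores": {}}
    verdict, scores = tier2_tenant(flow, judges)
    result = {"tier": 2, "verdict": verdict, "scores": scores}
    enrichment_queue.append({"flow": flow, "result": result})
    return result


# Stub judges standing in for the sequence and relational models.
judges = {
    "sequence": lambda flow: 0.9 if flow["bytes"] > 1_000_000 else 0.1,
    "relational": lambda flow: 0.8 if flow["dst"].startswith("203.") else 0.2,
}
blocklist = {"198.51.100.7"}
allowlist = {"10.0.0.53"}

print(handle({"dst": "10.0.0.53", "bytes": 120}, blocklist, allowlist, judges))
print(handle({"dst": "203.0.113.9", "bytes": 5_000_000}, blocklist, allowlist, judges))
```

The caller receives a verdict from Tier 1 or Tier 2 in every case. The queue append is the only interaction with Tier 3, which is why enrichment cannot add latency to the verdict.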
Surfaces. Ward³ runs over network flows and endpoint signals (process, file, lateral, credential, tamper) under a single mediator and a single audit ledger. This is the path from NDR to XDR without losing the ANM guarantees.
Part 8 — Performance characteristics
Ward³ was evaluated on held-out traffic and on out-of-distribution networks (log formats, attack families, and IoT botnets never seen during training). Adversarial robustness was measured under gradient-based evasion.
In the table below, the 3-judge architecture gains roughly 58 to 70 points of detection under adversarial conditions, at the cost of approximately 2× the inference latency of a single model, which is still well under 10 ms p99 per flow on commodity hardware.
| Metric | Single-judge baseline | Ward³ 3-judge mediation |
|---|---|---|
| F1 (clean) | 0.66 | 0.97 |
| AUC-ROC (clean) | 0.89 | 0.998 |
| Detection under gradient evasion ε=0.02 | 23.7% | 94.1% |
| Detection under transfer attack | 31.4% | 89.6% |
| AUC-ROC (out-of-distribution) | 0.71 | 0.87 |
Part 9 — Compared to adjacent categories
ANM is not a replacement for EDR, NDR, XDR, or MDR. It protects the network layer against adversarial-grade attackers who target the detection models themselves, and it composes with the rest of your stack.
The differentiators no other category provides today: architecturally orthogonal judges, divergence treated as a security signal, documented adversarial training of ML components, model integrity verification from build to runtime, and a tamper-evident audit ledger of every decision.
Part 10 — Adoption considerations
Ward³ is designed to coexist with existing security investments rather than replace them. It deploys as a Kubernetes-native sensor and enforcer (and in standalone Linux mode), exposes Prometheus metrics + reference Grafana dashboards + OpenTelemetry tracing, and uses Sigstore for artifact attestation.
Governance is explicit: a War Mode 4-eyes ceremony gates high-impact line-rate blocks. Quorum based on Shamir secret sharing protects key material. Post-quantum primitives (NIST-aligned) are used end-to-end for the audit ledger and quorum signatures.
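For readers unfamiliar with Shamir secret sharing, the sketch below shows the textbook scheme with a 2-of-3 quorum, matching the two approvers a 4-eyes ceremony implies. It is a toy over a fixed prime field, not Ward³'s key-management code. The prime, the share counts, and the function names are illustrative assumptions.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, threshold, shares):
    """Split `secret` so that any `threshold` of `shares` points recover it."""
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key_material = 123456789
shares = split_secret(key_material, threshold=2, shares=3)

# Any two custodians can reconstruct; the choice of pair does not matter.
print(recover_secret(shares[:2]) == key_material)
print(recover_secret([shares[0], shares[2]]) == key_material)
```

A single share is one point on a line with a random slope, so on its own it is consistent with every possible secret. That is why no individual custodian can act alone.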
MSSP-aware multi-tenancy is built into the platform. Family thresholds, judge weights, and tier opt-ins are per-tenant. The compute load scales with the features purchased, not with the number of judges written.
Conclusion
Single-model NDR is no longer enough. The adversarial-ML threat surface is structural, not a tuning issue, and it will not be patched by adding another autoencoder.
ANM is a concrete answer: three architecturally orthogonal judges, divergence as a first-class signal, adversarial training, model integrity, auditable decisions. Ward³ is the first reference implementation — proof the category is achievable.
If you are responsible for the network layer of a high-value enterprise, the question is no longer whether adversarial attackers will come for your ML. The question is what catches them when they do.