Whitepaper · May 2026

Understanding ANM

Adversarial Neural Mediation: network defense for attacks that target machine learning too.

By Ward³

Executive summary

Network Detection and Response (NDR), popularized by companies such as Vectra, Darktrace, and ExtraHop, often rests on a simple assumption: one well-trained ML model can spot advanced attacks in enterprise traffic. That was a reasonable bet in 2017. In 2026, it is too brittle for high-value environments.

A decade of adversarial machine learning research has shown the same pattern across many domains: a model used alone can be fooled by a motivated attacker. Gradient-based evasion, transfer attacks, model extraction, and label poisoning are no longer lab curiosities. They are available through open-source frameworks such as ART, CleverHans, and Foolbox.

For a bank, telco, critical infrastructure operator, or defense organization, the question is no longer only "can our NDR be bypassed?" It is also "if one model gets it wrong, what detects the inconsistency?"

Adversarial Neural Mediation (ANM) starts there. Instead of leaving the decision to one model, ANM arbitrates between multiple AI judges that are genuinely different: different signals, architectures, and inductive biases. Their disagreement is measured explicitly and treated as a security signal. Fooling one judge is no longer enough; an attacker has to fool several ways of reading the network at once.

Part 1 — The threat model has shifted

Classical IDS tools such as Snort and Suricata relied on signatures. The attacker's job was to change the payload until it no longer matched a rule. Defenders responded with new signatures and heuristics.

NDR moved the problem toward statistical and ML-based detection. The attacker is no longer just avoiding a signature; they are trying to make a learned model choose the wrong answer. That matters because ML models are less inspectable, often differentiable, and exposed to optimization-based evasion.

As early as 2014, Goodfellow et al. showed that perturbations too small for a human to notice could reliably flip an image classifier's prediction. The following years confirmed the risk in malware detection, speech recognition, face recognition, and, most relevant here, network intrusion detection.

The attacker does not have to understand every detail of your detection logic. If they can train or obtain a sufficiently similar model, often from public datasets, they can craft adversarial examples that transfer to yours.

Part 2 — Why single-model NDR fails

Under the packaging, many NDR products follow the same pipeline: flow capture (NetFlow / IPFIX / packet broker), feature extraction (5-tuple, packet sizes, inter-arrival times, protocols), one ML model, scoring, and an alert if the score crosses a threshold.

The model then becomes the main point of failure. If an attacker can mimic legitimate traffic statistics, keep the reconstruction error below the alert threshold, or time the attack around a retraining window, the alert may never fire.

Empirical evidence

Using a typical autoencoder-based NDR architecture, we reproduced three gradient-evasion scenarios at ε=0.02, a perturbation of roughly 2% of each feature vector. Detection rates fell from 86–99% to 12–41%. The attack's behavior barely changes; what changes is the feature representation the NDR observes.
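To make the mechanism concrete, here is a minimal sketch of a signed-gradient evasion step against a reconstruction-error detector. It is not the experiment reported above. The one-component linear autoencoder, the direction U, the flow values, and THRESHOLD are all hypothetical, chosen only so that a step of ε=0.02 per feature moves a flagged flow under the alert threshold.

```python
# Toy illustration only: U, THRESHOLD and the flow values are hypothetical.
U = [2 / 3, 2 / 3, 1 / 3]   # unit-norm direction standing in for "normal" traffic
THRESHOLD = 0.001           # alert when the reconstruction error exceeds this


def reconstruct(x):
    """One-component linear autoencoder: project x onto U."""
    coeff = sum(u * xi for u, xi in zip(U, x))
    return [coeff * u for u in U]


def reconstruction_error(x):
    """Squared distance between x and its reconstruction."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(x)))


def evade(x, epsilon):
    """Shift each feature by epsilon against the sign of the error gradient.

    For an orthogonal projection, the gradient of the squared error with
    respect to x is 2 * (x - reconstruct(x)), so only the residual's sign
    is needed.
    """
    residual = [xi - ri for xi, ri in zip(x, reconstruct(x))]
    return [xi - epsilon * ((r > 0) - (r < 0)) for xi, r in zip(x, residual)]


flow = [0.50, 0.47, 0.28]             # normalized features of a flagged flow
adv_flow = evade(flow, epsilon=0.02)  # same flow after the perturbation

alert_before = reconstruction_error(flow) > THRESHOLD
alert_after = reconstruction_error(adv_flow) > THRESHOLD
print(alert_before, alert_after)
```

A real autoencoder is nonlinear and the attacker would usually iterate, but the principle is the same: the perturbation follows the gradient of the quantity the detector thresholds on.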

Part 3 — Why homogeneous ensembles are not enough

The natural reaction from an ML team is to suggest an ensemble. That helps, because it raises the attacker's cost, but it does not solve the problem if the models are too similar. They still share:

the same input features,
the same training data,
often the same architecture family with different random seeds.

When those pieces are shared, adversarial perturbations often transfer from one model to another. Cross-model transfer rates above 60% are common.

The defense gets stronger when the judges are meaningfully different: each one should have a different reason to call a flow suspicious.
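The sketch below shows one way a transfer rate can be computed. It is an illustration under strong assumptions, not a measurement: the three linear models, their weights, and the flows are hand-picked and hypothetical. The "sibling" stands in for a model trained on the same features with a different seed. The "distinct" model stands in for a judge with a different reason to flag a flow; in practice such a judge would read different signals, not just carry different weights.

```python
def logit(model, x):
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def detects(model, x):
    return logit(model, x) > 0


def craft(surrogate, x, epsilon):
    """Signed-gradient step that lowers the surrogate's score for x."""
    weights, _ = surrogate
    return [xi - epsilon * ((w > 0) - (w < 0)) for w, xi in zip(weights, x)]


def transfer_rate(surrogate, target, flows, epsilon):
    """Share of adversarial flows evading the surrogate that also evade the target."""
    evaded_surrogate = evaded_both = 0
    for x in flows:
        adv = craft(surrogate, x, epsilon)
        if detects(surrogate, x) and not detects(surrogate, adv):
            evaded_surrogate += 1
            if detects(target, x) and not detects(target, adv):
                evaded_both += 1
    return evaded_both / evaded_surrogate if evaded_surrogate else 0.0


# Hypothetical models as (weights, bias) over three normalized features.
surrogate = ([4.0, -3.0, 5.0], -1.0)   # the attacker's local copy
sibling = ([3.6, -3.3, 5.2], -1.0)     # same features, slightly different fit
distinct = ([-2.0, 1.0, 1.0], 0.9)     # flags flows for a different reason

flows = [
    [0.40, 0.20, 0.10],
    [0.30, 0.10, 0.15],
    [0.35, 0.30, 0.20],
    [0.50, 0.10, 0.20],
]

rate_sibling = transfer_rate(surrogate, sibling, flows, epsilon=0.06)
rate_distinct = transfer_rate(surrogate, distinct, flows, epsilon=0.06)
print(rate_sibling, rate_distinct)
```

The contrast is exaggerated by construction. The point is only that the metric depends on how much decision logic the target shares with the model the attacker optimized against.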

Part 4 — What we mean by ANM

Adversarial Neural Mediation (ANM) means network defense where the decision does not come from one ML model, but from arbitration across multiple architecturally distinct judges. Divergence between judges is not a statistical footnote; it becomes a security signal.

The definition is deliberately strict. Three nearly identical autoencoders are not ANM. A SIEM correlating alerts from three products is not ANM. And a pipeline trained adversarially but exposing only one model at inference time is still a single-model pipeline.

Part 5 — The five criteria

To talk seriously about ANM, all five criteria below need to be present together.

01
At least three judges, architecturally distinct
Not three random seeds, and not three sliding-window sizes. You need three different ways to read traffic: for example, a sequence model over packet timing, a graph model over flow topology, and a deterministic rule engine. The goal is simple: avoid having one perturbation fool everyone for the same reason.
02
Explicit divergence detection
The system must compute disagreement between judges in real time (for example KL divergence, Jensen-Shannon divergence, or max-min spread) and treat it as a security signal, not just a confidence score. An adversarial attack that succeeds against one judge often produces unusual outputs from the others. That disagreement is the useful part.
03
Adversarial training of ML judges
ML judges must be trained with adversarial robustness procedures, not only clean data. The procedure and threat model need to be documented. A judge trained only on clean data remains easy to fool on its own; a robust judge raises the attacker's cost.
04
Model integrity & watermarking
Deployed models must be signed and verified at load time, with a chain that runs from build artifact to deployment. Model substitution and poisoning are real attacks. Without provenance, mediation loses its value.
05
Auditable decision trail
Every detection decision — per-judge scores, divergence, and action taken — must be kept in a tamper-evident log. That matters for forensics, retraining, and audits. Robustness that cannot be replayed is hard to defend to a regulator, insurer, or board.
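As an illustration of criterion 02, the disagreement measures named there can be computed in a few lines. This is a sketch under one assumption: each judge emits a probability in [0, 1] that a flow is malicious, which is then read as a two-outcome distribution. The function names are hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def disagreement(scores):
    """Max-min spread and worst pairwise JS divergence across judges."""
    dists = [[1.0 - s, s] for s in scores]
    return {
        "spread": max(scores) - min(scores),
        "max_js": max(js_divergence(a, b) for a, b in combinations(dists, 2)),
    }


# One judge has been pushed toward "benign" while the others still object.
print(disagreement([0.05, 0.90, 0.80]))
# All judges agree.
print(disagreement([0.90, 0.90, 0.90]))
```

Jensen-Shannon is used for the pairwise measure here because, unlike raw KL divergence, it is symmetric and stays finite when a judge outputs exactly 0 or 1.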

Part 6 — Ward³ as an implementation

Ward³ implements the five criteria through concrete technical choices.

The system uses three judges: a sequence judge that observes per-flow packet windows, a relational judge that analyzes src→dst topology in a sliding window, and a rule judge built from expert invariants.

The first two judges read traffic in different ways. A perturbation that makes a sequence look plausible does not necessarily make the graph topology plausible. The rule judge adds a third, non-differentiable axis, which keeps it out of direct reach of gradient-based attacks.

The mediator computes per-judge scores, pairwise divergence, and the maximum gap. It produces a consensus verdict only when divergence stays below a threshold. When judges disagree, it fails closed and publishes XAI_DIVERGENCE_HIGH. The attack becomes visible not because one judge is always right, but because the judges stop telling the same story.
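The mediation logic described above can be sketched as follows. This is not Ward³'s implementation. The thresholds, the mean-score consensus rule, the Verdict type, and the CONSENSUS_* reason codes are assumptions made for illustration, and "fails closed" is interpreted here as blocking the flow. Only XAI_DIVERGENCE_HIGH comes from the text.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Mapping


@dataclass(frozen=True)
class Verdict:
    action: str      # "allow" or "block"
    reason: str
    max_gap: float


def mediate(scores: Mapping[str, float],
            divergence_threshold: float = 0.3,   # hypothetical value
            block_threshold: float = 0.5) -> Verdict:  # hypothetical value
    """Arbitrate between per-judge scores in [0, 1]."""
    # Pairwise divergence, here the absolute gap between two judges.
    gaps = {
        (a, b): abs(scores[a] - scores[b])
        for a, b in combinations(sorted(scores), 2)
    }
    max_gap = max(gaps.values())

    # Disagreement is itself the signal: fail closed instead of averaging.
    if max_gap > divergence_threshold:
        return Verdict("block", "XAI_DIVERGENCE_HIGH", max_gap)

    # Judges agree, so a consensus score is meaningful.
    mean_score = sum(scores.values()) / len(scores)
    if mean_score >= block_threshold:
        return Verdict("block", "CONSENSUS_MALICIOUS", max_gap)
    return Verdict("allow", "CONSENSUS_BENIGN", max_gap)


evaded = mediate({"sequence": 0.05, "relational": 0.90, "rule": 0.80})
benign = mediate({"sequence": 0.10, "relational": 0.15, "rule": 0.05})
print(evaded)
print(benign)
```

In the first call the sequence judge has been pushed toward "benign", yet the flow is still blocked because the gap to the other judges exceeds the divergence threshold.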

Both ML judges are adversarially trained against gradient-based attacks. The threat model, hyperparameters, and training procedure are documented and reproducible.
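For readers unfamiliar with adversarial training, the sketch below shows the basic loop on a toy logistic model: each clean sample is paired with a perturbed copy crafted against the current weights, and the model is updated on both. FGSM is used as a stand-in, since the text only says "gradient-based". The dataset, ε, learning rate, and epoch count are hypothetical, and a real judge would be a sequence or graph model rather than a two-feature classifier.

```python
import math


def predict(w, b, x):
    """Probability that x is malicious under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w, b, x, y, epsilon):
    """Perturb x by epsilon along the sign of the loss gradient.

    For logistic loss, the gradient with respect to x is (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


def adversarial_train(data, epsilon, epochs=50, lr=0.5):
    """SGD on each clean sample and on its FGSM counterpart."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            for sample in (x, fgsm(w, b, x, y, epsilon)):
                err = predict(w, b, sample) - y
                w = [wi - lr * err * si for wi, si in zip(w, sample)]
                b -= lr * err
    return w, b


def accuracy(w, b, data, epsilon=0.0):
    """Accuracy on data attacked with FGSM at epsilon (0 means clean data)."""
    hits = 0
    for x, y in data:
        x_eval = fgsm(w, b, x, y, epsilon) if epsilon else x
        hits += (predict(w, b, x_eval) > 0.5) == bool(y)
    return hits / len(data)


# Hypothetical, well-separated toy data: label 1 = malicious, 0 = benign.
data = [
    ([0.10, 0.20], 0), ([0.20, 0.10], 0), ([0.15, 0.15], 0),
    ([0.80, 0.90], 1), ([0.90, 0.80], 1), ([0.85, 0.85], 1),
]

w, b = adversarial_train(data, epsilon=0.1)
clean_acc = accuracy(w, b, data)
robust_acc = accuracy(w, b, data, epsilon=0.1)
print(clean_acc, robust_acc)
```

Reporting accuracy under attack alongside clean accuracy, with ε stated, is the minimum a documented threat model should include.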

Each inference artifact is signed at build time and verified at load. The model registry stays append-only and keeps provenance: training dataset hash, run ID, code commit, and a link to the model card.
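A minimal sketch of the sign-at-build, verify-at-load idea follows. It uses an HMAC from the Python standard library as a stand-in for a real signature scheme; a production system would use asymmetric signatures so that the loader never holds the signing key. The RegistryEntry fields mirror the provenance items listed above, but the type, field names, and sample values are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegistryEntry:
    artifact_sha256: str
    dataset_sha256: str
    run_id: str
    code_commit: str
    model_card: str
    signature: str = ""


def _signed_fields(entry):
    """Bytes covered by the signature: the artifact digest plus provenance."""
    parts = [entry.artifact_sha256, entry.dataset_sha256, entry.run_id,
             entry.code_commit, entry.model_card]
    return "\n".join(parts).encode()


def sign_at_build(key, artifact, dataset_sha256, run_id, code_commit, model_card):
    entry = RegistryEntry(hashlib.sha256(artifact).hexdigest(),
                          dataset_sha256, run_id, code_commit, model_card)
    sig = hmac.new(key, _signed_fields(entry), hashlib.sha256).hexdigest()
    return replace(entry, signature=sig)


def verify_at_load(key, entry, artifact):
    """Refuse the model if its bytes or its provenance record were altered."""
    digest_ok = hashlib.sha256(artifact).hexdigest() == entry.artifact_sha256
    expected = hmac.new(key, _signed_fields(entry), hashlib.sha256).hexdigest()
    return digest_ok and hmac.compare_digest(expected, entry.signature)


key = b"demo-key-not-for-production"   # placeholder secret
model_bytes = b"serialized model weights"
entry = sign_at_build(key, model_bytes, "d" * 64, "run-001", "abc1234",
                      "cards/sequence-judge.md")

print(verify_at_load(key, entry, model_bytes))
print(verify_at_load(key, entry, model_bytes + b"tampered"))
```

Signing the provenance fields together with the artifact digest means a swapped model card or dataset hash is rejected just like swapped weights.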

Every decision is persisted in a hash-chained ledger signed with post-quantum primitives. The ledger can be replayed to reconstruct a past decision and explain the mediator's reasoning to an auditor.
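The hash-chaining part of such a ledger can be sketched with the standard library alone. Each record commits to the hash of the previous one, so editing any past decision breaks every later link. The record layout and field names are hypothetical, and the post-quantum signature over each record is deliberately left out of this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first record


def _record_hash(prev_hash, decision):
    """Hash the previous link together with a canonical JSON encoding."""
    body = json.dumps(decision, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append(ledger, decision):
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev": prev_hash,
        "decision": decision,
        "hash": _record_hash(prev_hash, decision),
    })


def verify(ledger):
    """Replay the chain and confirm that every link still matches."""
    prev_hash = GENESIS
    for record in ledger:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _record_hash(prev_hash, record["decision"]):
            return False
        prev_hash = record["hash"]
    return True


ledger = []
append(ledger, {"flow": "f-1", "scores": {"sequence": 0.05, "relational": 0.9,
                                          "rule": 0.8},
                "max_gap": 0.85, "action": "block"})
append(ledger, {"flow": "f-2", "scores": {"sequence": 0.1, "relational": 0.15,
                                          "rule": 0.05},
                "max_gap": 0.1, "action": "allow"})
print(verify(ledger))
```

Because each record stores per-judge scores, divergence, and the action, replaying the ledger reconstructs both what was decided and why.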

Part 7 — Three judges, three latencies, three surfaces

Ward³ applies the same idea at runtime: not every decision needs the same depth of analysis or the same latency budget.

Tier 1 — Edge (μs). The Rust rule judge, threat-intelligence cache, TLS fingerprints, eBPF preprocessing, and local enforcement handle clear cases close to the traffic. A large share of traffic can be resolved here, in microseconds, without calling the ML models.

Tier 2 — Tenant (ms). The sequence and relational judges run here, along with endpoint process and file judges. The mediator computes consensus and divergence, then produces the verdict and enforcement signal.

Tier 3 — Platform (tens of ms, async). The platform enriches decisions through cross-host and cross-tenant correlation: endpoint context, long-horizon baselines, kill-chain reconstruction, and federated threat intel. This work does not block the hot path; it enriches verdicts emitted at Tier 2.
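The split between Tier 1 and Tier 2 amounts to a short-circuit: cheap deterministic checks answer clear cases, and only the remainder pays for the ML judges. The sketch below illustrates that routing only. The lookup tables, the flow fields, and the function names are hypothetical, the addresses come from documentation ranges, and Tier 3 is omitted because it runs asynchronously.

```python
from typing import Callable, Optional, Tuple

# Hypothetical Tier 1 state, standing in for the threat-intel cache
# and the TLS fingerprint table.
KNOWN_BAD_IPS = {"203.0.113.7"}
KNOWN_GOOD_TLS = {("198.51.100.10", "fp-internal-api")}


def tier1_edge(flow: dict) -> Optional[str]:
    """Return a verdict for clear cases, or None to escalate."""
    if flow["dst"] in KNOWN_BAD_IPS:
        return "block"
    if (flow["dst"], flow.get("tls_fingerprint")) in KNOWN_GOOD_TLS:
        return "allow"
    return None


def decide(flow: dict, tier2_mediator: Callable[[dict], str]) -> Tuple[str, str]:
    """Resolve at the edge when possible; otherwise call the Tier 2 mediator."""
    verdict = tier1_edge(flow)
    if verdict is not None:
        return ("edge", verdict)
    return ("tenant", tier2_mediator(flow))


calls = []


def stub_mediator(flow: dict) -> str:
    """Placeholder for the ML judges plus mediator; records that it was invoked."""
    calls.append(flow["dst"])
    return "block"


print(decide({"dst": "203.0.113.7"}, stub_mediator))
print(decide({"dst": "198.51.100.10", "tls_fingerprint": "fp-internal-api"},
             stub_mediator))
print(decide({"dst": "192.0.2.55"}, stub_mediator))
```

Only the third flow reaches the stub mediator, which is the property that keeps the millisecond-scale judges off the path of clear-cut traffic.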

Ward³ covers network flows and endpoint signals (process, file, lateral movement, credential, tamper) under one mediator and one audit ledger. It is a path from NDR toward XDR without giving up the ANM guarantees.

Part 8 — Performance characteristics

The results below come from held-out traffic and out-of-distribution networks: log formats, attack families, and IoT botnets never seen during training. Adversarial robustness is measured under gradient-based evasion.

In the table below, the three-judge architecture adds roughly 58 to 70 percentage points of detection under adversarial conditions. The tradeoff is roughly 2× inference latency compared with a single model, still under 10 ms p99 per flow on commodity hardware.

Metric	Single-judge baseline	Ward³ 3-judge mediation
F1 (clean)	0.66	0.97
AUC-ROC (clean)	0.89	0.998
Detection under gradient evasion ε=0.02	23.7%	94.1%
Detection under transfer attack	31.4%	89.6%
AUC-ROC (out-of-distribution)	0.71	0.87

Part 9 — Compared to adjacent categories

Ward³ is meant to bring EDR, NDR, and XDR capabilities into an ANM-grade platform. The surfaces those categories often handle separately — endpoint, network, multi-tenant correlation — converge under a shared mediator, audit ledger, and governance layer.

The difference is not only coverage. It is the set of guarantees: architecturally distinct judges, divergence used as a security signal, documented adversarial training, model integrity from build to runtime, and a tamper-evident ledger for every decision.

Part 10 — Adoption considerations

Adoption can happen in phases. A team can start with one surface, usually network, validate the ANM guarantees against an adversarial red-team scenario, and then expand as legacy contracts come up for renewal. Ward³ deploys as a Kubernetes-native sensor and enforcer, or in standalone Linux mode, with Prometheus metrics, Grafana dashboards, OpenTelemetry tracing, and Sigstore attestation.

Governance is explicit. War Mode four-eyes approval gates high-impact blocks. A quorum based on Shamir secret sharing protects key material. NIST-aligned post-quantum primitives cover the audit ledger and quorum signatures.
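For readers who have not met Shamir secret sharing, the sketch below shows the textbook k-of-n scheme: a secret is split into n shares, and any k of them recover it by interpolation. How Ward³ applies the scheme is not described here, so the field size, the 3-of-5 split, and the function names are assumptions for illustration only.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret, threshold, num_shares):
    """Split secret into num_shares points on a random polynomial of
    degree threshold - 1 whose constant term is the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):   # Horner's rule, modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, num_shares + 1)]


def combine(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key_material = 123456789                      # placeholder secret
shares = split(key_material, threshold=3, num_shares=5)

print(combine(shares[:3]) == key_material)
print(combine([shares[0], shares[2], shares[4]]) == key_material)
```

With a threshold of three, no single operator and no pair of operators holds enough information to reconstruct the key.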

MSSP-aware multi-tenancy is built in from the start. Family thresholds, judge weights, and tier opt-ins are per tenant. Compute load follows the enabled features, not just the number of judges declared.

Conclusion

Single-model NDR is no longer enough for environments where ML itself becomes a target. The issue is not just threshold tuning, and it will not go away by adding another autoencoder of the same kind.

ANM offers a concrete answer: multiple architecturally distinct judges, divergence as a security signal, adversarial training, model integrity, and auditable decisions. Ward³ shows how to assemble those guarantees into a coherent implementation.

If you are responsible for network defense in a high-value organization, the important question is not only whether an attacker will test your ML. It is what catches the attack when one model gets it wrong.

Request access to discuss Ward³ in your context.
