Mike Sasso

Adversarial ML Researcher · AI Red Teamer · Agentic Security Engineer · Hawaii

I red-team AI the way an attacker would, then build the systems that hold. Primary focus right now: whether published adversarial evaluations of network intrusion detection systems measure achievable attack success — or just how easily an optimizer exploits feature values that cannot exist on a real network. Same instinct across the work — emulate the adversary to defeat them. Paper in prep for ACM CCS 2027.

Research → Projects → Writing → GitHub → LinkedIn →

// research

Research

CATT-CCS — Constraint Inflation in Adversarial NIDS Evaluation In preparation · ACM CCS 2027

Published adversarial evaluations of network intrusion detection systems routinely report evasion rates that cannot be reproduced on real networks. The cause: gradient-based attacks (FGSM, PGD) optimize over feature spaces that include physically impossible values — negative TTL, sub-zero packet counts, impossible flag combinations. Unconstrained PGD reports 79% evasion on UNSW-NB15. Constrained PGD — the only result achievable on a real network — reports 7%. The 72 pp gap (+67 pp on NSL-KDD) is consistent across three datasets and three classifier architectures (MLP, Random Forest, XGBoost), three independent seeds each. Released netadv, an open-source domain-constrained adversarial evaluation library with 65 unit tests and reproducible Colab benchmarks.

GitHub →netadv →

Network Intrusion Detection — IDS Red Teaming & Adversarial Hardening Complete

Full adversarial ML lifecycle on 2.54M real network flows. Baseline XGBoost F1: 0.9640. SHAP identifies the feature regions that determine correct classification — and those same regions are the attack surface. Black-box transfer attack: adversarial examples crafted against an MLP surrogate evade XGBoost at 16–18%, a 15× increase in false negatives, with zero access to target weights. Defense: Madry PGD adversarial training holds F1 at 0.9494 at ε=0.20, vs. collapse to 0.27 for the standard model.

GitHub →

OT Anomaly Detection — Replay Attack ML Blind Spot Research finding

Validated whether unsupervised clustering can detect replay attacks on PLC register data (ICSSim dataset: 45,718 network flows + 39,302 PLC snapshots). A replay attack re-sends captured legitimate Modbus/TCP traffic — the network layer sees valid packets and the physical process continues operating normally. HDBSCAN flags 7.7% of the replay window and misses 92.3%: the replayed registers form a dense, structurally normal cluster, and calling it anomalous contradicts the density criterion. K-Means scores ARI = −0.0006 with zero recall on the replay class: the process does not transition to a new macro-state, and sensor variance ratio during attack vs. normal is 0.95–1.19 across all sensors — statistically indistinguishable. Both detectors fail simultaneously for different structural reasons. The correct detection is cross-layer: legitimate-looking network traffic combined with suspicious physical stasis. For any single-layer detector the attack is invisible by design. Same failure mode as the constraint inflation finding: the algorithm is correct. The threat operates at a layer it cannot see.

GitHub →

Adversarial Classifiers — CNN · VGG-11 · XGBoost Complete

Three classifiers attacked after training via GradCAM-guided FGSM and PGD perturbations. Key finding: the regions GradCAM identifies as most important for correct classification are the same regions most vulnerable to adversarial perturbation. Explainability output maps directly to the attack surface — a result with direct implications for any ML-based security system that exposes saliency or attribution data to users or downstream systems.

GitHub →

// projects

Projects

VERITAS — Autonomous Windows Forensic Investigation SANS Find Evil! Hackathon 2026

Dual-agent forensic pipeline for autonomous Windows disk image investigation. Phase 1: deterministic triage — 25 SIFT commands, log-odds scoring across 9 MITRE techniques, no LLM hallucination surface. Phase 2: agentic investigation — Claude sequences tool calls across event logs, prefetch, registry, MFT, shellbags. Phase 3: adversarial auditor — receives findings only, mandate to refute. Red vs. Blue training loop: 1,245 evasion variants evolved, 83 detection signals learned.

GitHub →

Splunk IR Agent Splunk Agentic Ops Hackathon 2026

Autonomous incident investigation triggered by a single SIEM alert. Correlates brute-force, lateral movement, and credential access logs across Splunk data lakes, maps to MITRE ATT&CK, generates analyst-ready IR reports. All SPL templates are read-only — no write-back commands in the codebase. Input fields validated against strict regex allowlists. A malicious model output cannot modify data.

GitHub →

Elastic IR Agent Elastic Agent Builder Showcase 2026

Autonomous IR agent with hybrid semantic + ES|QL search over 73,909 real Windows attack events. Session-scoped write-back memory blocks IOC contamination between investigations architecturally. Memory content sanitized before every Elasticsearch write to block indirect prompt injection via poisoned retrieval. Independent forensic auditor labels every MITRE claim VERIFIED / REFUTED / UNVERIFIABLE with raw event evidence.

GitHub →

offsec-ai-lab — LLM Red-Team Measurement Harness Adversarial AI

Stand up a deliberately weak local LLM, measure how it fails against standard red-team probe categories — prompt injection, jailbreak, data leakage — drawn from garak, PyRIT, and promptfoo, harden it, then measure the improvement. The before/after delta is the whole point. Fully local and fully ethical: the only target is a model hosted on my own hardware, every secret is a synthetic canary, and every harmful ask reduces to emitting a harmless marker token — so success is objectively detectable without generating real exploit content. The same red-team instinct as the NIDS work, aimed at language models.

GitHub →

MNIST Neural Net

Custom CNN for handwritten digit recognition. 99.2% accuracy. Live confidence bars and convolutional feature map visualizations deployed on AWS Lambda + CloudFront.

GitHub →Live →

// writing

Writing

AI red-team analyses, adversarial ML findings, and the methodology behind making published security results reproducible and honest.

All writeups →