We run adversaries against our Safe House 24/7 — in public.
Fifteen named red team agents probe CFD and CBD every hour with techniques that include prompt injection, business email compromise (BEC), indirect injection, data exfiltration, and regulated-advice slips. What they break becomes a signed detection recipe. What they can't break becomes evidence for your auditor.
The adversarial flywheel
Every probe we run makes the Safe House harder. Every bypass becomes a recipe. Every recipe ships to every customer.
Red team probes
15 named adversaries probe CFD and CBD 24/7 across 11 threat categories with 8 mutation operators (unicode, emoji, base64, crescendo, synonym, paraphrase, translate, structural).
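As a rough illustration of what two of these mutation operators might look like, here is a minimal sketch. The function names and the `MUTATION_OPERATORS` registry are hypothetical and only show the idea of stacking operators on a probe; this is not the arena's actual mutation engine.

```python
import base64

ZERO_WIDTH_SPACE = "\u200b"


def mutate_unicode(text: str) -> str:
    """Interleave zero-width spaces: the text renders the same but no longer matches exactly."""
    return ZERO_WIDTH_SPACE.join(text)


def mutate_base64(text: str) -> str:
    """Encode the payload as base64 to hide keywords from plain-text matchers."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical registry: the keys follow the operator names listed above,
# the functions themselves are illustrative assumptions.
MUTATION_OPERATORS = {
    "unicode": mutate_unicode,
    "base64": mutate_base64,
}


def mutate(text: str, operators: list[str]) -> str:
    """Apply the named operators in order, so mutations can be stacked."""
    for name in operators:
        text = MUTATION_OPERATORS[name](text)
    return text
```

Calling `mutate(probe, ["unicode", "base64"])` first pads the probe with zero-width characters and then encodes the result, which is one way a single seed probe could fan out into several variants.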
Sideband analyzer
Every bypass routes to a Claude Opus analyzer that classifies the miss, identifies the detector gap, and drafts a YAML detection recipe.
Recipe promotion
Recipes with confidence ≥ 0.90 auto-promote into a 48-hour zero-false-positive (zero-FP) validation window. Lower-confidence recipes enter the admin review queue.
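The promotion rule can be sketched as a simple routing function. The 0.90 threshold and the 48-hour window come from the description above; the `DraftRecipe` type, the queue names, and the reading of "zero-FP" as "no false positives observed over the full window" are assumptions made for illustration.

```python
from dataclasses import dataclass

AUTO_PROMOTE_THRESHOLD = 0.90   # from the promotion rule above
VALIDATION_WINDOW_HOURS = 48    # from the promotion rule above


@dataclass
class DraftRecipe:
    """Hypothetical shape of an analyzer-drafted recipe."""
    recipe_id: str
    confidence: float


def route(recipe: DraftRecipe) -> str:
    """Send high-confidence drafts to validation, everything else to human review."""
    if recipe.confidence >= AUTO_PROMOTE_THRESHOLD:
        return "validation_window"
    return "admin_review"


def passes_validation(false_positives: int, hours_elapsed: float) -> bool:
    """Assumed check: the full window has elapsed with no false positives."""
    return hours_elapsed >= VALIDATION_WINDOW_HOURS and false_positives == 0
```

Under these assumptions, a draft scored at exactly 0.90 enters the validation window, and a single false positive during the 48 hours keeps it out of the live library.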
Library update
Promoted recipes join the live FingerprintMatcher index. MinHash signatures propagate across every customer via the opt-in Threat Network.
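For readers unfamiliar with MinHash, the sketch below shows the general technique: a text is reduced to a short signature, and two signatures can be compared to estimate how similar the original texts were without sharing the texts themselves. The shingle size, the number of hashes, and the use of salted BLAKE2b are illustrative choices, not a description of how FingerprintMatcher is implemented.

```python
import hashlib


def shingles(text: str, k: int = 4) -> set[str]:
    """Split lowercased text into overlapping character k-grams."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(text: str, num_hashes: int = 64) -> list[int]:
    """For each salted hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return signature


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching positions approximates Jaccard similarity of the shingle sets."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Two lightly reworded probes share most of their shingles and therefore most of their signature positions, while unrelated text shares almost none.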
Harder probes
The mutation engine seeds the next generation of probes from confirmed bypasses. The red team gets stronger, and so does the defense.
What the arena doesn't prove — yet.
Every public-facing claim on this page is backed by live data. The items below are known gaps we are actively working to close.
Unicode + emoji evasion hardening: P0, in flight. Research shows 70–88% bypass rates against production guardrails using zero-width characters and homoglyphs. We don't pretend that gap is closed.
Indirect injection fast-path coverage: tool results pass through L1 unscanned today. L2 semantic checks catch injected instructions in those results, but the L1 fast path remains a gap.
Arena V2 sideband analyzer auto-promotion is live; Arena V2's customer-facing campaign view (cross-org pattern correlation) is pending.
CBD outbound DLP is wired for canary match and credential leak; launder-detector and regulated-advice checker run async except on Enforce-Sync and Sovereign tiers.
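To make the zero-width and homoglyph gap listed above concrete, here is a minimal sketch of the kind of pre-scan normalization that such hardening typically involves. It uses only Python's standard `unicodedata` module; the `normalize_for_scanning` name and the set of stripped characters are assumptions, and this is not the fix that is in flight.

```python
import unicodedata

# Common invisible characters used to split keywords (an illustrative, incomplete set).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_for_scanning(text: str) -> str:
    """Fold compatibility forms with NFKC, then drop zero-width characters."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if ch not in ZERO_WIDTH)
```

NFKC folds compatibility variants such as fullwidth Latin letters back to ASCII, but it does not map cross-script homoglyphs (a Cyrillic "а" stays Cyrillic), so a separate confusables mapping would still be needed. Stripping the zero-width joiner also breaks some legitimate emoji sequences, so output like this is better suited to scanning than to display.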
