Is audit free to use?

audit is free to use as an MIT-licensed open-source project, but the runtime cost depends on the model path you choose. With a Claude Pro or Max subscription, audit can run without a separate API key; gateway-based setups are billed by the gateway provider instead. The repo itself does not impose a commercial license fee.

How does audit compare to Semgrep?

audit is more agentic than Semgrep because it explores a codebase, generates scoped attack tasks, validates findings, and gates them on reachability. Semgrep is better when you want deterministic rule-based scanning and fast CI feedback. audit is the better pick when you need deeper investigation, while Semgrep is the better pick when you want stable pattern checks.

Does audit support OpenRouter?

Yes, audit supports OpenRouter through Anthropic-compatible settings like `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`. You also need to unset or empty `ANTHROPIC_API_KEY` so the gateway path is not shadowed. This makes audit usable with OpenRouter credits instead of a direct Anthropic subscription.
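As a rough sketch of the settings described above, the environment for a gateway-backed run could be assembled like this before launching audit. The `gateway_env` helper is hypothetical (it is not part of audit), and the URL and token are placeholders to be replaced with the values your gateway documents.

```python
import os


def gateway_env(base_env, base_url, auth_token):
    """Return a copy of base_env configured for an Anthropic-compatible gateway."""
    env = dict(base_env)
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = auth_token
    # Empty the direct API key so it cannot shadow the gateway settings.
    env["ANTHROPIC_API_KEY"] = ""
    return env


if __name__ == "__main__":
    # Placeholder values (assumptions), not real endpoints or credentials.
    env = gateway_env(os.environ, "https://gateway.example/api", "token-placeholder")
    print({k: env[k] for k in ("ANTHROPIC_BASE_URL", "ANTHROPIC_API_KEY")})
```

The returned mapping can be passed as the `env` argument of `subprocess.run` so the change stays scoped to the audit process instead of the whole shell.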

Can audit run against a live target?

Yes, audit supports a live-target mode with `--target-url` and optional `--target-creds` values. In that mode, Hunt reproduces findings against the deployed service, Validate rejects results that do not reproduce, and Trace confirms attacker reachability with real HTTP round-trips. That is useful when source-code-only analysis is not enough.

Why does audit use many narrow agents instead of one broad agent?

audit uses narrow agents because security work gets worse when one model tries to reason about the entire repository at once. The split into Recon, Hunt, Validate, and Trace reduces search space, introduces disagreement, and makes the final report easier to trust. That structure is designed to lower false positives and improve coverage.

What language and SDK does audit use?

audit is implemented in Python and is driven by the Claude Code Agent SDK. Its stages are defined with markdown prompts plus JSON Schemas, which keeps outputs consistent and easy to orchestrate. That design is why audit can chain multiple agent steps without brittle post-processing.
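To make the "prompt plus schema" idea concrete, here is a minimal sketch of how a stage prompt and a schema might be combined and how a model reply might be checked. Everything named here is an assumption for illustration: the schema fields, `build_system_prompt`, and `schema_errors` are not taken from audit, and the checker only covers required keys and primitive types, where a real pipeline would likely rely on a full JSON Schema validator.

```python
import json

# Hypothetical schema for a single finding; field names are illustrative only.
FINDING_SCHEMA = {
    "type": "object",
    "required": ["title", "file", "severity"],
    "properties": {
        "title": {"type": "string"},
        "file": {"type": "string"},
        "severity": {"type": "string"},
    },
}

# Maps the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def build_system_prompt(stage_prompt, schema):
    """Append the schema to the stage prompt so the model sees the target shape."""
    return (
        stage_prompt.rstrip()
        + "\n\nRespond with JSON matching this schema:\n"
        + json.dumps(schema, indent=2)
    )


def schema_errors(raw_output, schema):
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = [f"missing field: {k}" for k in schema["required"] if k not in data]
    for key, rule in schema["properties"].items():
        expected = _PY_TYPES[rule["type"]]
        if key in data and not isinstance(data[key], expected):
            errors.append(f"wrong type for {key}")
    return errors
```

An orchestrator built this way can retry or discard a stage whose reply returns a non-empty error list, rather than trying to repair free-form text afterwards.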

audit: Best AI Vulnerability Discovery Agents for AppSec in 2026

audit turns Claude subscription access into an 8-stage, schema-validated vulnerability-discovery pipeline that uses narrow agents, adversarial validation, and reachability tracing to cut false positives before reporting.

What Is audit?

audit is one of the best AI vulnerability-discovery agents for AppSec teams, security researchers, and senior engineers. It was built by evilsocket as an MIT-licensed Python repo. It runs an 8-stage pipeline that fans out into narrow Hunt tasks, then forces validation, deduplication, and reachability tracing so the final report keeps only findings that survive adversarial review. The default workflow can generate 15-50 Hunt tasks on a real codebase, which makes the system useful for large repos where single-pass LLM prompts miss context.

Quick Overview

Attribute	Details
Type	AI Vulnerability Discovery Agents
Best For	AppSec teams, security researchers, and senior developers
Language/Stack	Python, Claude Code Agent SDK, Anthropic Messages API, JSON Schema, Markdown prompts
License	MIT
GitHub Stars	N/A
Pricing	Open-Source
Last Release	N/A

Who Should Use audit?

AppSec engineers who need an agentic workflow that can map a repository, propose attack surfaces, and pressure-test findings before they reach a human reviewer.
Security researchers looking for a reproducible harness that turns wide-codebase exploration into structured Hunt tasks instead of one giant prompt that drifts.
Platform teams responsible for monorepos or many services where reachability matters more than raw pattern matches.
Indie hackers and CTOs who want a practical security pass over private code without wiring together a custom orchestration stack.

Not ideal for:

Teams that want a zero-configuration static scanner with no model auth, no gateway setup, and no budget controls.
Repos where you only need deterministic lint-style findings and do not want agentic exploration or model-driven reasoning.
Organizations that cannot use Claude-compatible tooling or do not want to manage subscription OAuth, gateways, or per-stage model policy.

Key Features of audit

8-stage security pipeline: audit splits work into Recon, Hunt, Validate, Gapfill, Dedupe, Trace, Feedback, and Report. That structure matters because each stage has a narrower job and its own default model, which reduces the chance that one agent hallucinates both the bug and the proof.
Narrow-agent fanout: Recon emits tightly scoped Hunt tasks, each targeting a specific sink, trust boundary, or attack class. That makes audit behave more like a security research team than a generic chatbot, and it avoids the common failure mode where one agent tries to solve the whole repository at once.
Adversarial validation: Validate uses a different model from Hunt and is explicitly asked to disprove the original finding. This is the strongest anti-confirmation-bias feature in audit because every finding is treated as suspect until a second model has tried and failed to disprove it.
Reachability gating: Trace must prove attacker-controlled input can reach the sink before a finding is treated as real. This is where audit beats simple code search, because a vulnerable line that no attacker can hit is usually noise in a triage queue.
Schema-stable outputs: Each stage is backed by a Markdown prompt in prompts/ and a JSON Schema in schemas/, and the orchestrator injects the schema into the system prompt. That design keeps outputs machine-readable on the first pass and makes it much easier to chain stages without brittle parsing glue.
Provider flexibility: audit can run through Claude subscription OAuth, a headless token from claude setup-token, or an Anthropic-compatible gateway such as OpenRouter. If you already have gateway infrastructure, that means audit can fit into an existing model budget instead of forcing a new vendor path.
Cost guardrails and live-target mode: the runner can cap concurrency, initial task fanout, and total spend with --max-concurrency, --max-recon-tasks, and --max-cost-usd. When --target-url is set, audit can reproduce findings against a live service and reject anything that does not reproduce, which is a much stronger signal than local-only PoC generation.
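The concurrency and spend caps in the last feature can be pictured with a small sketch. This is not audit's implementation: `run_with_guardrails` and the `run_task` callback are hypothetical, and the code only shows one common way to bound parallel work and stop scheduling once a budget is used up.

```python
import asyncio


async def run_with_guardrails(tasks, run_task, max_concurrency, max_cost_usd):
    """Run tasks with bounded concurrency; skip new work once the budget is spent."""
    gate = asyncio.Semaphore(max_concurrency)
    spent = 0.0
    results = []

    async def worker(task):
        nonlocal spent
        async with gate:
            # Checked before starting, so in-flight tasks can still overshoot the cap.
            if spent >= max_cost_usd:
                return
            cost, result = await run_task(task)  # run_task returns (cost_usd, result)
            spent += cost
            results.append(result)

    await asyncio.gather(*(worker(t) for t in tasks))
    return results, spent
```

Because the budget is checked only when a task starts, the final spend can exceed the cap by the cost of whatever was already running; a stricter design would need per-task cost estimates up front.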

audit vs Alternatives

Tool	Best For	Key Differentiator	Pricing
audit	Agentic vulnerability discovery with reachability proof	Eight-stage workflow with adversarial validation and feedback loops	Open-Source
Semgrep	Fast rule-based scanning in CI	Deterministic pattern matching and custom rules	Free / Paid
CodeQL	Deep semantic code querying	Taint-tracking and data-flow queries across large codebases	Free / Paid
OpenSwarm	General multi-agent orchestration	Broader agent coordination without a security-specific pipeline	Open-Source

Pick Semgrep when you want quick, deterministic checks for known anti-patterns and you are optimizing for CI speed. Pick CodeQL when you need semantic queries over a large polyglot estate and your team already maintains a query pack.

Use OpenSwarm when you need flexible multi-agent orchestration for non-security work, such as research or planning, and you do not need audit's fixed reachability gate. If you only need to validate sink reachability after another scanner flags something, pair the workflow with OpenTrace instead of rerunning the full audit pipeline.

How audit Works

audit is built around a simple security thesis: narrow questions beat one giant prompt, and disagreement beats self-confirmation. The orchestrator reads stage prompts from prompts/, validates every stage's output against the JSON Schemas in schemas/, and stores the state needed to move from recon to final report. The default stage split uses Opus 4.7 for higher-trust reasoning steps and Sonnet 4.6 for fanout-heavy work, so the system spends expensive reasoning only where it matters most.

audit also encodes an explicit feedback loop. Recon maps the repository and emits Hunt tasks, Hunt investigates a single attack class per task, Validate tries to break each result, and Trace checks whether attacker-controlled input can actually reach the sink. The loop closes when a confirmed finding feeds back into the pipeline: in codebases with repeated anti-patterns, a reachable issue in one path can seed new Hunt tasks elsewhere without restarting the whole analysis.
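The gating logic in that flow can be summarized in a few lines. The sketch below is an illustration only: the `Finding` fields and both functions are hypothetical names, chosen to show how findings that fail either gate drop out and how confirmed ones could seed follow-up work.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    survived_validation: bool  # Validate tried and failed to disprove it
    reachable: bool            # Trace showed attacker input reaches the sink


def reportable(findings):
    """Keep only findings that passed both the Validate and Trace gates."""
    return [f for f in findings if f.survived_validation and f.reachable]


def feedback_tasks(findings):
    """Turn each confirmed finding into a follow-up Hunt task description."""
    return [f"look for variants of: {f.title}" for f in reportable(findings)]


if __name__ == "__main__":
    sample = [
        Finding("SQL injection in search handler", True, True),
        Finding("Path traversal in dead code", True, False),
        Finding("XSS disproved by Validate", False, True),
    ]
    print([f.title for f in reportable(sample)])
    print(feedback_tasks(sample))
```

Requiring both flags is what keeps a plausible but unreachable bug out of the report, and only findings that clear both gates generate new tasks.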

audit run --repo /path/to/target --run-id my-run --max-concurrency 1 --max-cost-usd 30
audit status --run-id my-run
audit report --run-id my-run --format md > report.md

The first command starts the pipeline against a local repository, limits concurrency, and caps spend before the run can spiral. The status and report commands let you inspect stage progress and export a structured markdown report once the stage outputs have passed schema validation.
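If these commands need to be driven from a script or CI job, one option is to build them as argument lists. The helper below is a hypothetical convenience wrapper, not part of audit; it uses only the subcommands and flags shown above and leaves execution to the caller.

```python
import shlex


def audit_commands(repo, run_id, max_concurrency=1, max_cost_usd=30):
    """Return the run, status, and report commands as argv lists."""
    run = [
        "audit", "run",
        "--repo", repo,
        "--run-id", run_id,
        "--max-concurrency", str(max_concurrency),
        "--max-cost-usd", str(max_cost_usd),
    ]
    status = ["audit", "status", "--run-id", run_id]
    report = ["audit", "report", "--run-id", run_id, "--format", "md"]
    return run, status, report


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(argv, check=True) to execute.
    for argv in audit_commands("/path/to/target", "my-run"):
        print(shlex.join(argv))
```

Passing lists rather than a single shell string avoids quoting problems when the repository path contains spaces.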

Pros and Cons of audit

Pros:

Reachability-first triage filters out findings that look plausible in static analysis or in a model's reasoning but that no attacker can actually reach.
Different models for Hunt and Validate reduce confirmation bias and make it harder for one model to rubber-stamp its own output.
Schema validation keeps stage outputs machine-parseable, which is critical when you want repeatable automation instead of chat logs.
Gateway support lets you use OpenRouter or another Anthropic-compatible endpoint if you do not want to depend on the default Claude login path.
Live-target reproduction gives you a practical signal on whether a bug exists in a deployed system, not just in source code.
Budget controls help large repos avoid runaway agent fanout and make the tool usable in real-world CI or internal audit windows.

Cons:

Claude-compatible auth is mandatory for the primary workflow, so this is not a drop-in tool for teams that have banned that ecosystem.
Non-Claude models may be less reliable at producing schema-compliant JSON, which means gateway freedom can trade off against output quality.
Large repositories can get expensive fast because the default pattern is to fan out into many Hunt tasks before the dedupe and trace stages collapse them.
Live-target mode narrows egress to one host and localhost, which is correct for safety but can be restrictive during broader integration testing.
The pipeline is opinionated around security discovery, so it is not the right choice if you want a generic agent framework for unrelated automation.
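The egress restriction mentioned in the cons (one target host plus localhost) is the kind of rule that can be expressed as a simple allowlist check. The sketch below is an assumption about how such a check might look, not audit's actual enforcement mechanism; `egress_allowed` and `LOCAL_HOSTS` are hypothetical names.

```python
from urllib.parse import urlsplit

# Loopback names treated as local (an illustrative, not exhaustive, set).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def egress_allowed(request_url, target_url):
    """Allow requests only to the target's host or to localhost."""
    host = urlsplit(request_url).hostname
    target_host = urlsplit(target_url).hostname
    if host is None or target_host is None:
        return False  # unparseable or host-less URLs are rejected
    return host == target_host or host in LOCAL_HOSTS
```

A rule this strict explains why broader integration tests, which often call third-party services, may fail under live-target mode.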

Getting Started with audit

python -m venv .venv && source .venv/bin/activate
pip install -e .
claude login
audit auth-check
audit run --repo /path/to/target --run-id demo

If you prefer headless auth for CI, replace claude login with claude setup-token and export CLAUDE_CODE_OAUTH_TOKEN in your environment. After the first run, audit will create a run directory with stage artifacts, schema-validated outputs, and a report that you can inspect or export as markdown.

Verdict

audit is the strongest option for AppSec teams that want an agentic vulnerability-discovery pipeline and already have access to Claude-compatible auth or a gateway. Its main strength is the combination of narrow agents, adversarial validation, and reachability tracing; the main caveat is cost and setup complexity on large repos. Use it when false positives are expensive and you need proof, not guesses.
