This repository contains the implementation of the AIRTBench autonomous AI red teaming agent, complementing our research paper, AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models, and the accompanying blog post, "Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out".
The AIRTBench agent is designed to evaluate the autonomous red teaming capabilities of large language models (LLMs) through AI/ML Capture The Flag (CTF) challenges. Our agent systematically exploits LLM-based targets by solving challenges on the Dreadnode Strikes platform, providing a standardized benchmark for measuring adversarial AI capabilities.
The AIRTBench harness follows a modular architecture designed for extensibility and evaluation:

Figure: AIRTBench harness construction architecture showing the interaction between agent components, challenge interface, and evaluation framework.
You can set up the virtual environment with uv:
uv sync
Technical documentation for the AIRTBench agent is available in the Dreadnode Strikes documentation.
To run the code, you will need access to the Dreadnode Strikes platform; see the docs or sign up for the Strikes waitlist here.
This rigging-based agent works to solve a variety of AI/ML CTF challenges from the Dreadnode Crucible platform, and is given access to execute Python code in a network-local container built from a custom Dockerfile.
uv run -m airtbench --help
uv run -m airtbench --model $MODEL --project $PROJECT --platform-api-key $DREADNODE_TOKEN --token $DREADNODE_TOKEN --server https://platform.dreadnode.io --max-steps 100 --inference_timeout 240 --enable-cache --no-give-up --challenges bear1 bear2
To run the agent against only the LLM-based challenges, i.e. those matching the is_llm: true criteria, use the following command:
uv run -m airtbench --model <model> --llm-challenges-only
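For reference, the `--llm-challenges-only` switch is conceptually just a filter over the challenge manifest. Below is a minimal sketch of such a filter, assuming a YAML manifest with an `is_llm` field on each entry; the file path, field names, and function are illustrative assumptions, not the actual airtbench code:

```python
# Illustrative sketch only: the manifest path and field names below are
# assumptions, not the real airtbench manifest schema.
import yaml

def llm_challenge_ids(manifest_path: str = "challenges.yaml") -> list[str]:
    """Return the IDs of challenges marked with is_llm: true."""
    with open(manifest_path) as f:
        manifest = yaml.safe_load(f)
    return [c["id"] for c in manifest.get("challenges", []) if c.get("is_llm")]
```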
The harness will automatically build the defined number of containers with the supplied flag, and load them as needed to ensure they are network-isolated from each other. The process is generally as follows (a rough sketch of this loop is shown after the list):
- For each challenge, instantiate the agent with the Jupyter notebook provided by the challenge
- Task the agent with solving the CTF challenge based on notebook contents
- Bring up the associated environment
- Let the agent execute Python code inside a Jupyter kernel, feeding each response back to the model
- If the CTF challenge is solved and the flag is observed, the agent must submit the flag
- Otherwise, run until an error occurs, the agent gives up, or max-steps is reached
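Put together, the per-challenge loop looks roughly like the sketch below. Every name in it (`run_challenge`, `challenge.start_container`, `agent.next_code_cell`, the flag pattern) is hypothetical and meant only to illustrate the flow described above, not the actual airtbench API:

```python
# Hypothetical sketch of the per-challenge loop; names and the flag format
# are assumptions for illustration, not the real airtbench implementation.
import re

FLAG_PATTERN = re.compile(r"gAAAAA[A-Za-z0-9+/=_\-]+")  # assumed Crucible-style flag format

def run_challenge(agent, challenge, max_steps: int = 100) -> str:
    """Drive a single challenge to completion: solved, gave_up, or max_steps."""
    with challenge.start_container() as kernel:      # bring up the network-isolated environment
        agent.prime(challenge.notebook)              # seed the agent with the challenge notebook
        for _ in range(max_steps):
            code = agent.next_code_cell()            # model proposes the next Python cell
            if code is None:                         # model chose to give up
                return "gave_up"
            output = kernel.execute(code)            # run the cell in the Jupyter kernel
            agent.observe(output)                    # feed the kernel output back to the model
            match = FLAG_PATTERN.search(output)
            if match and challenge.submit_flag(match.group(0)):
                return "solved"                      # flag observed and accepted
        return "max_steps"
```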
Check out the challenge manifest to see current challenges in scope.
- Download the dataset directly from 🤗 Hugging Face
- Instructions for loading the dataset can also be found in the dataset directory; a minimal loading sketch is shown below.
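A minimal loading sketch using the Hugging Face `datasets` library; the dataset ID here is an assumption, so check the dataset card linked above for the exact repository name:

```python
# Minimal sketch: the dataset ID is an assumption; see the dataset card on
# Hugging Face for the exact repository name and available splits.
from datasets import load_dataset

ds = load_dataset("dreadnode/AIRTBench")  # hypothetical dataset ID
print(ds)  # inspect the available splits and columns
```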
If you find our work helpful, please use the following citation.
@misc{dawson2025airtbenchmeasuringautonomousai,
      title={AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models},
      author={Ads Dawson and Rob Mulla and Nick Landers and Shane Caldwell},
      year={2025},
      eprint={2506.14682},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2506.14682},
}
If you know of a model that may be interesting to analyze, but do not have the resources to run it yourself, feel free to open a feature request via a GitHub issue.
Forks and contributions are welcome! Please see our Contributing Guide.
See our Security Policy for reporting vulnerabilities.
By watching the repo, you can also be notified of any upcoming releases.