Datumo Safety
Identify safety risks in AI systems through automated red teaming
Auto Red Teaming is a workflow that automatically evaluates safety risks in Large Language Models (LLMs) and other AI systems.
Based on Benchmark Datasets (Seeds) and the Risk Taxonomy, it generates attack prompts and systematically evaluates, using consistent criteria and quantitative metrics, which risks a target model is vulnerable to and where its defenses fail.
This section guides you through the complete Auto Red Teaming workflow, from creating evaluation Tasks and running evaluations to analyzing results.
How is Auto Red Teaming structured?
- Benchmark Datasets are libraries of attack simulation Seeds organized according to the Risk Taxonomy. These datasets are read-only and are selected for use during red teaming execution.
- Auto Red Teaming automatically applies and iterates through diverse attack strategies based on the selected Benchmark Seeds to explore the defensive limits and vulnerability surfaces of the target model (see the sketch below).
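The sketch below is a minimal, purely illustrative model of that seed-driven iteration: each Seed is mutated by a set of attack strategies, sent to the target model, and judged for a policy violation. The names `Seed`, `run_seed`, `target_model`, and `judge` are assumptions for illustration and are not part of the Datumo Safety API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Seed:
    """A benchmark Seed: one attack scenario tagged with a risk category."""
    prompt: str
    risk_category: str  # e.g. a node in the Risk Taxonomy


def run_seed(
    seed: Seed,
    strategies: List[Callable[[str], str]],   # attack transformations (e.g. paraphrase, role-play)
    target_model: Callable[[str], str],       # the model under evaluation
    judge: Callable[[str, str], bool],        # returns True if the response violates policy
) -> List[dict]:
    """Apply each attack strategy to a Seed and record whether the attack succeeded."""
    results = []
    for strategy in strategies:
        attack_prompt = strategy(seed.prompt)   # mutate the Seed into an attack prompt
        response = target_model(attack_prompt)  # query the target model
        results.append({
            "risk_category": seed.risk_category,
            "attack_prompt": attack_prompt,
            "success": judge(attack_prompt, response),  # attack succeeds if the judge flags the response
        })
    return results
```

Grouping these per-attempt records by risk category is what makes per-risk metrics in the Dashboard meaningful.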
Where can it be used?
- Safety validation before LLM releases
- Before-and-after comparison of prompt or policy changes
- Ongoing risk assessment of AI systems in production
How does Auto Red Teaming work?
Create an Evaluation Task → Configure an Attack Set → Run automated attack simulations → Analyze results in the Dashboard
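As a rough sketch of how these four steps fit together, the snippet below models a Task and its Attack Set as plain data and stubs out the run step. All names (`EvaluationTask`, `AttackSet`, `run_simulations`) and field values are hypothetical, invented for illustration; they do not correspond to the actual Datumo Safety interface.

```python
# Hypothetical sketch of the four-step workflow; names and fields are illustrative only.
from dataclasses import dataclass
from typing import List


@dataclass
class AttackSet:
    benchmark_dataset: str        # Benchmark Dataset to draw Seeds from
    risk_categories: List[str]    # Risk Taxonomy categories to cover
    strategies: List[str]         # attack strategies to iterate through


@dataclass
class EvaluationTask:
    name: str
    target_model: str             # identifier of the model under evaluation
    attack_set: AttackSet


def run_simulations(task: EvaluationTask) -> List[dict]:
    """Stub for the automated attack run: one record per attack attempt."""
    # A real run would record the attack prompt, the model response,
    # the risk category, and whether the attack succeeded.
    return []


# 1. Create an Evaluation Task   2. Configure an Attack Set
task = EvaluationTask(
    name="pre-release safety check",
    target_model="my-llm-endpoint",
    attack_set=AttackSet(
        benchmark_dataset="default-benchmark",
        risk_categories=["violence", "privacy"],
        strategies=["paraphrase", "role_play"],
    ),
)

# 3. Run automated attack simulations   4. Analyze results (e.g. ASR, Score) in the Dashboard
results = run_simulations(task)
```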
Next Steps
We recommend using Auto Red Teaming in the following order:
- Review the Benchmark Dataset: examine the Seeds and Risk Taxonomy structure used for attack simulations.
- Create and Run an Evaluation Task: select a target model, configure an Attack Set, and execute automated red teaming.
- Analyze Results: use metrics such as ASR (Attack Success Rate) and Score in the Dashboard to identify the model's safety vulnerabilities (see the sketch after this list).
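ASR stands for Attack Success Rate: the fraction of attack attempts that the evaluation judges as successful. The minimal sketch below shows that calculation, assuming each attempt is recorded with a boolean `success` flag (an assumed record structure, not the Dashboard's internal format); the Dashboard's Score metric is product-specific and is not reproduced here.

```python
from typing import List


def attack_success_rate(results: List[dict]) -> float:
    """ASR = successful attacks / total attack attempts (0.0 when there are no attempts)."""
    if not results:
        return 0.0
    successes = sum(1 for r in results if r["success"])
    return successes / len(results)


# Example: 3 successful attacks out of 10 attempts -> ASR = 0.3
sample = [{"success": i < 3} for i in range(10)]
print(attack_success_rate(sample))  # 0.3
```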
Select one of the documents below to get started.