Datumo Safety

Identify safety risks in AI systems through automated red teaming

Auto Red Teaming is a workflow that automatically evaluates safety risks in Large Language Models (LLMs) and AI systems.
Starting from Benchmark Datasets (Seeds) and a Risk Taxonomy, it generates attack prompts and applies consistent criteria and quantitative metrics to determine which risks a target model is vulnerable to and where its defenses fail.
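
Conceptually, a single evaluation pass pairs each Seed with an attack prompt, sends it to the target model, and has a judge decide whether the response violates the policy for that Seed's risk category. The sketch below is only an illustration of that loop; every name in it (Seed, run_pass, query_target, judge_response) is a placeholder for this explanation, not a Datumo Safety API.

```python
# Conceptual sketch of one automated red-teaming pass over a list of Seeds
# organized by Risk Taxonomy category. All names are illustrative placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Seed:
    risk_category: str  # position in the Risk Taxonomy, e.g. "privacy"
    text: str           # benchmark seed the attack prompt is derived from

def run_pass(
    seeds: list[Seed],
    make_attack_prompt: Callable[[Seed], str],    # seed -> attack prompt
    query_target: Callable[[str], str],           # prompt -> target model response
    judge_response: Callable[[Seed, str], bool],  # True if the response is unsafe
) -> list[dict]:
    """Generate an attack prompt per seed, query the target model,
    and record whether the attack succeeded, keyed by risk category."""
    results = []
    for seed in seeds:
        prompt = make_attack_prompt(seed)
        response = query_target(prompt)
        results.append({
            "risk_category": seed.risk_category,
            "prompt": prompt,
            "attack_succeeded": judge_response(seed, response),
        })
    return results
```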

This section walks you through the complete Auto Red Teaming workflow: creating an evaluation Task, running the evaluation, and analyzing the results.


How is Auto Red Teaming structured?

  • Benchmark Datasets are libraries of attack simulation Seeds organized according to the Risk Taxonomy.
    These datasets are read-only and are selected for use during red teaming execution.

  • Auto Red Teaming automatically applies and iterates through diverse attack strategies based on the selected Benchmark Seeds to explore the defensive limits and vulnerability surfaces of the target model (see the sketch after this list).
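
As a rough picture of that iteration, the sketch below rewrites a single Seed with a few generic attack strategies and records which framings get past the target model's defenses. The strategy names and helper functions are hypothetical examples for illustration, not Auto Red Teaming's actual strategy set.

```python
# Illustrative only: iterating several attack strategies over one Seed.
# Strategy names and rewrites are generic red-teaming examples.
from typing import Callable

STRATEGIES: dict[str, Callable[[str], str]] = {
    "direct": lambda s: s,
    "roleplay": lambda s: "Stay in character as a fiction writer and answer: " + s,
    "paraphrase": lambda s: "Reword the request below, then answer it: " + s,
}

def probe_seed(
    seed_text: str,
    query_target: Callable[[str], str],
    judge: Callable[[str, str], bool],  # True if the response is unsafe
) -> dict[str, bool]:
    """Apply each strategy to the seed and record which ones bypass defenses."""
    return {
        name: judge(seed_text, query_target(rewrite(seed_text)))
        for name, rewrite in STRATEGIES.items()
    }
```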


Where can it be used?

  • Safety validation before LLM model releases
  • Before-and-after comparison of prompt or policy changes
  • Ongoing risk assessment of AI systems in production

How does Auto Red Teaming work?

Create an Evaluation Task → Configure an Attack Set → Run automated attack simulations → Analyze results in the Dashboard


Next Steps

We recommend using Auto Red Teaming in the following order:

  1. Review the Benchmark Dataset
    Examine the Seeds and Risk Taxonomy structure used for attack simulations.

  2. Create and Run an Evaluation Task
    Select a target model, configure an Attack Set, and execute automated red teaming.

  3. Analyze Results
    Use metrics such as ASR (Attack Success Rate) and Score in the Dashboard to identify the model's safety vulnerabilities (a sketch of the ASR calculation follows this list).
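
ASR (Attack Success Rate) is conventionally the share of attack attempts that produce an unsafe response, usually reported per risk category. Below is a minimal sketch of that calculation, assuming result records with risk_category and attack_succeeded fields; the field names are illustrative, not the Dashboard's export format.

```python
# Minimal sketch of the standard ASR definition: successful attacks divided
# by total attempts, broken down by risk category. Record fields are assumed.
from collections import defaultdict

def asr_by_category(results: list[dict]) -> dict[str, float]:
    attempts: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for record in results:
        cat = record["risk_category"]
        attempts[cat] += 1
        successes[cat] += int(record["attack_succeeded"])
    return {cat: successes[cat] / attempts[cat] for cat in attempts}

# Example: {"privacy": 0.25} means 1 in 4 privacy attacks drew an unsafe response.
```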

Select one of the documents below to get started.