Skip to main content

Datumo Safety

Identify safety risks in AI systems through automated red teaming

The Auto Red Teaming page provides an automated red teaming workflow for evaluating safety risks in Large Language Models (LLMs). By applying various Red-Teaming strategies to Seeds from Benchmark Datasets, it generates attack prompts and allows you to identify which risks a Target Model is vulnerable to, which Risk Taxonomy categories are affected, and which strategy groups trigger Jailbreaks—all through consistent criteria and quantitative metrics.

This section guides you through the complete workflow of using Auto Red Teaming—from creating a Red Teaming evaluation Task, running evaluations, to analyzing results.


Key Terminology for Auto Red Teaming

  • Benchmark Dataset is a Seed library for attack simulation, organized according to the Risk Taxonomy.

    • Risk Taxonomy is a harmfulness classification system defined to verify AI model safety from multiple perspectives.
  • Auto Red Teaming is the process of automatically applying diverse Attack Strategies based on Seeds from the selected Benchmark Dataset and iteratively executing them to explore the model's defensive limits and vulnerability surfaces.

    • Attack Strategies include 16 specialized strategies derived from Selectstar's research, enabling advanced vulnerability exploration.
  • Attack Set refers to the attack items actually used in the current evaluation from the selected Benchmark Dataset. You can select all items or configure a subset through Random Sampling.

  • Target Model is the LLM Model you want to assess for vulnerabilities—the target of the attacks.

  • Jailbreak refers to cases where the Target Model produces harmful or inappropriate responses as a result of an attack.


Where can it be used?

  • Safety validation before LLM model releases
  • Risk assessment of AI systems in production

How does AI Red Teaming work?

Create an Evaluation Task → Configure an Attack Set → Run automated attack simulations → Analyze results in the Dashboard


Next Steps

We recommend using Auto Red Teaming in the following order:

  1. Select a Benchmark Dataset Review the Seeds and Risk Taxonomy structure used for attack simulations.

  2. Create and Run an Evaluation Task Select a Target Model for vulnerability analysis, configure an Attack Set, and execute automated red teaming.

  3. Analyze Results Use metrics such as ASR and Score in the Dashboard to identify safety vulnerabilities in the model.