
1. Run Red Teaming Evaluation (Run Attack Set)

Overview

Once an Attack Set is created, automated red teaming evaluation is executed.
Multiple red teaming strategies are automatically applied to assess the safety of the target model.


Page Structure

Auto Red Teaming follows a Task → Attack Set → Result hierarchy.

View | Description | Primary Actions
Task List | Displays all created Tasks and their overall status | Create Task / Select Task
Task Detail – Dashboard Tab | High-level summary of results at the Task level | Compare results across models
Task Detail – Attack Set Tab | Lists Attack Sets included in the Task | Create Attack Set / Check status
Attack Set Detail | Execution status and results of an individual Attack Set | View results
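
The Task → Attack Set → Result hierarchy can be pictured as nested records. The following is only an illustrative sketch: the class and field names (`Task`, `AttackSet`, `Result`, `summary`, `target_model`) are hypothetical and do not describe an actual product API or storage schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Result:
    # Hypothetical placeholder for the outcome of one Attack Set.
    summary: str


@dataclass
class AttackSet:
    name: str
    target_model: str
    # Assumed to stay empty until the evaluation has finished.
    result: Optional[Result] = None


@dataclass
class Task:
    name: str
    # A Task is a container that groups multiple Attack Sets.
    attack_sets: List[AttackSet] = field(default_factory=list)


task = Task(name="example-task")
task.attack_sets.append(AttackSet(name="baseline", target_model="model-a"))
```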

Step 1. Create a Task

A Task acts as a container that groups multiple Attack Sets.


① Click + New Task

Click the + New Task button in the upper-right corner of the Task list.

② Enter Task Information

Field | Description
Task Name (Required) | Name of the Task (up to 255 characters)
Description (Optional) | Description of the Task (up to 1,000 characters)
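
If Task records are prepared outside the UI (for example, in a planning spreadsheet), the field limits above can be checked in advance. This is a minimal sketch; the function name `validate_task_fields` is hypothetical, and the only rules encoded are the ones listed in the table.

```python
TASK_NAME_MAX = 255          # from the field table
TASK_DESCRIPTION_MAX = 1000  # from the field table


def validate_task_fields(name, description=""):
    """Return a list of problems; an empty list means the input is acceptable."""
    errors = []
    if not name:
        errors.append("Task Name is required")
    elif len(name) > TASK_NAME_MAX:
        errors.append("Task Name exceeds 255 characters")
    if len(description) > TASK_DESCRIPTION_MAX:
        errors.append("Description exceeds 1,000 characters")
    return errors
```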

③ Complete

Click Complete to create the Task and return to the Task list.


Step 2. Create an Attack Set

An Attack Set is the execution unit where actual red teaming evaluation takes place.
When an Attack Set is added from the Task detail view, the evaluation starts as soon as the configuration is completed and confirmed (see Step 3).

① Open Task Detail

Click a Task row in the Task list to navigate to the Task detail page.


② Click + Add Attack Set

In the Attack Set tab, click the + Add Attack Set button.



The Attack Set creation modal is structured as follows:

  • Left panel: Select the Benchmark Dataset used for evaluation
  • Right panel: Configure evaluation settings for the Attack Set

③ Select Dataset (Left Panel)

Select the Benchmark Dataset to be used for red teaming evaluation.

  • A Dataset is a collection of Seeds organized according to the Risk Taxonomy
  • You can filter the Dataset list using search

④ Configure Settings (Right Panel)

Step | Field | Description
1 | Attack Set Name / Description | Name (required), description (optional)
2 | Target Model / Agent | Target model(s) for evaluation (multiple selection supported)
3 | Max Red Teaming Runs | Maximum number of attack attempts per Seed (default: 20, max: 50)
4 | Evaluation Sampling Method | Select the sampling strategy
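
The settings in this table can be summarized as a small record with its constraints. The sketch below is illustrative only: `AttackSetSettings` and its field names are hypothetical. The default of 20 and the maximum of 50 come from the table, while the lower bound of 1 and the requirement of at least one target model are assumptions.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_MAX_RUNS = 20  # documented default
MAX_RUNS_LIMIT = 50    # documented maximum


@dataclass
class AttackSetSettings:
    name: str                  # required
    target_models: List[str]   # multiple selection supported
    sampling_method: str       # hypothetical identifier for the chosen option
    description: str = ""      # optional
    max_red_teaming_runs: int = DEFAULT_MAX_RUNS

    def __post_init__(self):
        if not self.name:
            raise ValueError("Attack Set Name is required")
        if not self.target_models:
            # Assumption: at least one target must be selected.
            raise ValueError("Select at least one Target Model / Agent")
        # Assumption: the lower bound is 1; only the upper bound is documented.
        if not 1 <= self.max_red_teaming_runs <= MAX_RUNS_LIMIT:
            raise ValueError("Max Red Teaming Runs must be between 1 and 50")


settings = AttackSetSettings(
    name="baseline", target_models=["model-a"], sampling_method="all"
)
```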

📂 Sampling Method Details
Option | Description
Equal Sample Count per Risk Taxonomy | Evaluate with an equal number of Seeds per Risk Taxonomy category
Evaluate All Data | Evaluate all Seeds included in the Dataset

Info

Benchmark Datasets may contain different numbers of Seeds across Risk Taxonomy categories.
The Evaluation Sampling Method determines how these differences are reflected in the evaluation.
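
To make the difference between the two sampling options concrete, here is a rough sketch of how each could select Seeds from an unbalanced Dataset. It is not the product's implementation. The method identifiers (`"all"`, `"equal_per_taxonomy"`) and the function name are hypothetical, and the per-category count is assumed to be the size of the smallest category, which the documentation does not specify.

```python
import random
from collections import defaultdict


def sample_seeds(seeds, method, rng=None):
    """seeds is a list of (risk_category, seed_text) tuples."""
    if method == "all":
        # Evaluate All Data: every Seed is used, so category sizes stay unequal.
        return list(seeds)
    if method != "equal_per_taxonomy":
        raise ValueError(f"Unknown sampling method: {method}")
    if not seeds:
        return []

    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    by_category = defaultdict(list)
    for category, text in seeds:
        by_category[category].append((category, text))

    # Assumption: use the smallest category size as the shared sample count.
    per_category = min(len(items) for items in by_category.values())

    sampled = []
    for category in sorted(by_category):
        sampled.extend(rng.sample(by_category[category], per_category))
    return sampled
```

With three Seeds in one category and one in another, the equal-count option would keep one Seed from each category under this assumption, while Evaluate All Data would keep all four.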

Info

When multiple models are selected, separate Attack Sets are created per model using the same Dataset and configuration, allowing direct comparison of evaluation results.
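
The per-model fan-out described above can be sketched as follows. The function name, the dictionary keys, and the way each Attack Set is named after its model are all hypothetical; the point is only that every model gets its own Attack Set with an otherwise identical Dataset and configuration.

```python
def expand_per_model(base_name, dataset, target_models,
                     max_runs=20, sampling_method="all"):
    """Return one Attack Set configuration per selected model."""
    return [
        {
            # Assumed naming scheme, for illustration only.
            "name": f"{base_name} ({model})",
            "dataset": dataset,
            "target_model": model,
            "max_red_teaming_runs": max_runs,
            "sampling_method": sampling_method,
        }
        for model in target_models
    ]


attack_sets = expand_per_model("baseline", "dataset-x", ["model-a", "model-b"])
```

Because only `target_model` (and the derived name) differs between the entries, any difference in results can be attributed to the model rather than to the setup.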


Step 3. Run Evaluation

Once Attack Set configuration is complete, the red teaming evaluation can be started.

① Click Complete

After completing the configuration, click Complete to open a final confirmation modal before execution.


Pre-execution Notice
  • Once evaluation starts, it cannot be paused or stopped.
  • Execution time may vary depending on the selected Dataset size and configuration.

② Click Proceed

Click Proceed to immediately start the red teaming evaluation.
Once started, the Attack Set status is shown as In Progress (Red teaming in progress).
The evaluation continues running in the background even if you leave the page.


③ Monitor Execution Status

The evaluation progress can be monitored from both the Attack Set list and the Attack Set detail page.

Status | Description
Waiting | Pending execution
In Progress | Running (progress displayed)
Done | Completed
Error | Execution error
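
The four statuses form a simple lifecycle that can be modeled when tracking runs in your own tooling. The sketch below is hypothetical: this page does not describe a status API, so `fetch_status` stands in for whatever mechanism supplies the current status, and treating Error as a final state is an assumption.

```python
import time
from enum import Enum


class Status(Enum):
    WAITING = "Waiting"
    IN_PROGRESS = "In Progress"
    DONE = "Done"
    ERROR = "Error"


# Assumption: an Attack Set in Done or Error will not change state again.
FINAL_STATUSES = {Status.DONE, Status.ERROR}


def wait_until_finished(fetch_status, interval_s=5.0, max_polls=100):
    """Call fetch_status() repeatedly until a final status is returned."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in FINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError("Evaluation did not finish within the polling budget")
```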

Background Execution

Evaluations continue running even if you navigate away from the page or close the browser window.


Step 4. Management Controls

To ensure reproducibility and result integrity,
Auto Red Teaming provides limited management controls for Tasks and Attack Sets.


1. Task Management

A Task is a top-level unit that groups multiple Attack Sets.
Editing or deleting a Task is restricted based on the status of its Attack Sets.

① Edit Task

From the Task list, click Edit to modify the Task Name or Description.

  • Editable regardless of evaluation execution status
  • Does not affect evaluation results

② Delete Task

Deletion Conditions

A Task can be deleted from the Task list page only when all Attack Sets within the Task are in the Done state.

Before deleting, note that:

  • Deleting a Task removes all associated Attack Sets and evaluation results
  • Deleted data cannot be recovered

💡 If any evaluation is still in progress or incomplete, Task deletion is restricted to preserve result integrity.
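
The deletion rule reduces to a single check over the statuses of a Task's Attack Sets. This is a minimal sketch with a hypothetical function name. It reads the rule literally, so an Attack Set in the Error state also blocks deletion, and a Task with no Attack Sets is treated as deletable; both points are assumptions.

```python
def can_delete_task(attack_set_statuses):
    """Return True only if every Attack Set in the Task is Done."""
    # all() over an empty list is True, so an empty Task is deletable here.
    return all(status == "Done" for status in attack_set_statuses)
```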



2. Attack Set Management

An Attack Set is the execution unit for red teaming evaluation, and its evaluation conditions cannot be modified.

Item | Editable | Description
Name / Description | Editable | Identification and management purposes
Dataset | Not editable | Ensures evaluation reproducibility
Target Model / Agent | Not editable | Ensures comparison reliability
Sampling Method | Not editable | Maintains result consistency
Delete | Only when status is Done | Prevents disruption during execution

💡 Attack Sets can be deleted from individual rows in Task Detail > Attack Set tab.
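
The edit and delete rules for Attack Sets can be expressed as a small guard. The sketch below is illustrative, with hypothetical function and key names. Only Name and Description are treated as editable. Max Red Teaming Runs is not listed in the table, so treating it as locked along with the other evaluation conditions is an assumption.

```python
EDITABLE_FIELDS = {"name", "description"}


def apply_attack_set_edit(attack_set, changes):
    """Return an updated copy, rejecting changes to evaluation conditions."""
    locked = sorted(set(changes) - EDITABLE_FIELDS)
    if locked:
        raise ValueError("Not editable: " + ", ".join(locked))
    return {**attack_set, **changes}


def can_delete_attack_set(status):
    """An Attack Set can be deleted only when its status is Done."""
    return status == "Done"
```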