2. View Red Teaming Results (View Results)
Once an evaluation is completed, you can visualize overall results in the Dashboard and analyze individual outcomes in the Attack Set detail view.
This document explains how to read and explore results from the Dashboard after an Auto Red Teaming or Benchmark execution has completed.
Through this tutorial, you will be able to:
- Compare Attack Success Rates (ASR) across multiple target models at a glance
- Select a specific model to analyze detailed metrics and vulnerable areas
- Filter results by Attack Set to improve comparison accuracy
- Inspect actual attack–response results at the individual Seed level
Dashboard Overview — ASR by Target Model
After evaluation is complete, navigate to Task Detail → Dashboard.
The Dashboard aggregates results at the Attack Set level, allowing comparison of multiple target models evaluated using the same Attack Set.
The first section displayed is the Attack Success Rate (ASR) by Target Model chart, which provides a quick, high-level view of overall results.
💡 If no Attack Set is selected
All relevant results executed within the current workspace are summarized and displayed.
1. Filters and Chart
① Select Attack Set
Use the top filter to select the Attack Set you want to analyze.
② ASR by Model Chart
The Attack Success Rate for each model included in the selected Attack Set is displayed as a chart.
Clicking a model in the chart reveals detailed metrics for that model in the section below.
2. Run Summary for Selected Model
| Metric | Description |
|---|---|
| Attack Success Rate (ASR) | Percentage of successful attacks |
| Average Runs per Seed | Average number of iterations per Seed |
| Average Attack Score | Average attack score |
| Total Cost | Total execution cost |
| Total Run Time | Total execution time |
3. Model-Level Metrics — Judge / Scorer Tabs
Below the Run Summary, metrics are further broken down by Risk Taxonomy and Strategy.
By default, the Judge tab is selected, but you can switch to the Scorer tab as needed.
- Judge tab: Best for quickly understanding overall attack success rates based on binary outcomes—whether an attack succeeded or failed.
- Scorer tab: Used to assess the degree of harmfulness in model responses. It enables analysis of the harmfulness spectrum that cannot be captured by ASR alone.
① Judge Tab
- Attack Success Rate by Risk Taxonomy: ASR grouped by risk category
- Attack Success Rate by Strategy: ASR grouped by attack strategy
② Scorer Tab
- Safety Score by Risk Taxonomy: Safety scores grouped by risk category
- Safety Score by Strategy: Safety scores grouped by attack strategy
Scores in the Scorer tab range from 0 to 10, with the following interpretations:
| Score Range | Judgment | Meaning |
|---|---|---|
| 10, 9, 8 | Highly Unsafe | Clearly generates harmful responses |
| 7, 6, 5 | Moderately Unsafe | Partially harmful or borderline responses |
| 4, 3 | Moderately Safe | Generally safe but with room for improvement |
| 2, 1, 0 | Refusal or Fully Safe | The request was refused or the response was fully safe |
③ Expand Sub-fields
Clicking a row for a Risk Taxonomy category or Strategy expands detailed charts for sub-fields within that category.
Click Expand All at the top to expand all sub-fields at once.
Attack Set Detail
Click an item in the Attack Set list to navigate to the detailed, Seed-level result view.
1. Header Section
① Left-side Information
- Target Model
- Sampling Method
- Count per Risk Taxonomy
② Right-side Metrics
- Total (Safe / Unsafe ratio)
- Average Loop Count
- Max Red Teaming Runs
- Unsafety Count by Strategy
2. Risk Taxonomy Tabs
Results are organized into tabs by Risk Taxonomy category.
| Column | Description |
|---|---|
| ID | Seed identifier |
| Seed | Original Seed |
| Result | Safe / Unsafe |
| High-risk Query | Generated attack prompt |
| Answer | Model response |
| Strategy | Attack strategy used |
| Loop | Number of iterations |
3. Data Detail Panel
Clicking a Seed row opens a detailed view panel on the right.
- Seed: Original Seed
- High-risk Query: Generated attack prompt (when judged Unsafe)
- Defense Successes: History of successful defenses (Safe attempts)
Even if a Seed ultimately results in an Unsafe judgment, the model may have successfully defended against earlier attempts.
Reviewing these histories helps identify which variations the model could defend against and where the defense ultimately failed.