Skip to main content

2. View Red Teaming Results (View Results)

Overview

Once an evaluation is completed, you can visualize overall results in the Dashboard and analyze individual outcomes in the Attack Set detail view.

This document explains how to read and explore results from the Dashboard after an Auto Red Teaming or Benchmark execution has completed.

Through this tutorial, you will be able to:

  • Compare Attack Success Rates (ASR) across multiple target models at a glance
  • Select a specific model to analyze detailed metrics and vulnerable areas
  • Filter results by Attack Set to improve comparison accuracy
  • Inspect actual attack–response results at the individual Seed level

Dashboard Overview — ASR by Target Model

After evaluation is complete, navigate to Task Detail → Dashboard.

The Dashboard aggregates results at the Attack Set level, allowing comparison of multiple target models evaluated using the same Attack Set.

The first section displayed is the Attack Success Rate (ASR) by Target Model chart, which provides a quick, high-level view of overall results.

💡 If no Attack Set is selected
All relevant results executed within the current workspace are summarized and displayed.


1. Filters and Chart


① Select Attack Set

Use the top filter to select the Attack Set you want to analyze.

② ASR by Model Chart

The Attack Success Rate for each model included in the selected Attack Set is displayed as a chart.
Clicking a model in the chart reveals detailed metrics for that model in the section below.


2. Run Summary for Selected Model


MetricDescription
Attack Success Rate (ASR)Percentage of successful attacks
Average Runs per SeedAverage number of iterations per Seed
Average Attack ScoreAverage attack score
Total CostTotal execution cost
Total Run TimeTotal execution time

3. Model-Level Metrics — Judge / Scorer Tabs

Below the Run Summary, metrics are further broken down by Risk Taxonomy and Strategy.
By default, the Judge tab is selected, but you can switch to the Scorer tab as needed.

Judge vs. Scorer — Which should you use?
  • Judge tab: Best for quickly understanding overall attack success rates based on binary outcomes—whether an attack succeeded or failed.
  • Scorer tab: Used to assess the degree of harmfulness in model responses. It enables analysis of the harmfulness spectrum that cannot be captured by ASR alone.

① Judge Tab

  • Attack Success Rate by Risk Taxonomy: ASR grouped by risk category
  • Attack Success Rate by Strategy: ASR grouped by attack strategy

② Scorer Tab

  • Safety Score by Risk Taxonomy: Safety scores grouped by risk category
  • Safety Score by Strategy: Safety scores grouped by attack strategy

Scores in the Scorer tab range from 0 to 10, with the following interpretations:

Score RangeJudgmentMeaning
10, 9, 8Highly UnsafeClearly generates harmful responses
7, 6, 5Moderately UnsafePartially harmful or borderline responses
4, 3Moderately SafeGenerally safe but with room for improvement
2, 1, 0Refusal or Fully SafeThe request was refused or the response was fully safe

③ Expand Sub-fields

Clicking a row for a Risk Taxonomy category or Strategy expands detailed charts for sub-fields within that category.
Click Expand All at the top to expand all sub-fields at once.


Attack Set Detail

Click an item in the Attack Set list to navigate to the detailed, Seed-level result view.


1. Header Section


① Left-side Information

  • Target Model
  • Sampling Method
  • Count per Risk Taxonomy

② Right-side Metrics

  • Total (Safe / Unsafe ratio)
  • Average Loop Count
  • Max Red Teaming Runs
  • Unsafety Count by Strategy

2. Risk Taxonomy Tabs

Results are organized into tabs by Risk Taxonomy category.


ColumnDescription
IDSeed identifier
SeedOriginal Seed
ResultSafe / Unsafe
High-risk QueryGenerated attack prompt
AnswerModel response
StrategyAttack strategy used
LoopNumber of iterations

3. Data Detail Panel

Clicking a Seed row opens a detailed view panel on the right.

  • Seed: Original Seed
  • High-risk Query: Generated attack prompt (when judged Unsafe)
  • Defense Successes: History of successful defenses (Safe attempts)
Using Defense Successes

Even if a Seed ultimately results in an Unsafe judgment, the model may have successfully defended against earlier attempts.
Reviewing these histories helps identify which variations the model could defend against and where the defense ultimately failed.