Skip to main content

Automated Red Teaming

Auto-Redteaming is an automated red team system that automatically generates attack prompts based on seed sentences to evaluate the safety and vulnerability of a model.

It attempts attacks by combining multiple strategies and provides a report by having a Scorer quantitatively evaluate the responses.


① Auto-Redteaming TASK Creation Procedure

Step 1. Enter Page and Create Task

  1. Go to the Auto-Redteaming feature page.
    This is a pre-registered list screen. Auto-Redteaming is performed immediately after TASK creation.

  2. If there are no registered TASKs, click + New Task at the top to create a new TASK.


Step 2. Evaluation Settings

  1. Download the Seed Only template and write the seed sentences (queries) to be used for the evaluation.
  2. Upload the created seed file.
  3. Select the Target Model / Agent to be evaluated.
  4. Set the number of repeated attacks per seed (Max Red Teaming Runs).
  5. Select the attack strategy classification (Taxonomy).

After entering all fields, click Add Red Teaming Task to create the project.

  • Upload File: Upload the evaluation seed file
  • Target Model: Select the LLM to be evaluated
  • Max Red Teaming Runs: Set the number of repetitions per seed
  • Select Taxonomy: Select the strategy classification system

Step 3. Create and Run TASK

The evaluation is performed at the same time as the TASK is created by clicking Add Red Teaming Task.


② Check Results

Step 4. Evaluation in Progress

  • The TASK status is displayed as Red Teaming.
  • Click View Dataset / Progress to check the progress and real-time logs (sampling/attempt status).


Step 5. Complete Evaluation and Check Report

  • After the evaluation is complete, you will automatically receive a statistics-based report on the TASK details screen.
  • The report includes a summary of overall performance and key evaluation indicators, allowing you to intuitively grasp the model's defense capabilities as Safe and Unsafe.

Example of major report items:

  • Model name, number of repetitions, Safe/Unsafe ratio
  • Summary of vulnerabilities by strategy/category

📌 The report is used to analyze the model's safety trends and identify vulnerable areas.


Step 6. Check Detailed Results

  • Click on each seed sentence to check the individual evaluation logs (repeated attempts, evaluation scores, strategies used, Target responses, etc.).
  • Prioritize analyzing seeds that have been repeatedly judged as Unsafe to prepare response strategies or use them as model improvement points.

📌 You can establish response strategies centered on seed sentences that have been repeatedly judged as Unsafe, or use them to improve performance for each problem type.