Automated Red Teaming

Auto-Redteaming is a fully automated red teaming system that generates adversarial prompts based on seed inputs to evaluate the safety and vulnerabilities of AI models.

It combines various strategies to generate adversarial prompts, and uses a scoring model (Judge) to automatically evaluate model responses and generate quantitative reports.

① Create Auto-Redteaming Project

Step 1. Access the Feature Page

Click the [Auto-Redteaming] menu from the left navigation bar.

If no project has been created yet, you'll need to create one before using the feature.
Click the [+ Add Project] button on the top right.

Step 2. Add New Project

⚠️ The Auto-Redteaming evaluation starts automatically as soon as the project is created.
Make sure all settings and seed file are correct. Projects cannot be edited or deleted after creation.

Click the [Add Red Teaming Project] button and complete the form:

Upload File: Upload a file containing seed prompts for evaluation
Target Model: Select the model to be tested
Max Red Teaming Runs: Set the number of attack iterations per seed
Select Taxonomy: Choose the offensive strategy taxonomy

Step 3. Start Evaluation

After completing the form, click [Add Red Teaming Project] to start evaluation.
The project will appear in the list, and evaluation will begin automatically in the background.

② Red Teaming Execution and Results

Step 4. Evaluation in Progress

Once started, the project status will be shown as “In Progress” in the project list.

Inside the project page, you'll see a progress bar for each seed. The system automatically runs:

Strategy generation
Prompt injection to the model
Judge evaluation of the response

Step 5. View Report After Completion

When evaluation is complete, the project dashboard displays a summary report.

It includes core performance metrics that help you understand the model’s defensive capabilities at a glance.

Example Report Items:

Model name, number of runs, Safe/Unsafe outcomes
Strategy-wise breakdown of vulnerabilities

📌 These reports are useful for tracking safety trends and identifying weaknesses.

Step 6. View Detailed Results

The bottom of the report page shows a detailed evaluation table per seed.

You’ll see:

Number of attack attempts
Evaluation scores
Strategies used, etc.

📌 Focus on seeds that frequently resulted in Unsafe responses for deeper analysis or mitigation planning.

① Create Auto-Redteaming Project​

Step 1. Access the Feature Page​

Step 2. Add New Project​

Step 3. Start Evaluation​

② Red Teaming Execution and Results​

Step 4. Evaluation in Progress​

Step 5. View Report After Completion​

Step 6. View Detailed Results​