Skip to main content

Automated Red Teaming

Auto-Redteaming is a fully automated red teaming system that generates adversarial prompts based on seed inputs to evaluate the safety and vulnerabilities of AI models.

It combines various strategies to generate adversarial prompts, and uses a scoring model (Judge) to automatically evaluate model responses and generate quantitative reports.




β‘  Create Auto-Redteaming Project​

Step 1. Access the Feature Page​

Click the [Auto-Redteaming] menu from the left navigation bar.

If no project has been created yet, you'll need to create one before using the feature.
Click the [+ Add Project] button on the top right.


Step 2. Add New Project​

⚠️ The Auto-Redteaming evaluation starts automatically as soon as the project is created.
Make sure all settings and seed file are correct. Projects cannot be edited or deleted after creation.


Click the [Add Red Teaming Project] button and complete the form:

  • Upload File: Upload a file containing seed prompts for evaluation
  • Target Model: Select the model to be tested
  • Max Red Teaming Runs: Set the number of attack iterations per seed
  • Select Taxonomy: Choose the offensive strategy taxonomy

Step 3. Start Evaluation​

After completing the form, click [Add Red Teaming Project] to start evaluation.
The project will appear in the list, and evaluation will begin automatically in the background.


β‘‘ Red Teaming Execution and Results​

Step 4. Evaluation in Progress​

Once started, the project status will be shown as β€œIn Progress” in the project list.

Inside the project page, you'll see a progress bar for each seed. The system automatically runs:

  1. Strategy generation
  2. Prompt injection to the model
  3. Judge evaluation of the response


Step 5. View Report After Completion​

When evaluation is complete, the project dashboard displays a summary report.

It includes core performance metrics that help you understand the model’s defensive capabilities at a glance.

Example Report Items:

  • Model name, number of runs, Safe/Unsafe outcomes
  • Strategy-wise breakdown of vulnerabilities

πŸ“Œ These reports are useful for tracking safety trends and identifying weaknesses.


Step 6. View Detailed Results​

The bottom of the report page shows a detailed evaluation table per seed.

You’ll see:

  • Number of attack attempts
  • Evaluation scores
  • Strategies used, etc.

πŸ“Œ Focus on seeds that frequently resulted in Unsafe responses for deeper analysis or mitigation planning.