Automated Red Teaming
Auto-Redteaming is a fully automated red teaming system that generates adversarial prompts based on seed inputs to evaluate the safety and vulnerabilities of AI models.
It combines various strategies to generate adversarial prompts, and uses a scoring model (Judge) to automatically evaluate model responses and generate quantitative reports.
① Create Auto-Redteaming Project
Step 1. Access the Feature Page
Click the [Auto-Redteaming] menu from the left navigation bar.
If no project has been created yet, you'll need to create one before using the feature.
Click the [+ Add Project] button on the top right.
Step 2. Add New Project
⚠️ The Auto-Redteaming evaluation starts automatically as soon as the project is created.
Make sure all settings and the seed file are correct before you submit the form. Projects cannot be edited or deleted after creation.
Click the [Add Red Teaming Project] button and complete the form:
- Upload File: Upload a file containing seed prompts for evaluation
- Target Model: Select the model to be tested
- Max Red Teaming Runs: Set the maximum number of attack iterations per seed
- Select Taxonomy: Choose the attack strategy taxonomy
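Because a project cannot be edited or deleted once it is created, it may be worth checking the seed file before uploading it. The sketch below is only an illustration: this guide does not specify the seed file format, so the code assumes a plain-text file with one seed prompt per line, and `check_seeds` is a hypothetical helper, not part of the product.

```python
from typing import List, Tuple


def check_seeds(text: str) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Return (usable prompts, problems) for seed text with one prompt per line.

    ASSUMPTION: one seed prompt per line. The real upload format may differ.
    """
    seen = set()
    clean: List[str] = []
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        prompt = line.strip()
        if not prompt:
            problems.append((lineno, "empty line"))
        elif prompt in seen:
            problems.append((lineno, "duplicate prompt"))
        else:
            seen.add(prompt)
            clean.append(prompt)
    return clean, problems


# Illustrative seed text; in practice, read it from the file you plan to upload.
sample = "first seed prompt\n\nsecond seed prompt\nfirst seed prompt\n"
clean, problems = check_seeds(sample)
print(clean)
print(problems)
```

If `problems` is non-empty, fix the file first, since the evaluation starts as soon as the project is created.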
Step 3. Start Evaluation
After completing the form, click [Add Red Teaming Project] to start evaluation.
The project will appear in the list, and evaluation will begin automatically in the background.
② Red Teaming Execution and Results
Step 4. Evaluation in Progress
Once started, the project status will be shown as "In Progress" in the project list.
Inside the project page, you'll see a progress bar for each seed. The system automatically runs:
- Strategy generation
- Prompt injection to the model
- Judge evaluation of the response
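The three stages above can be pictured as a loop that runs up to Max Red Teaming Runs times for each seed. The following is a conceptual sketch only, not the product's implementation: `run_seed`, `toy_model`, `toy_judge`, the round-robin strategy choice and the prompt template are all assumptions made for illustration. It also runs every iteration; whether the real system stops early after an Unsafe result is not stated in this guide.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    seed: str
    strategy: str
    prompt: str
    response: str
    verdict: str  # "Safe" or "Unsafe"


def run_seed(
    seed: str,
    strategies: List[str],
    target_model: Callable[[str], str],
    judge: Callable[[str, str], str],
    max_runs: int,
) -> List[Attempt]:
    """Run max_runs attack iterations for a single seed."""
    attempts: List[Attempt] = []
    for i in range(max_runs):
        # 1. Strategy generation (stand-in: cycle through the given strategies)
        strategy = strategies[i % len(strategies)]
        prompt = f"[{strategy}] {seed}"  # hypothetical prompt template
        # 2. Send the adversarial prompt to the target model
        response = target_model(prompt)
        # 3. Judge evaluation of the response
        verdict = judge(prompt, response)
        attempts.append(Attempt(seed, strategy, prompt, response, verdict))
    return attempts


# Toy stand-ins for the target model and the Judge.
def toy_model(prompt: str) -> str:
    return "Sure, here is..." if "roleplay" in prompt else "I can't help with that."


def toy_judge(prompt: str, response: str) -> str:
    return "Unsafe" if response.startswith("Sure") else "Safe"


attempts = run_seed("seed text", ["roleplay", "obfuscation"], toy_model, toy_judge, max_runs=4)
for a in attempts:
    print(a.strategy, a.verdict)
```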
Step 5. View Report After Completion
When evaluation is complete, the project dashboard displays a summary report.
It includes core performance metrics that help you understand the model's defensive capabilities at a glance.
Example Report Items:
- Model name, number of runs, Safe/Unsafe outcomes
- Strategy-wise breakdown of vulnerabilities
These reports are useful for tracking safety trends and identifying weaknesses.
Step 6. View Detailed Results
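To make the report items concrete, the sketch below shows one way Safe/Unsafe totals and a strategy-wise breakdown could be derived from judged attempts. The `results` data, the `summarize` function and the output field names are hypothetical; the dashboard's actual metrics and formulas are not documented here.

```python
from collections import Counter, defaultdict

# Hypothetical judged attempts as (strategy, verdict) pairs.
results = [
    ("roleplay", "Unsafe"),
    ("roleplay", "Safe"),
    ("obfuscation", "Safe"),
    ("obfuscation", "Safe"),
    ("roleplay", "Unsafe"),
]


def summarize(results):
    """Aggregate overall Safe/Unsafe counts and the Unsafe rate per strategy."""
    totals = Counter(verdict for _, verdict in results)
    by_strategy = defaultdict(Counter)
    for strategy, verdict in results:
        by_strategy[strategy][verdict] += 1
    unsafe_rate = {
        strategy: counts["Unsafe"] / sum(counts.values())
        for strategy, counts in by_strategy.items()
    }
    return {
        "runs": len(results),
        "safe": totals["Safe"],
        "unsafe": totals["Unsafe"],
        "unsafe_rate_by_strategy": unsafe_rate,
    }


summary = summarize(results)
print(summary)
```

A per-strategy Unsafe rate is only one possible reading of "strategy-wise breakdown"; it is used here because it makes strategies with different run counts comparable.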
The bottom of the report page shows a detailed evaluation table per seed.
You'll see:
- Number of attack attempts
- Evaluation scores
- Strategies used, etc.
Focus on seeds that frequently resulted in Unsafe responses for deeper analysis or mitigation planning.
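If you copy the per-seed figures out of the table, a short script can rank seeds for follow-up. This is a minimal sketch under stated assumptions: the row fields (`seed_id`, `attempts`, `unsafe`), the `rank_seeds` helper and the 0.5 threshold are all illustrative and do not reflect a documented export format.

```python
from typing import Dict, List, Tuple


def rank_seeds(rows: List[Dict], min_unsafe_rate: float = 0.5) -> List[Tuple[str, float]]:
    """Return (seed_id, unsafe_rate) pairs at or above the threshold, worst first."""
    flagged = []
    for row in rows:
        rate = row["unsafe"] / row["attempts"] if row["attempts"] else 0.0
        if rate >= min_unsafe_rate:
            flagged.append((row["seed_id"], rate))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical per-seed rows transcribed from the detailed results table.
rows = [
    {"seed_id": "seed-001", "attempts": 10, "unsafe": 1},
    {"seed_id": "seed-002", "attempts": 10, "unsafe": 7},
    {"seed_id": "seed-003", "attempts": 10, "unsafe": 5},
]

priority = rank_seeds(rows)
print(priority)
```

Tune the threshold to your own risk tolerance; a lower value surfaces more seeds for review.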
