Step 2. Run Automated Evaluation
When you create an Evaluation Set, all evaluation settings are configured sequentially in a single modal, and the evaluation runs immediately once the Set is created.
The entire flow proceeds in a 4-step sequence:
Select Metric → Set Evaluation Model → Enter Eval Set Name → Select evaluation target dataset (Response Set).
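If it helps to see the four steps at a glance, the sketch below shows the kind of information the modal collects. All field names and values are hypothetical, chosen only for illustration; they do not reflect the product's actual API or schema.

```python
# Hypothetical sketch of the settings collected across the four modal steps.
# Field names and values are illustrative only, not the product's schema.
eval_set_config = {
    "metric": "answer_relevance",        # Step 1: selected Metric (one category per Eval Set)
    "evaluation_model": "GPT-4o",        # Step 2: Judge Model compatible with the Metric
    "name": "RAG answer quality v1",     # Step 3: Eval Set name
    "description": "Relevance check on the March responses",  # optional
    "response_sets": ["responses_2024_03"],  # Step 4: compatible Response Set(s)
}
```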
2-1. Start Creating Eval Set
① Click the Add Evaluation Set button
- After creating the Task, click the Add Evaluation Set button
- The Evaluation Settings Modal will open
2-2. Eval Set Evaluation Settings
① Select Evaluation Metric
Select the Metric to be used for the evaluation:
- Select Metric Category: Once a category is chosen, only Metrics from that category can be added.
- The selected Metric determines the compatible models and Response Sets.
② Select Evaluation Model
Select an evaluation model that supports the selected Metric:
- Show compatible models only: Only models compatible with the selected Metric are displayed (see the sketch after this list).
- Select Judge Model: GPT-4o, Claude 3.5, etc.
- If an incompatible model is selected, a warning about potential errors is shown.
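Conceptually, the compatibility filters in steps ① and ② amount to matching the selected Metric's category against what each model supports. The sketch below is a hypothetical illustration only; the metric-to-category mapping, model names, and function are invented for the example and are not the product's data or API.

```python
# Hypothetical illustration of how "Show compatible models only" could filter
# the Judge Model list by the selected Metric's category. The mappings below
# are invented for this example and do not reflect the product's real data.
METRIC_CATEGORY = {
    "answer_relevance": "llm_judge",
    "context_recall": "llm_judge",
    "exact_match": "rule_based",
}

MODEL_SUPPORTED_CATEGORIES = {
    "GPT-4o": {"llm_judge"},
    "Claude 3.5": {"llm_judge"},
    "string-matcher": {"rule_based"},
}

def compatible_models(metric: str) -> list[str]:
    """Return only the models that support the selected metric's category."""
    category = METRIC_CATEGORY[metric]
    return [m for m, cats in MODEL_SUPPORTED_CATEGORIES.items() if category in cats]

print(compatible_models("answer_relevance"))  # ['GPT-4o', 'Claude 3.5']
```

The same filtering idea applies to the Response Set step: only sets whose data matches the Metric's category are offered for selection.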
③ Enter Eval Set Name
Set a name to identify the evaluation:
- Evaluation Name: Enter the name of the evaluation set.
- Description: Describe the purpose or features of the evaluation (optional).
④ Select Response Set
Select a Response Set that is compatible with the selected Metric:
- Show compatible Response Sets only: Only Response Sets compatible with the selected Metric are displayed.
- The number of currently selected Response Sets is shown in the Selected: 0 counter.
⑤ Complete and Run
After all settings are complete:
- The evaluation starts immediately upon clicking the Complete button.
- You will be automatically redirected to the Eval Set Detail Page, where you can check the evaluation progress in real time.
You can manage a running evaluation from the Evaluation Set:
- Pause: You can pause the evaluation with the pause button and resume it from the point where it stopped.
- Error: If an error occurs during the evaluation, you can retry only the affected data; results for all other data remain available in the evaluation results (see the sketch below).
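Conceptually, resuming after a pause and retrying after an error both mean re-running only the items that have no result yet, while keeping completed results. The sketch below is a hypothetical illustration of that behavior; the function and names are invented and are not the product's implementation.

```python
# Hypothetical sketch of resume-from-pause / retry-on-error behavior:
# completed results are preserved, and only pending or errored items are re-run.
def run_evaluation(items, evaluate, results=None):
    """`results` maps item id -> evaluation result; errored items are stored as None."""
    results = dict(results or {})
    for item in items:
        if results.get(item["id"]) is not None:
            continue  # already evaluated; preserved across pause/restart
        try:
            results[item["id"]] = evaluate(item)
        except Exception:
            results[item["id"]] = None  # marked as error; retried on the next run
    return results
```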
2-3. Evaluation Complete
① Evaluation Complete
- When the evaluation is complete, you can check the results directly on the dashboard, along with the overall completion rate and the time taken.
💡 Evaluation Progress Tips
1. Review Dataset Quality
- Be sure to check the quality of the Context and Query before evaluation.
- Check that the question composition is appropriate for the purpose of the evaluation.
2. Optimize Metric Settings
- Select a Metric that is appropriate for the purpose of the evaluation.
- Check compatibility: Verify that the selected Metric is compatible with the evaluation model and Response Set.
- Consider the performance and cost of the Judge Model.
3. Efficient Evaluation Execution
- Run a small-scale test first, then run the full evaluation (see the sketch after this list).
- Adjust the batch size to balance stability and speed.
- If an error occurs, identify and resolve the cause immediately.
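One way to apply the "small test first" and batch-size tips is sketched below. The helper function and parameters are hypothetical, intended only to illustrate the workflow, not the product's API.

```python
# Hypothetical sketch: evaluate a small random sample first, then the full set in batches.
import random

def run_in_batches(items, evaluate, batch_size=20, sample_size=10):
    sample = random.sample(items, min(sample_size, len(items)))
    sample_results = [evaluate(x) for x in sample]  # small-scale test run
    # ...inspect sample_results before committing to the full evaluation...
    results = []
    for i in range(0, len(items), batch_size):      # batch size trades speed against stability
        results.extend(evaluate(x) for x in items[i:i + batch_size])
    return results
```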
❓ Frequently Asked Questions (FAQ)
Q. What happens if I stop during the evaluation?
A. Even if you stop during the evaluation, the results that have already been completed are preserved.
You can restart from the point where you stopped, and you can also check the partial results on the dashboard.
Q. Can I use Metrics from different categories together?
A. Once you select a Metric category, you can only add Metrics from the same category. To use a different category, you must create a new Evaluation Set.
Q. Can I edit the evaluation results?
A. After the evaluation is complete, you can manually edit individual results. This lets a human reviewer correct anything the automated evaluation missed or judged ambiguously.