Interactive Evaluation Guide

Interactive Evaluation

Interactive Evaluation is a workflow where workers chat with an AI model in real-time to assess its response quality. When a Worker submits an evaluation, a Reviewer reviews it.

The overall flow is as follows:

(Reviewer) Create Task → Distribute Link
(Worker) Evaluate conversation → Ground Truth (optional) → Write submission reason & Submit
(Reviewer) Review submission → Evaluate with 👍/👎 based on validation criteria

Worker — Conducting the Evaluation

Workers directly chat with the model to evaluate and submit the quality of its responses.

Step 1. Proceed with Evaluation and Submission

① Access Task Link and Log In

Click the provided Task link → Enter your account details on the login screen and log in.

② Start a Conversation with the Model

Write a question in the input box at the bottom → Check the model's response → If necessary, continue the conversation with additional questions.

Tips for Effective Conversation

Ask from various angles: Evaluate the model's response quality from multiple perspectives.
Maintain context: Test consistency by asking questions that refer to the previous conversation.
Sufficient conversation: Continue the conversation as much as needed for the evaluation.

③ Input Ground Truth (Optional)

If necessary, press the Input Ground Truth button to write the ideal expected response.

After writing, you must enter the reason for submission in the Submission Basis to activate the Submit button.
When you press Submit, the conversation, Ground Truth (if entered), and submission reason are automatically saved.
There is no separate save button for Ground Truth; it is saved along with the submission.

Check Before Submission

Has the conversation progressed sufficiently?
Is the submission reason clearly written?

④ Write Submission Reason

Write the reason for submission in the Submission Basis area on the right.

⑤ Submit

Click the Submit button → The following items are saved together:

Conversation history (all questions and responses)
Ground Truth (if entered)
Submission reason (Submission Basis)

Checklist Before Submission

Has the conversation progressed sufficiently?
Is the submission reason clearly written?
Have you double-checked everything?

FAQ — Worker

Q: Do I have to input the Ground Truth? A: No, it's optional. You can submit without it.

Q: Can I edit my submission? A: You cannot edit after submission. Please review carefully before submitting.

Reviewer — Task Management and Review

The reviewer manages the entire evaluation process, from task creation to review.

Step 1. Create and Distribute Task

① Enter Interactive Evaluation Task Creation

Click the [+ New Task] button at the top right of the [Interactive Evaluation] page to start a new evaluation task.

② Enter Task Information

Enter the basic information for the Interactive Evaluation Task.

Enter Task Name, Target Model, Validation criteria, then Complete.

Copy the participation link (URL) of the created Task and share it with the workers.

Workers who receive the link can log in immediately and start the evaluation.

Step 2. Check Task Details and Proceed with Review

① Enter Task Detail

Click [Interactive Evaluation] in the left menu → Select the Task to review from the Task list.

② Check the List of Submissions

On the Task Detail screen, check the list of conversation histories submitted by workers.

③ Review Detailed Content

Click on a submission to check the detailed content.

Review Items:

Query / Model Response / Ground Truth (optional) / Submission Reason
Select 👍 Approve / 👎 Reject based on whether it meets the validation criteria.
If necessary, edit the Ground Truth to modify/create it.

FAQ — Reviewer

Q: Can I edit the Task after creation? A: You can edit the Task Name and Description, but not the Target Model and Validation criteria. Please set them carefully.

Q: How do I evaluate a submission without a Ground Truth? A: Since Ground Truth is optional, you can evaluate based on the conversation quality and submission reason alone.

Key Features Summary

Worker: Directly chats with the model, writes Ground Truth (optional), enters submission reason.
Reviewer: Creates and distributes Tasks, reviews submissions, evaluates with 👍/👎 based on Validation criteria.
Use Cases: Red Teaming, safety verification, bias evaluation, etc.

Worker — Conducting the Evaluation​

Step 1. Proceed with Evaluation and Submission​

① Access Task Link and Log In​

② Start a Conversation with the Model​

③ Input Ground Truth (Optional)​

④ Write Submission Reason​

⑤ Submit​

FAQ — Worker​

Reviewer — Task Management and Review​

Step 1. Create and Distribute Task​

① Enter Interactive Evaluation Task Creation​

② Enter Task Information​

③ Share the Link with Workers​

Step 2. Check Task Details and Proceed with Review​

① Enter Task Detail​

② Check the List of Submissions​

③ Review Detailed Content​

FAQ — Reviewer​