Interactive Evaluation Guide
Interactive Evaluation is a workflow where workers chat with an AI model in real-time to assess its response quality. When a Worker submits an evaluation, a Reviewer reviews it.
The overall flow is as follows:
- (Reviewer) Create Task → Distribute Link
- (Worker) Evaluate conversation → Ground Truth (optional) → Write submission reason & Submit
- (Reviewer) Review submission → Evaluate with 👍/👎 based on validation criteria
Worker — Conducting the Evaluation
Workers directly chat with the model to evaluate and submit the quality of its responses.
Step 1. Proceed with Evaluation and Submission
① Access Task Link and Log In
Click the provided Task link → Enter your account details on the login screen and log in.
② Start a Conversation with the Model
Write a question in the input box at the bottom → Check the model's response → If necessary, continue the conversation with additional questions.
- Ask from various angles: Evaluate the model's response quality from multiple perspectives.
- Maintain context: Test consistency by asking questions that refer to the previous conversation.
- Sufficient conversation: Continue the conversation as much as needed for the evaluation.
③ Input Ground Truth (Optional)
If necessary, press the Input Ground Truth button to write the ideal expected response.
- After writing, you must enter the reason for submission in the Submission Basis to activate the Submit button.
- When you press Submit, the conversation, Ground Truth (if entered), and submission reason are automatically saved.
- There is no separate save button for Ground Truth; it is saved along with the submission.
- Has the conversation progressed sufficiently?
- Is the submission reason clearly written?
④ Write Submission Reason
Write the reason for submission in the Submission Basis area on the right.
⑤ Submit
Click the Submit button → The following items are saved together:
- Conversation history (all questions and responses)
- Ground Truth (if entered)
- Submission reason (Submission Basis)
- Has the conversation progressed sufficiently?
- Is the submission reason clearly written?
- Have you double-checked everything?
FAQ — Worker
Q: Do I have to input the Ground Truth? A: No, it's optional. You can submit without it.
Q: Can I edit my submission? A: You cannot edit after submission. Please review carefully before submitting.
Reviewer — Task Management and Review
The reviewer manages the entire evaluation process, from task creation to review.
Step 1. Create and Distribute Task
① Enter Interactive Evaluation Task Creation
Click the [+ New Task] button at the top right of the [Interactive Evaluation] page to start a new evaluation task.
② Enter Task Information
Enter the basic information for the Interactive Evaluation Task.
- Enter Task Name, Target Model, Validation criteria, then Complete.
③ Share the Link with Workers
Copy the participation link (URL) of the created Task and share it with the workers.
- Workers who receive the link can log in immediately and start the evaluation.
Step 2. Check Task Details and Proceed with Review
① Enter Task Detail
Click [Interactive Evaluation] in the left menu → Select the Task to review from the Task list.
② Check the List of Submissions
On the Task Detail screen, check the list of conversation histories submitted by workers.
③ Review Detailed Content
Click on a submission to check the detailed content.
Review Items:
- Query / Model Response / Ground Truth (optional) / Submission Reason
- Select 👍 Approve / 👎 Reject based on whether it meets the validation criteria.
- If necessary, edit the Ground Truth to modify/create it.
FAQ — Reviewer
Q: Can I edit the Task after creation? A: You can edit the Task Name and Description, but not the Target Model and Validation criteria. Please set them carefully.
Q: How do I evaluate a submission without a Ground Truth? A: Since Ground Truth is optional, you can evaluate based on the conversation quality and submission reason alone.
- Worker: Directly chats with the model, writes Ground Truth (optional), enters submission reason.
- Reviewer: Creates and distributes Tasks, reviews submissions, evaluates with 👍/👎 based on Validation criteria.
- Use Cases: Red Teaming, safety verification, bias evaluation, etc.