

Interactive Evaluation

Interactive Evaluation is a workflow in which Workers chat with an AI model in real time to assess the quality of its responses. When a Worker submits an evaluation, a Reviewer reviews the submission.

The overall flow is as follows:

  1. (Reviewer) Create Task → Distribute Link
  2. (Worker) Evaluate conversation → Ground Truth (optional) → Write submission reason & Submit
  3. (Reviewer) Review submission → Evaluate with 👍/👎 based on validation criteria

Worker — Conducting the Evaluation

Workers chat directly with the model, evaluate the quality of its responses, and submit the result.

Step 1. Proceed with Evaluation and Submission

① Open the Task Link and Log In

Click the provided Task link → Enter your account details on the login screen and log in.

② Start a Conversation with the Model

Write a question in the input box at the bottom → Check the model's response → If necessary, continue the conversation with additional questions.

Tips for Effective Conversation
  • Ask from various angles: Evaluate the model's response quality from multiple perspectives.
  • Maintain context: Test consistency by asking questions that refer to the previous conversation.
  • Sufficient conversation: Continue the conversation as much as needed for the evaluation.

③ Input Ground Truth (Optional)

If necessary, press the Input Ground Truth button to write the ideal expected response.

  • After writing, you must also enter the reason for submission in the Submission Basis field to activate the Submit button.
  • When you press Submit, the conversation, Ground Truth (if entered), and submission reason are automatically saved.
  • There is no separate save button for Ground Truth; it is saved along with the submission.

④ Write Submission Reason

Write the reason for submission in the Submission Basis area on the right.

⑤ Submit

Click the Submit button → The following items are saved together:

  • Conversation history (all questions and responses)
  • Ground Truth (if entered)
  • Submission reason (Submission Basis)
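
For illustration only, the items listed above could be represented as a single record like the sketch below. This is a hypothetical shape, not the product's actual storage schema, and all field names are assumptions.

```typescript
// Hypothetical sketch of a saved submission (field names are assumptions,
// not the product's actual schema).
interface ChatTurn {
  role: "worker" | "model"; // who produced this turn
  content: string;          // the question or the model's response
}

interface Submission {
  conversation: ChatTurn[]; // full conversation history
  groundTruth?: string;     // optional ideal expected response
  submissionBasis: string;  // required reason for submission
}

// Example: a submission without a Ground Truth (it is optional).
const example: Submission = {
  conversation: [
    { role: "worker", content: "Explain the refund policy in one sentence." },
    { role: "model", content: "Refunds are available within 30 days of purchase." },
  ],
  submissionBasis: "Response is accurate and concise for a single-turn check.",
};
```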
Checklist Before Submission
  • Has the conversation progressed sufficiently?
  • Is the submission reason clearly written?
  • Have you double-checked everything?

FAQ — Worker

Q: Do I have to input the Ground Truth? A: No, it's optional. You can submit without it.

Q: Can I edit my submission? A: You cannot edit after submission. Please review carefully before submitting.


Reviewer — Task Management and Review

The reviewer manages the entire evaluation process, from task creation to review.

Step 1. Create and Distribute Task

① Open Interactive Evaluation Task Creation

Click the [+ New Task] button at the top right of the [Interactive Evaluation] page to start a new evaluation task.

② Enter Task Information

Enter the basic information for the Interactive Evaluation Task.

  • Enter the Task Name, Target Model, and Validation criteria, then click Complete.

③ Distribute the Participation Link

Copy the participation link (URL) of the created Task and share it with the workers.

  • Workers who receive the link can log in immediately and start the evaluation.
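
As a rough sketch, the information a Task carries might look like the following. These names and values are illustrative assumptions, not the product's actual fields or API.

```typescript
// Hypothetical sketch of the information captured when a Task is created
// (names and values are assumptions for illustration only).
interface InteractiveEvaluationTask {
  taskName: string;           // shown in the Task list
  targetModel: string;        // the model Workers will chat with; fixed after creation (see Reviewer FAQ)
  validationCriteria: string; // basis for 👍/👎 review decisions; fixed after creation (see Reviewer FAQ)
  participationUrl: string;   // link shared with Workers
}

const task: InteractiveEvaluationTask = {
  taskName: "Safety red-teaming round 1",
  targetModel: "example-chat-model",
  validationCriteria: "Reject responses that are unsafe, biased, or off-topic.",
  participationUrl: "https://example.com/tasks/placeholder", // placeholder URL
};
```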

Step 2. Check Task Details and Proceed with Review

① Open the Task Detail Screen

Click [Interactive Evaluation] in the left menu → Select the Task to review from the Task list.

② Check the List of Submissions

On the Task Detail screen, check the list of conversation histories submitted by workers.

③ Review Detailed Content

Click on a submission to check the detailed content.

Review Items:

  • Query / Model Response / Ground Truth (optional) / Submission Reason
  • Select 👍 Approve / 👎 Reject based on whether it meets the validation criteria.
  • If necessary, edit the Ground Truth to modify the existing one or create a new one.
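
A review decision could be sketched like this; again the shape is a hypothetical illustration rather than the product's actual data model.

```typescript
// Hypothetical sketch of a review decision (illustrative only).
interface ReviewDecision {
  submissionId: string;          // which submission was reviewed
  verdict: "approve" | "reject"; // 👍 or 👎 against the Validation criteria
  editedGroundTruth?: string;    // set only if the reviewer modified or created one
}

const decision: ReviewDecision = {
  submissionId: "submission-001",
  verdict: "approve",
};
```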

FAQ — Reviewer

Q: Can I edit the Task after creation? A: You can edit the Task Name and Description, but not the Target Model and Validation criteria. Please set them carefully.

Q: How do I evaluate a submission without a Ground Truth? A: Since Ground Truth is optional, you can evaluate based on the conversation quality and submission reason alone.


Key Features Summary
  • Worker: Directly chats with the model, writes Ground Truth (optional), enters submission reason.
  • Reviewer: Creates and distributes Tasks, reviews submissions, evaluates with 👍/👎 based on Validation criteria.
  • Use Cases: Red Teaming, safety verification, bias evaluation, etc.