Human Evaluation Overview

A method in which humans directly evaluate AI responses, enabling intuitive verification of response quality.

1. Manual Evaluation

This feature allows evaluators to manually assess the quality of AI responses against predefined evaluation criteria (rubrics). Systematic, consistent evaluation standards keep the process objective while still incorporating human judgment.
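
As an illustration, a rubric-based manual evaluation record could be modeled as below. This is a minimal sketch: the rubric criteria, the 1-5 score scale, and all class and field names are hypothetical, not part of the actual product interface.

```python
from dataclasses import dataclass, field

# Hypothetical rubric criteria; in practice these are the predefined
# evaluation criteria configured for the project.
RUBRIC = ("accuracy", "completeness", "clarity")

@dataclass
class ManualEvaluation:
    """One evaluator's scores for a single AI response, keyed by rubric criterion."""
    response_id: str
    evaluator_id: str
    scores: dict[str, int] = field(default_factory=dict)  # criterion -> score (1-5 assumed)

    def add_score(self, criterion: str, score: int) -> None:
        if criterion not in RUBRIC:
            raise ValueError(f"Unknown rubric criterion: {criterion}")
        if not 1 <= score <= 5:
            raise ValueError("Scores are assumed to use a 1-5 scale")
        self.scores[criterion] = score

    def overall(self) -> float:
        """Mean score across the criteria rated so far."""
        if not self.scores:
            return 0.0
        return sum(self.scores.values()) / len(self.scores)

# Usage: an evaluator rates one response against the rubric.
ev = ManualEvaluation(response_id="resp-001", evaluator_id="eval-01")
ev.add_score("accuracy", 4)
ev.add_score("clarity", 5)
print(ev.overall())  # 4.5
```

Keeping one score per rubric criterion, rather than a single overall number, is what lets the same response be compared consistently across evaluators.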

2. Interactive Evaluation

An interactive evaluation system that lets you send queries directly to an AI model and evaluate the responses in real time. Evaluators can immediately rate each response as Good/Bad and write a Ground Truth (GT).
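
A minimal sketch of this interactive flow is shown below, assuming a placeholder `query_model` function standing in for the real model call; the function, class, and field names are illustrative, not the actual API.

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class InteractiveRecord:
    """One round of interactive evaluation: query, response, rating, optional GT."""
    query: str
    response: str
    rating: Literal["good", "bad"]
    ground_truth: Optional[str] = None  # evaluator-written Ground Truth, if any

def query_model(prompt: str) -> str:
    # Placeholder for the real model call (e.g. an HTTP request to the model endpoint).
    return f"(model response to: {prompt})"

def evaluate_interactively(query: str, rating: Literal["good", "bad"],
                           ground_truth: Optional[str] = None) -> InteractiveRecord:
    """Send a query, then record the evaluator's immediate Good/Bad rating and optional GT."""
    response = query_model(query)
    return InteractiveRecord(query=query, response=response,
                             rating=rating, ground_truth=ground_truth)

# Usage: rate a live response as Bad and supply the corrected Ground Truth.
record = evaluate_interactively(
    "What is the capital of Australia?",
    rating="bad",
    ground_truth="Canberra",
)
print(record)
```

Recording the GT alongside the Good/Bad rating means a Bad rating also captures what the correct answer should have been, which is useful for later regression testing.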