Evaluation Pipeline
This document explains the structural relationships between the core concepts that make up an evaluation in Datumo Eval. Step-by-step, UI-based usage procedures are covered in Tutorials and Guides.
Evaluation Pipeline Overview
1. Overall Flow
Datumo Eval's evaluation process is a structured flow for measuring, comparing, and analyzing the quality of generative AI models against consistent criteria. The pipeline consists of four concepts: Dataset, Evaluation Task, Evaluation Set, and Evaluation Result. Each element is independent, yet the elements connect to form the complete evaluation.
Datumo Eval supports several evaluation frameworks, including qualitative evaluation (LLM Judge), quantitative evaluation (metric-based), rule-based evaluation, and automated red-teaming. The pipeline in this document is described in terms of the most representative framework, Core Evaluation; some components may differ in other frameworks.
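As a rough mental model, the sketch below shows one way the four concepts could relate to one another. The class and field names are illustrative assumptions made for this document, not Datumo Eval's actual data model or API.

```python
# Illustrative sketch only: the class and field names below are assumptions
# made for this document, not Datumo Eval's actual data model or API.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Input data to be evaluated, e.g. prompts paired with model responses."""
    name: str
    records: list[dict] = field(default_factory=list)


@dataclass
class EvaluationTask:
    """Defines how a Dataset is evaluated: which framework and criteria apply."""
    dataset: Dataset
    framework: str                      # e.g. "Core Evaluation" (hypothetical field)
    criteria: list[str] = field(default_factory=list)


@dataclass
class EvaluationSet:
    """One concrete run of an Evaluation Task, collecting its results."""
    task: EvaluationTask
    results: list[EvaluationResult] = field(default_factory=list)


@dataclass
class EvaluationResult:
    """Per-record outcome of an evaluation run, e.g. scores per criterion."""
    record_id: str
    scores: dict[str, float] = field(default_factory=dict)
```

In this picture, a Dataset feeds an Evaluation Task, running the task yields an Evaluation Set, and the set contains per-record Evaluation Results.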