Evaluation Pipeline
This document explains the structural relationships between the core concepts that make up an evaluation in Datumo Eval. Step-by-step, UI-based usage procedures are covered in Tutorials and Guides.
Evaluation Pipeline Overview
1. Overall Flow
Datumo Eval's evaluation process is a structured flow for measuring, comparing, and analyzing the quality of generative AI models against consistent criteria. The pipeline consists of four concepts: Dataset, Evaluation Task, Evaluation Set, and Evaluation Result. Each element is independent, yet the elements connect to form the complete evaluation.
Datumo Eval supports several evaluation frameworks, including qualitative evaluation (LLM Judge), quantitative evaluation (metric-based), rule-based evaluation, and automated red-teaming. The pipeline in this document is described in terms of the most representative framework, Core Evaluation; some components may differ in other frameworks.
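As a rough mental model, the sketch below shows one way the four concepts could relate to one another. The class and field names are illustrative assumptions made for this document, not Datumo Eval's actual data model or API.

```python
# Illustrative sketch only: the class and field names below are assumptions
# made for this document, not Datumo Eval's actual data model or API.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Input data to be evaluated, e.g. prompts paired with model responses."""
    name: str
    records: list[dict] = field(default_factory=list)


@dataclass
class EvaluationTask:
    """Defines how a Dataset is evaluated: which framework and criteria apply."""
    dataset: Dataset
    framework: str                      # e.g. "Core Evaluation" (hypothetical field)
    criteria: list[str] = field(default_factory=list)


@dataclass
class EvaluationSet:
    """One concrete run of an Evaluation Task, collecting its results."""
    task: EvaluationTask
    results: list[EvaluationResult] = field(default_factory=list)


@dataclass
class EvaluationResult:
    """Per-record outcome of an evaluation run, e.g. scores per criterion."""
    record_id: str
    scores: dict[str, float] = field(default_factory=dict)
```

In this picture, a Dataset feeds an Evaluation Task, running the task yields an Evaluation Set, and the set contains per-record Evaluation Results.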