Overview | Datumo Eval Docs


Evaluation Task

📊 Basic Evaluation Flow

The Evaluation Task is the basic evaluation workflow in Datumo Eval. A Judge model compares and evaluates the responses of a Target model, so that model performance can be quantified against a Dataset.

The entire flow is as follows:

1. Create an Evaluation Task
2. Create and run an Eval Set
3. Check the evaluation results
4. (Advanced) Manage evaluations at the Task level, edit evaluation results, and check the BEIR Leaderboard view
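
The flow above can be sketched in plain Python to show how the pieces relate. This is only an illustration under stated assumptions, not the Datumo Eval API: `Example`, `run_eval_set`, `toy_target`, and `toy_judge` are hypothetical names, and the exact-match judge is a stand-in for a real Judge model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Example:
    """One Dataset row (hypothetical shape): a question and a reference answer."""
    question: str
    reference: str


def run_eval_set(
    dataset: list[Example],
    target: Callable[[str], str],
    judge: Callable[[str, str, str], float],
) -> dict:
    """Query the Target for each example, let the Judge score it, then aggregate."""
    rows = []
    for ex in dataset:
        response = target(ex.question)                        # Target model answers
        score = judge(ex.question, response, ex.reference)    # Judge scores the answer
        rows.append({"question": ex.question, "response": response, "score": score})
    return {"mean_score": mean(r["score"] for r in rows), "rows": rows}


def toy_target(question: str) -> str:
    """Stand-in for a Target model: a fixed lookup table with one wrong answer."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}
    return answers.get(question, "")


def toy_judge(question: str, response: str, reference: str) -> float:
    """Stand-in for a Judge model: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0


dataset = [
    Example("What is 2 + 2?", "4"),
    Example("Capital of France?", "Paris"),
]
report = run_eval_set(dataset, toy_target, toy_judge)
print(report["mean_score"])  # 0.5
```

The per-row records correspond loosely to what a table view would list, and the aggregate to what a dashboard would summarize. How Datumo Eval actually stores and scores results is not specified here.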

📄️ 1. Create an Evaluation Task

Create a new evaluation Task.

📄️ 2. Run Eval Set

Create an Eval Set, set the conditions, and run the evaluation.

📄️ 3. Check Results

Check the evaluation results with the Dashboard and Table View.

📄️ + BEIR Leaderboard

Perform BEIR benchmark evaluation along with Judge evaluation and check the results on the leaderboard.

📄️ + Eval Task Management

Manage evaluations at the Task level. You can pause or restart an evaluation, or edit its name and description, while the LLM evaluation is in progress.

📄️ + Edit results manually

You can edit the evaluation results.

📄️ + Batch Scheduling

Schedule Judge evaluation runs to start automatically.
