Basic Evaluation
The Basic Evaluation feature allows users to perform standard model evaluations.
It is structured around a project-based flow, from dataset setup to result analysis.
For project creation, refer to the Setting Up Your First Project page.
DATUMO Eval's basic evaluation consists of 4 main steps after creating a project:
- Create evaluation dataset
- Review and save the dataset
- Run evaluation and select model(s)
- Analyze results (Dashboard & Table View)
Basic Evaluation Tutorial Overview
- Setting Up Dataset
- Dataset & Review Editing
- Running Evaluation
- Viewing Results
- Edit Results
- Project Management
Basic Evaluation Flow
Step 1: Create Evaluation Dataset
Before running an evaluation, you need to prepare a dataset.
You can upload documents (contexts) or directly provide queries and responses to create your dataset.
You can choose from the following three Upload Types:
| Upload Type | Structure | Description |
|---|---|---|
| Query Generation | Context-only | Upload documents only, and the system generates questions. You can set parameters and review sample queries before generating the full dataset. |
| Query Upload | Context + Query | Upload documents and user-written queries for direct use in evaluation. |
| Response Upload | Query + Response | Upload both queries and model responses to run evaluations without collecting new responses. |
Available target model options and evaluation flows vary depending on the upload type.
👉 See Setting Up Dataset for details.
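Each upload type implies a different dataset shape. As a rough illustration (the column names and file layout below are hypothetical examples, not DATUMO Eval's required schema):

```python
import csv
import io

# Hypothetical column layouts for each upload type; the actual
# DATUMO Eval import format may differ -- check Setting Up Dataset.
datasets = {
    "query_generation": [  # Context-only: documents, no queries yet
        {"context": "Our refund policy allows returns within 30 days."},
    ],
    "query_upload": [  # Context + Query: documents plus user-written queries
        {"context": "Our refund policy allows returns within 30 days.",
         "query": "How long do I have to return an item?"},
    ],
    "response_upload": [  # Query + Response: no new responses collected
        {"query": "How long do I have to return an item?",
         "response": "You can return items within 30 days of purchase."},
    ],
}

# Serialize each example dataset as CSV and print it.
for name, rows in datasets.items():
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    print(f"--- {name}.csv ---")
    print(buf.getvalue())
```

The point is only the shape: Query Generation starts from context alone, Query Upload pairs context with queries, and Response Upload pairs queries with already-collected responses.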
Step 2: Review and Save Dataset
Review the structure of your dataset (queries/responses) and save it.
You can also edit or supplement the dataset before evaluation if needed.
👉 For editing features, refer to Dataset & Review Editing.
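A review pass typically catches rows with missing fields before they reach evaluation. A minimal offline sketch of that kind of check (the field names and helper are hypothetical, not a product API):

```python
def validate_rows(rows, required_fields):
    """Return indices of rows missing any required field (illustrative)."""
    bad = []
    for i, row in enumerate(rows):
        if any(not row.get(field) for field in required_fields):
            bad.append(i)
    return bad

rows = [
    {"query": "What is the refund window?", "response": "30 days."},
    {"query": "", "response": "N/A"},  # empty query -> needs editing
]

# Row 1 is flagged for editing before the dataset is saved.
print(validate_rows(rows, ["query", "response"]))  # [1]
```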
Step 3: Run Evaluation
After saving your dataset, select your target model(s) and start the evaluation.
Model selection rules vary depending on the upload type:
- Context-only / Context + Query
  → Multiple models can be selected.
- Query + Response
  → Only models with pre-generated responses can be selected.
  (⚠️ Models added via "Add Anyway" will not generate new responses.)
👉 For full instructions, visit Running Evaluation.
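The selection rules above can be sketched as a simple function (the upload-type values, helper name, and model fields here are illustrative, not part of the product):

```python
# Illustrative sketch of the model-selection rules described above;
# names and data shapes are hypothetical.
def selectable_models(upload_type, models):
    """Return the model names a user may select for evaluation.

    models: list of dicts like {"name": str, "has_responses": bool}
    """
    if upload_type in ("context_only", "context_query"):
        # Query Generation / Query Upload: multiple models can be selected.
        return [m["name"] for m in models]
    if upload_type == "query_response":
        # Response Upload: only models with pre-generated responses.
        return [m["name"] for m in models if m["has_responses"]]
    raise ValueError(f"unknown upload type: {upload_type}")

models = [
    {"name": "model-a", "has_responses": True},
    {"name": "model-b", "has_responses": False},
]
print(selectable_models("context_only", models))    # both models
print(selectable_models("query_response", models))  # only model-a
```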
Step 4: View Evaluation Results
Once evaluation is complete, results will be shown on the dashboard:
- Score distributions per metric
- Individual response scores and reasons
- Detailed views via the Table View
👉 Learn more at Viewing Results.
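Conceptually, the dashboard's per-metric score distributions amount to grouping individual response scores by metric. A sketch over a hypothetical results export (the JSON row shape is an assumption, not DATUMO Eval's actual export format):

```python
from collections import defaultdict

# Hypothetical exported rows: one score and reason per response per metric.
results = [
    {"metric": "relevance", "score": 4, "reason": "Addresses the query."},
    {"metric": "relevance", "score": 5, "reason": "Complete and on-topic."},
    {"metric": "faithfulness", "score": 2, "reason": "Unsupported claim."},
]

# Group scores by metric, as in the dashboard's per-metric view.
by_metric = defaultdict(list)
for row in results:
    by_metric[row["metric"]].append(row["score"])

for metric, scores in sorted(by_metric.items()):
    print(f"{metric}: n={len(scores)}, mean={sum(scores) / len(scores):.2f}")
```

The Table View corresponds to the raw rows themselves, with each response's score and reason inspectable individually.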
Other Evaluation Types
- RAG Checker PRO
  → Evaluate factual accuracy and document-grounded responses.
- Red-teaming Add-on
  → Assess model vulnerabilities and safety risks.