
Basic Evaluation

The Basic Evaluation feature allows users to perform standard model evaluations.
It is structured around a project-based flow, from dataset setup to result analysis.

Info: For project creation, refer to the Setting Up Your First Project page.

DATUMO Eval’s basic evaluation consists of 4 main steps after creating a project:

  1. Create evaluation dataset
  2. Review and save the dataset
  3. Run evaluation and select model(s)
  4. Analyze results (Dashboard & Table View)



📚 Basic Evaluation Tutorial Overview




📘 Basic Evaluation Flow

Step 1: Create Evaluation Dataset

Before running an evaluation, you need to prepare a dataset.
You can upload documents (contexts) or directly provide queries and responses to create your dataset.

You can choose from the following 3 Upload Types:

| Upload Type | Structure | Description |
|---|---|---|
| Query Generation | Context-only | Upload documents only, and the system generates questions. You can set parameters and review sample queries before generating the full dataset. |
| Query Upload | Context + Query | Upload documents and user-written queries for direct use in evaluation. |
| Response Upload | Query + Response | Upload both queries and model responses to run evaluations without collecting new responses. |

Available target model options and evaluation flows vary depending on the upload type.
👉 See Setting Up Dataset for details.
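
The Structure column of the upload type table can be read as the set of fields each upload type expects. The following is a minimal sketch of a pre-upload check under that reading. The names `UploadType`, `REQUIRED_FIELDS`, and `missing_fields`, and the field keys `context`, `query`, and `response`, are illustrative assumptions and not part of DATUMO Eval itself.

```python
from enum import Enum


class UploadType(Enum):
    # Labels follow the upload type table; the enum itself is hypothetical.
    QUERY_GENERATION = "Query Generation"
    QUERY_UPLOAD = "Query Upload"
    RESPONSE_UPLOAD = "Response Upload"


# Fields each upload type expects, per the "Structure" column.
# The key names are assumptions made for this sketch.
REQUIRED_FIELDS = {
    UploadType.QUERY_GENERATION: {"context"},
    UploadType.QUERY_UPLOAD: {"context", "query"},
    UploadType.RESPONSE_UPLOAD: {"query", "response"},
}


def missing_fields(upload_type, row):
    """Return the required field names that are absent or empty in a row."""
    return sorted(f for f in REQUIRED_FIELDS[upload_type] if not row.get(f))


# A Query Upload row without a query is incomplete.
print(missing_fields(UploadType.QUERY_UPLOAD, {"context": "doc text"}))
# → ['query']
```

Running a check like this over every row before uploading can surface incomplete rows early, before the review step.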



Step 2: Review and Save Dataset

Review the structure of your dataset (queries/responses) and save it.
You can also edit or supplement the dataset before evaluation if needed.
👉 For editing features, refer to Dataset & Review Editing.



Step 3: Run Evaluation

After saving your dataset, select your target model(s) and start the evaluation.
Model selection rules vary depending on the upload type:

  • Context-only / Context + Query
    → Multiple models can be selected.

  • Query + Response
    → Only models with pre-generated responses can be selected.
    (⚠️ Models added via "Add Anyway" will not generate new responses.)

👉 For full instructions, visit Running Evaluation.
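
The selection rules above can be expressed as a small filter. This is only a sketch of the rule as described on this page: the function name `selectable_models`, its parameters, and the model names are hypothetical, and the "Add Anyway" option is deliberately not modeled.

```python
def selectable_models(upload_type, candidate_models, models_with_responses):
    """Return the candidate models that may be selected for an upload type.

    upload_type: one of "Query Generation", "Query Upload", "Response Upload".
    candidate_models: model names offered for selection (hypothetical).
    models_with_responses: names of models whose responses were uploaded.
    """
    if upload_type in ("Query Generation", "Query Upload"):
        # Context-only and Context + Query: any number of models may be chosen.
        return list(candidate_models)
    if upload_type == "Response Upload":
        # Query + Response: only models with pre-generated responses qualify.
        return [m for m in candidate_models if m in models_with_responses]
    raise ValueError(f"unknown upload type: {upload_type!r}")


print(selectable_models("Response Upload", ["model-a", "model-b"], {"model-b"}))
# → ['model-b']
```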



Step 4: View Evaluation Results

Once the evaluation is complete, the results appear on the dashboard:

  • Score distributions per metric
  • Individual response scores and reasons
  • Detailed views via the Table View

👉 Learn more at Viewing Results.
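
To illustrate what a per-metric score distribution summarizes, here is a minimal sketch that counts how often each score occurs for each metric. The record layout (`metric` and `score` keys), the metric names, and the function `score_distributions` are assumptions for illustration only. They do not describe DATUMO Eval's export format or how the dashboard computes its charts.

```python
from collections import Counter, defaultdict


def score_distributions(results):
    """Count how many responses received each score, grouped by metric."""
    dist = defaultdict(Counter)
    for record in results:
        dist[record["metric"]][record["score"]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {metric: dict(counts) for metric, counts in dist.items()}


# Hypothetical per-response results; metric names are made up for the example.
sample = [
    {"metric": "faithfulness", "score": 5},
    {"metric": "faithfulness", "score": 3},
    {"metric": "faithfulness", "score": 5},
    {"metric": "relevance", "score": 4},
]
print(score_distributions(sample))
# → {'faithfulness': {5: 2, 3: 1}, 'relevance': {4: 1}}
```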



Other Evaluation Types

  • RAG Checker PRO
    → Evaluate factual accuracy and document-grounded responses.

  • Red-teaming Add-on
    → Assess model vulnerabilities and safety risks.