
Model & Agent

Overview

In Datumo Eval, Models and Agents are core components of the evaluation pipeline. It is essential to understand the distinct roles of the Target Model, which is being evaluated, and the Judge Model, which performs the evaluation.


Types of Models

1. Target Model (Model Being Evaluated)

① Role

The AI model that is the subject of evaluation.

② Functions

  • Generates responses to the Queries in the Dataset.
  • Its generated responses are then evaluated by the Judge Model.
  • Various LLM providers are supported (OpenAI, Anthropic, Google, etc.).
  • Custom API endpoints can also be connected (see the sketch below).
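
As a minimal illustration, the snippet below sends one Dataset Query to a Target Model, assuming the OpenAI Python SDK with an OPENAI_API_KEY in the environment. The generate_response helper and model choice are illustrative, not Datumo Eval internals.

```python
# A minimal sketch of querying a Target Model through the OpenAI SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_response(query: str, model: str = "gpt-4") -> str:
    """Send one Dataset Query to the Target Model and return its Response."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return completion.choices[0].message.content

print(generate_response("What is retrieval-augmented generation?"))
```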

2. Judge Model (Evaluation Model)

① Role

The AI model that evaluates responses from the Target Model.

② Functions

  • Evaluates response quality against the defined Metric criteria.
  • Generates a score and evaluation reasoning for each response (see the sketch below).
  • A high-performance model is recommended for consistent evaluation.
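
The sketch below shows one way a Judge Model call can produce a score and reasoning. The JUDGE_PROMPT wording, the 1-5 scale, and the JSON output contract are hypothetical; Datumo Eval's built-in judge prompts may differ.

```python
import json

from openai import OpenAI

client = OpenAI()

# Hypothetical judge prompt; the platform's actual prompts may differ.
JUDGE_PROMPT = """You are an evaluation judge. Score the response against the
metric below on a 1-5 scale and explain your reasoning.
Metric: {metric}
Query: {query}
Response: {response}
Reply with JSON only: {{"score": <int>, "reasoning": "<string>"}}"""

def judge(query: str, response: str, metric: str) -> dict:
    """Ask the Judge Model for a score and evaluation reasoning."""
    completion = client.chat.completions.create(
        model="gpt-4",   # a high-performance judge, per the guidance below
        temperature=0,   # low temperature for consistent scoring
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                metric=metric, query=query, response=response
            ),
        }],
    )
    # Assumes the judge honors the JSON-only instruction.
    return json.loads(completion.choices[0].message.content)
```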

Agent Concept

1. What is an Agent

① Definition

An Agent is a model instance with specific roles and configurations.

② Components

Component        Description
Base Model       The underlying LLM (e.g., GPT-4, Claude)
System Prompt    Prompt defining the model's role and behavior
Temperature      Parameter controlling response creativity/consistency
Max Tokens       Maximum response length limit
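
A minimal sketch of these four components as a configuration object; the Agent class and field names are illustrative, not Datumo Eval's actual schema.

```python
from dataclasses import dataclass

# Illustrative schema only; the platform's Agent definition may differ.
@dataclass
class Agent:
    base_model: str            # underlying LLM, e.g. "gpt-4" or a Claude model
    system_prompt: str         # defines the agent's role and behavior
    temperature: float = 0.7   # creativity vs. consistency trade-off
    max_tokens: int = 1024     # maximum response length
```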

2. Agent Usage Examples

① RAG Agent

Generates responses based on retrieved context.

② Safety Agent

Applies guidelines for safe response generation.

③ Domain Expert Agent

Generates responses specialized in specific domains.
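
Reusing the hypothetical Agent class from the sketch above, the three example agents differ only in configuration, chiefly the System Prompt. All prompt wording here is illustrative.

```python
rag_agent = Agent(
    base_model="gpt-4",
    system_prompt="Answer strictly from the retrieved context. If the "
                  "context is insufficient, say you don't know.",
    temperature=0.2,
)

safety_agent = Agent(
    base_model="gpt-4",
    system_prompt="Follow the safety guidelines: refuse harmful requests "
                  "and avoid unverified claims.",
    temperature=0.0,
)

domain_expert_agent = Agent(
    base_model="gpt-4",
    system_prompt="You are a medical domain expert. Answer with clinical "
                  "precision using standard terminology.",
    temperature=0.3,
)
```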


Judge Model Selection Criteria

1. Recommendations

① Use High-Performance Models

Recent high-performance models such as GPT-4 or Claude 3 are recommended for accurate evaluation.

② Ensure Consistency

Use a low Temperature setting to keep evaluation results consistent across runs.

③ Sufficient Context Window

Select models that can evaluate long responses.

2. Considerations

① Potential Bias

Using the same model as both the Target and the Judge may introduce self-preference bias, where the judge favors responses in its own style.

② Balance Cost and Performance

Weigh per-call cost against evaluation quality; running a large Judge Model over a large Dataset can be expensive.

③ Purpose-Appropriate Selection

Select models appropriate for the evaluation purpose (e.g., multilingual support for multilingual evaluation).


Model Registration and Management

1. API Key Management

① API Key Registration

Register API Keys for each provider in Settings.

② Security

Registered API Keys are stored in encrypted form for security.

③ Team Sharing

Team-level Key sharing is available.

2. Custom Model Connection

① REST API Connection

Supports REST API endpoint connections.

② On-Premises Integration

On-premises model integration is available.

③ Response Format Mapping

Response format mapping can be configured so that responses from a custom endpoint are parsed into the expected schema (see the sketch below).
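
Below is a sketch of calling a custom (e.g., on-premises) model over REST and mapping its response format. The endpoint URL, authentication header, and JSON field names are assumptions about your own service, not a Datumo Eval API.

```python
import requests

# Hypothetical endpoint for an on-premises model; replace with your own.
ENDPOINT = "https://models.internal.example.com/v1/generate"

def query_custom_model(query: str) -> str:
    """Call a custom REST endpoint and map its response format."""
    resp = requests.post(
        ENDPOINT,
        json={"prompt": query, "max_tokens": 1024},   # assumed request schema
        headers={"Authorization": "Bearer <your-token>"},
        timeout=60,
    )
    resp.raise_for_status()
    payload = resp.json()
    # Response format mapping: pull the text out of whatever field your
    # service uses (assumed here to be payload["output"]["text"]).
    return payload["output"]["text"]
```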


Role in Evaluation Flow

1. Evaluation Process

① Complete Flow

Query → Target Model → Response → Judge Model → Score & Reasoning

② Step-by-Step Description

  1. Query Delivery: a Query from the Dataset is sent to the Target Model.
  2. Response Generation: the Target Model generates a Response.
  3. Evaluation Execution: the Judge Model evaluates the Response against the Metric criteria.
  4. Result Production: a score and evaluation reasoning are produced (an end-to-end sketch follows).
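
Putting the steps together, the sketch below runs the full flow over a toy Dataset, reusing the hypothetical generate_response and judge helpers from the earlier sketches.

```python
# Toy Dataset; real Datasets come from the Datumo Eval platform.
dataset = [
    {"query": "Explain what a vector database is."},
    {"query": "Summarize the causes of inflation in one paragraph."},
]

results = []
for row in dataset:
    response = generate_response(row["query"])      # steps 1-2
    verdict = judge(                                # step 3
        query=row["query"],
        response=response,
        metric="Accuracy: is the response factually correct and complete?",
    )
    results.append({**row, "response": response, **verdict})  # step 4

for r in results:
    print(r["score"], "-", r["reasoning"])
```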