How to Interpret and Export Evaluation Results
Step 4. Reviewing Evaluation Results
Once the evaluation is complete, you can view the results in the Table View.
Each sample (query–response pair) is displayed in a separate row,
and each metric (e.g., Precision, Recall, Faithfulness) is presented as a score between 0 and 1.
Clicking a cell opens the Detail Panel on the right, which visualizes claim-level evaluation results for the selected sample (query–response pair).
Each section of the panel provides the following information:
1. Query Section
   Displays the original user query (query) and relevant metadata.
   This serves as the reference question for comparing the Expected Response (ER) and Target Response (TR).
   Click “View reference context” to access the retrieved documents (contexts) corresponding to this query.

2. Model Response Section
   Shows the actual response (response) generated by the Target Model.
   The Decomposition and Entailment processes are visualized at the claim level,
   with each claim labeled by color and assigned a score.
   - 2-1. Claim Score Summary
     Summarizes claim-level evaluation scores for the entire response.
   - 2-2. Claim-Level Judgments
     Each claim is tagged as Entailed, Contradicted, or Irrelevant.
     Additional labels such as “Context Entailed” or “Context Refuted” are displayed beside each claim.
   - 2-3. Full Target Model / Agent Response
     Displays the complete text output generated by the model.

3. Expected Response Section
   Displays the claims decomposed from the Expected Response (ER).
   Similar to the Model Response section, this includes the claim content, claim-level scores, and the full ER text.

4. Retrieved Context Section
   Shows the documents (Retrieved Context) referenced by the model during generation.
   Each document includes entailment results indicating whether it supports any evaluated claims.
   The top of this section summarizes the Context Precision score.
   - Example:
     - C1: Contains one or more correct claims (Relevant Context)
     - C2: Contains no correct claims (Irrelevant Context)
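The summary scores in the panel can be illustrated with a minimal sketch. The formulas below (entailed-claim ratio for Faithfulness, relevant-context ratio for Context Precision) are common definitions and an assumption here, not the tool's documented implementation, which may weight claims or handle Irrelevant judgments differently:

```python
# Illustrative sketch of how claim-level judgments might roll up into the
# panel's summary scores; the exact formulas used by the tool may differ.

def faithfulness(judgments):
    """Fraction of response claims judged Entailed by the retrieved context."""
    return sum(j == "Entailed" for j in judgments) / len(judgments)

def context_precision(context_is_relevant):
    """Fraction of retrieved contexts containing at least one correct claim."""
    return sum(context_is_relevant) / len(context_is_relevant)

# Four response claims, two retrieved contexts (C1 relevant, C2 irrelevant):
claims = ["Entailed", "Entailed", "Contradicted", "Irrelevant"]
print(faithfulness(claims))              # 0.5
print(context_precision([True, False]))  # 0.5
```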

Step 5. Exporting Results
To export the evaluation results, click the Export button at the top of the Table View.
The results will be downloaded as an .xlsx file,
which includes key metrics such as Precision, Recall, Faithfulness, and Hallucination for each sample.
🔍 Tip:
- The exported file can be used for further analysis or reporting,
  such as comparing performance across different seeds or models.
- Detailed claim-level results are available only within the Detail Panel in the UI.
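As a sketch of such downstream analysis, the snippet below aggregates per-sample metrics by model. The file name, column names, and model labels are hypothetical placeholders, not the export's actual schema; in practice you would load the exported file with `pd.read_excel`:

```python
# Hypothetical sketch: comparing per-sample metrics from the exported .xlsx
# across models. Column and model names are illustrative assumptions.
import pandas as pd

# In practice, load the exported file instead of building a DataFrame inline:
# df = pd.read_excel("evaluation_results.xlsx")
df = pd.DataFrame({
    "model": ["model-a", "model-a", "model-b", "model-b"],
    "Precision": [0.80, 0.60, 0.90, 0.70],
    "Faithfulness": [0.75, 0.85, 0.95, 0.65],
})

# Mean score per model for each metric column.
summary = df.groupby("model")[["Precision", "Faithfulness"]].mean()
print(summary)
```

The same pattern extends to comparing seeds: add a `seed` column and group by `["model", "seed"]`.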