
Evidently Configuration Guide

Here’s a step-by-step guide to help you configure Evidently to generate evaluation metrics and submit them to the TRACE Metrics API.


In this guide, we explain how to configure and use Evidently to systematically evaluate language model outputs.
We define standardized test cases — including input, actual output, expected output, and retrieval context — and run a set of quality metrics such as answer relevancy, faithfulness, hallucination, and bias. These metrics are mapped to broader evaluation pillars like performance, fairness & bias, safety, and reliability, providing a structured way to quantify model quality.

After collecting these raw evaluation metrics, we submit them to the TRACE Metrics API.

TRACE processes these results to generate AI governance evidence, answering questions such as:

  • Does the AI system comply with NIST AI RMF, EU AI Act, or similar guidelines?

  • How safe, fair, and robust is the system in production?

  • Are there indicators of hallucination, bias, or inconsistent behavior?

This workflow supports teams and compliance stakeholders by:

  • Providing transparent, explainable evidence for responsible AI

  • Enabling dashboards and historical monitoring of AI performance and risk

  • Helping align AI systems with internal policies and external regulatory requirements

This combined approach ensures that evaluation is not just technical, but also supports governance, auditability, and long-term risk management.

Required Fields

Field Name | Description
metric_key | Standardized metric name (e.g. ContextRelevance)
value | Raw metric value from Evidently (float)
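For example, a couple of metric entries in the metric_key / value shape described above might look like the following (the numbers are purely illustrative; the full submission envelope is shown in the payload section further below):

# Illustrative only: metric keys paired with raw float values from Evidently.
example_metric_entries = [
    {"metric_key": "ContextRelevance", "value": 0.87},
    {"metric_key": "faithfulness_score", "value": 0.92},
]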

Metric-to-Pillar Mapping

Metric Name | Canonical Metric Key | Pillar | Higher Is Better
CorrectnessLLMEval | correctness_score | performance | Yes
FaithfulnessLLMEval | faithfulness_score | safety | Yes
ContextRelevance | context_relevance | performance | Yes
BLEU | bleu_score | performance | Yes
ROUGE | rouge_score | performance | Yes
BERTScore | bert_score | performance | Yes
Perplexity | perplexity | robustness | No
Diversity | diversity_score | robustness | Yes
DeclineLLMEval | decline_handling_score | task_adherence | Yes
PIILLMEval | pii_leakage_rate | privacy | No
NegativityLLMEval | negativity_score | safety | No
BiasLLMEval | bias_score | fairness | No
ToxicityLLMEval | toxicity_score | safety | No
ExactMatch | exact_match | performance | Yes
RegExp | regex_match | performance | Yes
Contains | substring_match | performance | Yes
IsValidJSON | json_validity | reliability | Yes
Sentiment | sentiment_score | fairness | N/A
TextLength | text_length | efficiency | N/A
OOVWordsPercentage | oov_rate | robustness | No
PrecisionTopK | precision_top_k | performance | Yes
RecallTopK | recall_top_k | performance | Yes
FBetaTopK | fbeta_top_k | performance | Yes
MAP | mean_avg_precision | performance | Yes
NDCG | ndcg | performance | Yes
MRR | mrr | performance | Yes
HitRate | hit_rate | performance | Yes
ScoreDistribution | score_distribution | transparency | N/A
Serendipity | serendipity_score | fairness | Yes
Diversity (RecSys) | diversity_score_recsys | robustness | Yes
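If you want to review scores grouped by pillar before submitting them, a small local lookup can be built directly from the table above. The sketch below transcribes only a few rows and assumes your metric keys match the canonical names; extend it to whatever metrics you actually compute:

# Partial transcription of the mapping table above:
# canonical metric key -> (pillar, higher_is_better)
METRIC_PILLARS = {
    "correctness_score": ("performance", True),
    "faithfulness_score": ("safety", True),
    "bias_score": ("fairness", False),
    "toxicity_score": ("safety", False),
    "json_validity": ("reliability", True),
    "perplexity": ("robustness", False),
}

def group_by_pillar(metric_results):
    """Group raw metric values by governance pillar for a quick local summary."""
    grouped = {}
    for key, value in metric_results.items():
        pillar, _higher_is_better = METRIC_PILLARS.get(key, ("unmapped", True))
        grouped.setdefault(pillar, {})[key] = value
    return grouped

This grouping is only a local convenience for reviewing results; it does not change what you submit to TRACE.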

Sample Evidently Code (Python)

# Import the necessary libraries
import pandas as pd
from evidently import Dataset
from evidently import DataDefinition
from evidently.descriptors import (
    DeclineLLMEval, Sentiment, TextLength, NegativityLLMEval, PIILLMEval,
    BiasLLMEval, ToxicityLLMEval, ContextQualityLLMEval, ContextRelevance,
)

# Prepare the evaluation dataset
data = [
    [
        "What is the chemical symbol for gold?",
        "Gold chemical symbol is Au.",
        "Gold is a chemical element with the symbol Au and atomic number 79. It is a dense, soft, yellow metal highly valued for its rarity and conductivity."
    ],
    [
        "What is the capital of Japan?",
        "The capital of Japan is Tokyo.",
        "Tokyo is the capital city of Japan and one of the most populous metropolitan areas in the world."
    ],
    [
        "Tell me a joke.",
        "Why don't programmers like nature? Too many bugs!",
        "Programmers often use the term 'bug' to describe an error in code, which is humorously extended to nature, which has literal bugs."
    ],
]
columns = ["question", "answer", "context"]
eval_df = pd.DataFrame(data, columns=columns)

# Run the evaluation with the selected descriptors
eval_dataset = Dataset.from_pandas(
    eval_df,
    data_definition=DataDefinition(),
    descriptors=[
        NegativityLLMEval("answer", include_score=True),
        PIILLMEval("answer", include_score=True),
        DeclineLLMEval("answer", include_score=True),
        BiasLLMEval("answer", include_score=True),
        ToxicityLLMEval("answer", include_score=True),
        ContextQualityLLMEval("context", question="question", include_score=True),
        ContextRelevance("answer", "context", alias="ContextRelevance"),
        Sentiment("answer", alias="Sentiment"),
        TextLength("answer", alias="Length"),
        DeclineLLMEval("answer", alias="Denials", include_score=True),
    ],
)

# Store the numeric metric scores from the first evaluated row
metric_results = {}
row = eval_dataset.as_dataframe().iloc[0]
for col, val in row.items():
    if isinstance(val, (int, float)):
        clean_col = col.replace(' score', '')
        metric_results[clean_col] = float(val)
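At this point metric_results is a flat dictionary mapping descriptor column names to float scores. The exact keys depend on the descriptors you enabled and your Evidently version; purely as an illustration it could resemble:

# Illustrative only -- actual keys and values depend on your descriptors and Evidently version.
# metric_results might resemble:
#   {"Sentiment": 0.31, "Length": 47.0, "ContextRelevance": 0.82, "Denials": 0.0}
print(metric_results)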

Submit Results via API

Prepare Canonical Payload

{
    "metric_metadata": {
        "application_name": "chat-application",
        "version": "1.0.0",
        "provider": "evidently",
        "use_case": "transportation"
    },
    "metric_data": {
        "evidently": metric_results
    }
}

Send via the TRACE Metrics API:

import requests

BASE_URL = "https://api.cognitiveview.com"
AUTH_TOKEN = "Your-Authorization-Token-Here"  # Replace with your actual token
url = f"{BASE_URL}/metrics"

headers = {
    "Ocp-Apim-Subscription-Key": AUTH_TOKEN,
    "Content-Type": "application/json",
}

payload = {
    "metric_metadata": {
        "application_name": "chat-application",
        "version": "1.0.0",
        "provider": "evidently",
        "use_case": "transportation"
    },
    "metric_data": {
        "evidently": metric_results
    }
}

response = requests.post(url, headers=headers, json=payload)

# Output the response
print(f"Status Code: {response.status_code}")
print("Response JSON:", response.json())

How to get your TRACE Metrics API subscription key

To use the TRACE Metrics API, you must first obtain an authorization (subscription) key from CognitiveView. Follow these steps:

  1. Log in to CognitiveView

  2. Go to System Settings

    • In the main menu, navigate to System Settings.

  3. Find or generate your subscription key

    • Look for the section labeled API Access or Authorization Key.

    • If a key already exists, copy it.

    • If not, click Generate Key to create a new one.

  4. Copy and store the key securely

    • You’ll need this key to authenticate API requests.

    • Keep it safe and do not share it publicly. One common approach is to load it from an environment variable rather than hard-coding it, as sketched below.
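For example, a minimal sketch that reads the key from an environment variable instead of hard-coding it (the variable name TRACE_SUBSCRIPTION_KEY is just an example; use whatever your team standardizes on):

import os

# Read the subscription key from the environment rather than committing it to source control.
AUTH_TOKEN = os.environ["TRACE_SUBSCRIPTION_KEY"]  # example variable name

headers = {
    "Ocp-Apim-Subscription-Key": AUTH_TOKEN,
    "Content-Type": "application/json",
}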

Send via curl or another HTTP client

curl -X POST https://api.cognitiveview.com/metrics \
  -H "Ocp-Apim-Subscription-Key: Your-Authorization-Token-Here" \
  -H "Content-Type: application/json" \
  -d @eval_payload.json
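The eval_payload.json file referenced above must contain the fully resolved payload (metadata plus the computed metric values). One way to produce it from the Python example earlier is a short dump step like this sketch:

import json

# Write the resolved payload built earlier (metadata + metric_results) to disk for use with curl.
with open("eval_payload.json", "w") as f:
    json.dump(payload, f, indent=2)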

Summary

Step | Action
1 | Choose the Evidently metrics relevant to your run_type
2 | Run the metrics and collect the raw scores
3 | Submit the results to the /metrics or mcp://... endpoint

Additional resources

  • Explore example notebooks & sample code on our GitHub: see how to call the TRACE Metrics API step by step.

Questions? Reach out: support@cognitiveview.ai
