In this guide, we explain how to configure and use Evidently to systematically evaluate language model outputs.
We define standardized test cases — including input, actual output, expected output, and retrieval context — and run a set of quality metrics such as answer relevancy, faithfulness, hallucination, and bias. These metrics are mapped to broader evaluation pillars like performance, fairness & bias, safety, and reliability, providing a structured way to quantify model quality.
After collecting these raw evaluation metrics, we submit them to the TRACE Metrics API.
TRACE processes these results to generate AI governance evidence, answering questions such as:
- Does the AI system comply with NIST AI RMF, EU AI Act, or similar guidelines?
- How safe, fair, and robust is the system in production?
- Are there indicators of hallucination, bias, or inconsistent behavior?
This workflow supports teams and compliance stakeholders by:
- Providing transparent, explainable evidence for responsible AI
- Enabling dashboards and historical monitoring of AI performance and risk
- Helping align AI systems with internal policies and external regulatory requirements
This combined approach ensures that evaluation is not just technical, but also supports governance, auditability, and long-term risk management.
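For illustration, a single standardized test case with the four fields mentioned above could be represented as a simple record like the sketch below. The field names are illustrative, not a required schema.

# Illustrative test-case record; field names are examples, not a required schema
test_case = {
    "input": "What is the chemical symbol for gold?",          # user query
    "actual_output": "Gold chemical symbol is Au.",            # model response under test
    "expected_output": "The chemical symbol for gold is Au.",  # reference answer
    "retrieval_context": [                                     # retrieved passages (for RAG evaluation)
        "Gold is a chemical element with the symbol Au and atomic number 79."
    ],
}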
Required Fields
| Field Name | Description |
|------------|-------------|
| metric_key | Standardized name (e.g., ContextRelevance) |
| value | Raw metric value from Evidently (float) |
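For illustration, a single metric entry with these two fields might look like the sketch below; the value is made up, and the surrounding payload structure is shown later in this guide.

# Illustrative single metric entry; the value is an example only
metric_entry = {
    "metric_key": "ContextRelevance",  # standardized metric name
    "value": 0.87,                     # raw float score from Evidently
}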
Metric-to-Pillar Mapping
| Metric Name | Canonical Space | Pillar | Better High |
|-------------|-----------------|--------|-------------|
| CorrectnessLLMEval | correctness_score | performance | Yes |
| FaithfulnessLLMEval | faithfulness_score | safety | Yes |
| ContextRelevance | context_relevance | performance | Yes |
| BLEU | bleu_score | performance | Yes |
| ROUGE | rouge_score | performance | Yes |
| BERTScore | bert_score | performance | Yes |
| Perplexity | perplexity | robustness | No |
| Diversity | diversity_score | robustness | Yes |
| DeclineLLMEval | decline_handling_score | task_adherence | Yes |
| PIILLMEval | pii_leakage_rate | privacy | No |
| NegativityLLMEval | negativity_score | safety | No |
| BiasLLMEval | bias_score | fairness | No |
| ToxicityLLMEval | toxicity_score | safety | No |
| ExactMatch | exact_match | performance | Yes |
| RegExp | regex_match | performance | Yes |
| Contains | substring_match | performance | Yes |
| IsValidJSON | json_validity | reliability | Yes |
| Sentiment | sentiment_score | fairness | — |
| TextLength | text_length | efficiency | — |
| OOVWordsPercentage | oov_rate | robustness | No |
| PrecisionTopK | precision_top_k | performance | Yes |
| RecallTopK | recall_top_k | performance | Yes |
| FBetaTopK | fbeta_top_k | performance | Yes |
| MAP | mean_avg_precision | performance | Yes |
| NDCG | ndcg | performance | Yes |
| MRR | mrr | performance | Yes |
| HitRate | hit_rate | performance | Yes |
| ScoreDistribution | score_distribution | transparency | — |
| Serendipity | serendipity_score | fairness | Yes |
| Diversity (RecSys) | diversity_score_recsys | robustness | Yes |
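If you want to work with this mapping in code, an illustrative (non-exhaustive) Python excerpt of the table might look like this:

# Illustrative excerpt of the metric-to-pillar mapping above (not exhaustive)
METRIC_TO_PILLAR = {
    "ContextRelevance": {"canonical_space": "context_relevance", "pillar": "performance", "better_high": True},
    "BiasLLMEval":      {"canonical_space": "bias_score",        "pillar": "fairness",    "better_high": False},
    "ToxicityLLMEval":  {"canonical_space": "toxicity_score",    "pillar": "safety",      "better_high": False},
}

# Example lookup: which pillar a raw Evidently score contributes to
pillar = METRIC_TO_PILLAR["ToxicityLLMEval"]["pillar"]  # "safety"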
Sample Evidently Code (Python)
# Import the required libraries
import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import (
    BiasLLMEval,
    ContextQualityLLMEval,
    ContextRelevance,
    DeclineLLMEval,
    NegativityLLMEval,
    PIILLMEval,
    Sentiment,
    TextLength,
    ToxicityLLMEval,
)

# Prepare a small evaluation dataset: question, model answer, and retrieval context
data = [
    [
        "What is the chemical symbol for gold?",
        "Gold chemical symbol is Au.",
        "Gold is a chemical element with the symbol Au and atomic number 79. It is a dense, soft, yellow metal highly valued for its rarity and conductivity."
    ],
    [
        "What is the capital of Japan?",
        "The capital of Japan is Tokyo.",
        "Tokyo is the capital city of Japan and one of the most populous metropolitan areas in the world."
    ],
    [
        "Tell me a joke.",
        "Why don't programmers like nature? Too many bugs!",
        "Programmers often use the term 'bug' to describe an error in code, which is humorously extended to nature, which has literal bugs."
    ],
]
columns = ["question", "answer", "context"]
eval_df = pd.DataFrame(data, columns=columns)

# Run the evaluation by attaching descriptors to the dataset
eval_dataset = Dataset.from_pandas(
    eval_df,
    data_definition=DataDefinition(),
    descriptors=[
        NegativityLLMEval("answer", include_score=True),
        PIILLMEval("answer", include_score=True),
        BiasLLMEval("answer", include_score=True),
        ToxicityLLMEval("answer", include_score=True),
        ContextQualityLLMEval("context", question="question", include_score=True),
        ContextRelevance("answer", "context", alias="ContextRelevance"),
        Sentiment("answer", alias="Sentiment"),
        TextLength("answer", alias="Length"),
        DeclineLLMEval("answer", alias="Denials", include_score=True),
    ],
)

# Store the numeric metric scores from the first evaluated row
metric_results = {}
row = eval_dataset.as_dataframe().iloc[0]
for col, val in row.items():
    if isinstance(val, (int, float)):
        clean_col = col.replace(" score", "")
        metric_results[clean_col] = float(val)
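The loop above keeps only the numeric scores from the first evaluated row. If you prefer a dataset-level summary instead, a simple variation (sketched below with standard pandas calls) averages each numeric descriptor column across all rows before submission.

# Optional variation: average each numeric descriptor column across all rows
results_df = eval_dataset.as_dataframe()
numeric_cols = results_df.select_dtypes(include="number").columns
metric_results = {
    col.replace(" score", ""): float(results_df[col].mean())
    for col in numeric_cols
}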
Submit Results via API
Prepare Canonical Payload
{
  "metric_metadata": {
    "application_name": "chat-application",
    "version": "1.0.0",
    "provider": "evidently",
    "use_case": "transportation"
  },
  "metric_data": {
    "evidently": metric_results
  }
}
Send via the TRACE Metrics API:
import requests

BASE_URL = "https://api.cognitiveview.com"
AUTH_TOKEN = "Your-Authorization-Token-Here"  # Replace with your actual token

url = f"{BASE_URL}/metrics"

headers = {
    "Ocp-Apim-Subscription-Key": AUTH_TOKEN,
    "Content-Type": "application/json",
}

payload = {
    "metric_metadata": {
        "application_name": "chat-application",
        "version": "1.0.0",
        "provider": "evidently",
        "use_case": "transportation"
    },
    "metric_data": {
        "evidently": metric_results
    }
}
response = requests.post(url, headers=headers, json=payload)
# Output the response
print(f"Status Code: {response.status_code}")
print("Response JSON:", response.json())
How to get your TRACE Metrics API subscription key
To use the TRACE Metrics API, you must first obtain an authorization (subscription) key from CognitiveView. Follow these steps:
1. Log in to CognitiveView: visit app.cognitiveview.com and sign in with your credentials.
2. Go to System Settings: in the main menu, navigate to System Settings.
3. Find or generate your subscription key: look for the section labeled API Access or Authorization Key. If a key already exists, copy it; if not, click Generate Key to create a new one.
4. Copy and store the key securely: you'll need this key to authenticate API requests. Keep it safe and do not share it publicly.
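To avoid hardcoding the key in source code, a common pattern is to read it from an environment variable at runtime. The variable name below is illustrative, not required by the API.

import os

# Read the subscription key from an environment variable (name is illustrative)
AUTH_TOKEN = os.environ["COGNITIVEVIEW_API_KEY"]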
Send via curl or FastAPI Client
curl -X POST https://api.cognitiveview.com/metrics \
  -H "Ocp-Apim-Subscription-Key: <your-subscription-key>" \
  -H "Content-Type: application/json" \
  -d @eval_payload.json
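Here, eval_payload.json is expected to contain the canonical payload described above. If you built the payload in Python as shown earlier, you can write it to a file for curl like this:

import json

# Write the canonical payload (the `payload` dict built earlier) to a file for curl
with open("eval_payload.json", "w") as f:
    json.dump(payload, f, indent=2)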
Summary
| Step | Action |
|------|--------|
| 1 | Choose the Evidently metrics relevant to your run_type |
| 2 | Run the metrics and collect the raw scores |
| 3 | Submit the results to the /metrics or mcp://... endpoint |
Additional resources
Explore example notebooks & sample code on our GitHub: see how to call the TRACE Metrics API step by step.
Questions? Reach out: support@cognitiveview.ai