What is a Benchmark?
Benchmarks are a feature designed to evaluate model performance and ensure policy alignment using a selected set of images. Benchmark images can be uploaded directly through the platform.
Types of Benchmarks
Location Benchmark
A Location Benchmark consists of a random sample of images taken at a specific location. This allows you to assess how well the model performs with images captured in a particular setting.
There can be one location benchmark per asset type for each location.
Special Benchmark
A Special Benchmark comprises images that reflect the model's performance under specific conditions or traits. Examples include photos taken in certain weather conditions or images containing particular objects. This type of benchmark helps to understand the model's accuracy and reliability in varied scenarios.
There can be multiple special benchmarks for each asset type for one location.
Image: Benchmark examples
How to Access Benchmarks
Both admins and members of a workspace can access benchmarks. Benchmark stats and an overview can be found on the homepage, while benchmark pages and other details are available under the Benchmark tab.
Accessing Benchmark Overview
Go to the homepage.
Click the “View” dropdown in the top right corner.
Select “Benchmarks” to navigate to all charts relevant to benchmarks.
Image: Benchmark overview on homepage
Viewing Detailed Benchmarks and Benchmark Images
Navigate to the benchmark tab via the navigation bar at the top.
Alternatively, click “View all benchmarks” on the model performance chart, located on the homepage.
Image: Access benchmark on homepage
Search Benchmark Images by Image URL
To search for benchmark images using an image URL:
Click the filter icon button.
In the panel that slides out, locate the "Image URL" field.
Paste the URL into the "Image URL" field.
Gif: Search via URL on benchmark images
Understand Benchmark
Benchmark Overview
The Benchmark Overview shows the overall performance of the latest model version on benchmark images. Model performance is represented by the percentage of correct predictions out of all predictions made.
The Benchmark Overview is located on the homepage and can be accessed through the “View” dropdown. Learn more
Benchmarks Page
The Benchmarks Page offers a list of all benchmarks available in the current workspace, allowing you to easily browse and access them.
Image: Benchmarks page
Timestamp for Benchmark Result
The timestamp under the “Result at” column shows the last time the model made a prediction.
Benchmark Images
When clicking on a benchmark, you’ll be navigated to a page that displays all benchmark images.
Image: Benchmark images in list view
Ground Truth
The ground truth is the decision, and the reason for that decision, from a manual human review. These can be updated on the platform. See how
Acceptable Reasons
An image may be classified under different decisions during manual human review. In such cases, you can add multiple acceptable reasons. If the prediction matches any of these reasons, it is considered correct. The ground truth is the most valued acceptable reason.
The acceptable reasons for each benchmark image can be edited. See how
Decision Match
Decision Match: If the predicted decision aligns with the ground truth, it counts as a decision match.
Decision Mismatch: If the predicted decision does not align with the ground truth, it counts as a decision mismatch.
Reason Match
Reason Match: If the predicted reason aligns with the ground truth or any other acceptable reasons, it counts as a reason match.
Reason Mismatch: If the predicted reason does not align with the ground truth or any other acceptable reasons, it counts as a reason mismatch.
Precision is calculated as the percentage of reason matches out of all benchmark images, as sketched below.
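The sketch below is one way to picture this matching logic in code. It is a minimal illustration only: the field names (predicted_decision, predicted_reason, acceptable_reasons) are assumptions rather than the platform's API, and it assumes the ground truth is the first entry in each image's ordered list of acceptable reasons.

```python
# Minimal sketch of decision/reason matching (field names are illustrative,
# not the platform's actual data model).

def is_decision_match(predicted_decision: str, ground_truth_decision: str) -> bool:
    # Decision match: the predicted decision equals the ground-truth decision.
    return predicted_decision == ground_truth_decision

def is_reason_match(predicted_reason: str, acceptable_reasons: list[str]) -> bool:
    # Reason match: the predicted reason equals the ground truth (the first,
    # most valued acceptable reason) or any other acceptable reason.
    return predicted_reason in acceptable_reasons

def reason_match_rate(images: list[dict]) -> float:
    # Percentage of reason matches out of all benchmark images.
    matches = sum(
        is_reason_match(img["predicted_reason"], img["acceptable_reasons"])
        for img in images
    )
    return 100 * matches / len(images)
```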
Grid View
Grid View provides a convenient way to quickly glance through all benchmark images. Click on the Grid View button above the filter bar to switch views.
Image: Grid view switch
Download Benchmark Results as a CSV
While viewing benchmark images, you can download the benchmark results by clicking the “Download CSV” button in the top right corner.
Image: Download CSV button
Confusion Matrix
A Confusion Matrix helps you understand how well the model is making predictions by comparing what the model predicted to what actually happened.
The Confusion Matrix is available for location benchmarks.
Image: Confusion Matrix for benchmarks
Accessing the Confusion Matrix
Via Benchmark Tab:
Go to the benchmark tab to see a list of benchmarks for this workspace.
Find the location benchmark with the confusion matrix icon under the "Action" column.
Click the confusion matrix button to open the confusion matrix page.
Via Location Benchmark Page:
Navigate to the location benchmark page.
Click the “Ground Truth Matrix” button.
Gif: Access confusion matrix
Understand the Confusion Matrix
Precision and Recall
Precision: This shows how many of the predicted cases were actually correct. It's the percentage of correct predictions out of all predictions made for a specific category.
Recall: This shows how many of the actual cases were correctly identified by the model. It's the percentage of relevant instances found out of all actual instances.
Image: Precision and recall for confusion matrix
Accuracy
Accuracy represents how often the model's predictions are correct compared to the actual results. It is calculated by dividing the number of correct predictions by the total number of benchmark images. This percentage helps you understand the model's reliability.
Image: Accuracy for confusion matrix
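To make these definitions concrete, here is a small sketch that computes accuracy, precision, and recall from a two-class (compliant vs. non-compliant) confusion matrix. The counts and class labels are made up for illustration; they do not come from the platform.

```python
# Hypothetical 2x2 confusion matrix: (ground truth, prediction) -> image count.
confusion = {
    ("compliant", "compliant"): 80,
    ("compliant", "non-compliant"): 5,
    ("non-compliant", "compliant"): 10,
    ("non-compliant", "non-compliant"): 55,
}

total = sum(confusion.values())
correct = sum(n for (truth, pred), n in confusion.items() if truth == pred)

# Accuracy: correct predictions divided by the total number of benchmark images.
accuracy = correct / total

# Precision and recall for the "compliant" class.
predicted_compliant = sum(n for (_, pred), n in confusion.items() if pred == "compliant")
actual_compliant = sum(n for (truth, _), n in confusion.items() if truth == "compliant")
precision = confusion[("compliant", "compliant")] / predicted_compliant
recall = confusion[("compliant", "compliant")] / actual_compliant

print(f"accuracy={accuracy:.1%}, precision={precision:.1%}, recall={recall:.1%}")
```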
Risks
Fraud Risk: Fraud risk is the chance that someone might try to cheat or misuse the system.
In the Confusion Matrix, this occurs when the ground truth is non-compliant, but the model predicts it as compliant. This discrepancy indicates a potential fraud risk. (Orange cells)
User Friction Risk: User friction risk is the chance that users might find the system hard to use.
In the Confusion Matrix, this happens when the ground truth is compliant, but the model predicts it as non-compliant. This misalignment can lead to user frustration and dissatisfaction, indicating a user friction risk. (Pink cells)
Image: Risks view for confusion matrix
Acceptable Reasons Threshold
The Acceptable Reasons Threshold feature allows users to set a threshold for how many acceptable reasons will be considered when evaluating model performance using benchmarks.
By changing the selection in the "Acceptable Reasons Threshold" dropdown (e.g., top 2, top 3), users can indicate how many top reasons should be factored into performance assessments.
A higher threshold makes the evaluation less strict by considering more acceptable reasons.
Image: Acceptable Reasons Threshold
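As a rough sketch of how such a threshold could affect matching, the function below only considers the first N acceptable reasons in ranked order. The function name, parameters, and example reason values are hypothetical; they are not the platform's API.

```python
def is_reason_match_with_threshold(predicted_reason: str,
                                   acceptable_reasons: list[str],
                                   top_n: int) -> bool:
    # Only the top-N acceptable reasons (ranked, most valued first) are considered.
    # A higher top_n admits more reasons, making the evaluation less strict.
    return predicted_reason in acceptable_reasons[:top_n]

# Hypothetical reasons, ranked with the most valued (ground truth) first.
reasons = ["blurred_image", "object_missing", "poor_lighting"]

print(is_reason_match_with_threshold("poor_lighting", reasons, top_n=2))  # False
print(is_reason_match_with_threshold("poor_lighting", reasons, top_n=3))  # True
```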
Inspect Images through the Confusion Matrix
You can easily inspect the images used in the confusion matrix calculation. Simply click on the number of images in each cell to view the filtered set of images corresponding to that category.
Gif: Inspect Images through the Confusion Matrix
Create Benchmark
When a workspace is created, a location benchmark is automatically generated for it.
Each location can have only one location benchmark, but multiple special benchmarks can be created for a single location.
Create Special Benchmarks
Special benchmarks can be created from the Control Center. There are two ways to access the page to add benchmarks:
Via Navigation Bar:
Click on the Control Center in the navigation bar at the top.
Select “Add Benchmark” from the side menu to navigate to the Add Benchmark page.
Via Benchmark Page:
On the Benchmark page, click on “Add Benchmark” to be navigated to the Add Benchmark page.
Gif: Navigation for adding benchmark
Add Images to a Special Benchmark
Enter the benchmark name.
Select the correct location and asset type.
Upload the benchmark images in CSV format.
Gif: Add a special benchmark
Download Benchmark Template
On the Add Benchmark page, you can download a template for adding a new benchmark by clicking the “Download template CSV” button.
The template includes two essential columns: imageUrl and acceptableReasons.
You can add additional columns such as image reference, address, and image ID.
If you add additional columns, ensure the column headers use camelCase (for example, imageReference or imageId).
Information in additional columns may appear as “other metadata” for the benchmark.
Image: Download benchmark template CSV from dashboard
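For reference, a filled-in template might look like the example below. Only the imageUrl and acceptableReasons columns come from the template; the imageReference column, the URLs, the reason values, and the use of a semicolon to separate multiple acceptable reasons are illustrative assumptions and may differ from what your workspace expects.

```
imageUrl,acceptableReasons,imageReference
https://example.com/images/site-a-001.jpg,compliant,SITE-A-001
https://example.com/images/site-a-002.jpg,blurred_image;poor_lighting,SITE-A-002
```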
Update Benchmark
Add Benchmark Image from Control Center
Benchmark images can also be added via CSV upload.
When viewing a benchmark, click the “Add Image” button in the top right corner.
You’ll be navigated to the Add Benchmark page.
Select the benchmark you want to update and upload your CSV file.
On the Add Benchmark page, you can download a template for adding a new benchmark by clicking the “Download template CSV” button. Learn more about the CSV template.
For optimal performance, each benchmark should include a minimum of 30 images and a maximum of 1000 images for each scenario you want to cover.
Gif: Add benchmark images via control centre
Update Acceptable Reasons
Ground truth and acceptable reasons can be updated on the dashboard.
There are two ways to update acceptable reasons:
1. Via Edit Acceptable Reason Page
- Click the “Details” icon button to navigate to the page for updating acceptable reasons.
- Adjust the order of acceptable reasons by dragging them. Ensure the most valued acceptable reason is at the top of the list.
Gif: Update acceptable reasons and ground truth via edit page
2. Via Pop-up Component
- In the list view or grid view, click the "Edit" icon button to see a pop-up component.
- Use the pop-up to update the acceptable reasons directly.
- Once done, click outside of the pop-up to save the change.
These methods provide flexibility for keeping your benchmarks accurate and up-to-date.
Gif: Update acceptable reasons and ground truth via list pop-up
Gif: Update acceptable reasons and ground truth via grid pop-up
Edit Benchmark Name
Admin Only: Only admins can edit the benchmark name.
To edit the benchmark name:
Click the “Edit Benchmark” button in the top right corner.
Make your changes in the modal that appears.
Click "Save" to apply the changes.
Image: Edit benchmark name
Delete Entire Benchmark
Important Note: Only special benchmarks can be deleted. Once deleted, benchmark images cannot be restored.
If a benchmark is no longer needed, you can delete it from the dashboard.
There are two ways to delete benchmarks:
Via Edit Benchmark Modal
Click the “Edit Benchmark” button in the top right corner.
Click the “Delete” icon button at the bottom of the modal.
Confirm the deletion in the modal.
Image: Edit benchmark button can take you to pop-up modal
Image: Modal where there's a button to delete benchmark
Via Benchmarks Page
On the Benchmarks page, click the delete button under the “Action” column.
Confirm the deletion in the modal.
Image: Delete button on benchmark page
Image: Confirm the deletion in the modal.
Delete Benchmark Image
Important Note: Once deleted, benchmark images cannot be restored.
Outdated or redundant benchmark images can be deleted via the benchmark image page.
To delete a benchmark image:
Select a benchmark.
Click the “Delete” icon button in the list view or grid view.
Confirm the deletion in the modal.
Gif: Delete benchmark images
Gif: Delete benchmark images with Grid view
FAQ
1. Is a Benchmark Image the Same as a Record Image?
No, benchmark images and record images are different.
Benchmark Images: A separate set of images selected specifically to test the model's performance.
Record Images: Typically images from end users, used in regular operations rather than for testing.
Record images can be added as benchmark images. See how
2. Is Snapshot the Same as Benchmark?
No, a snapshot is different from a benchmark.
Snapshot: This captures model performance based on records submitted from the SDK or real-time capture. It reflects the current state of the system using real user data.
Benchmark: This evaluates model performance and ensures policy alignment using a selected set of images. Benchmark stats are derived from benchmark images, which can only be uploaded via CSV and are randomly sampled.