
Benchmarks


What is a Benchmark?


Benchmarks is a feature for evaluating model performance and ensuring policy alignment against a selected set of images. Benchmark images can be uploaded directly through the platform.

Types of Benchmarks

Location Benchmark

A Location Benchmark consists of a random sample of images taken at a specific location. It allows you to assess how well the model performs on images captured in that particular setting.

There can be only one location benchmark per asset type for each location.

Special Benchmark

A Special Benchmark comprises images that reflect the model's performance under specific conditions or traits. Examples include photos taken in certain weather conditions or images containing particular objects. This type of benchmark helps to understand the model's accuracy and reliability in varied scenarios.

There can be multiple special benchmarks per asset type for each location.

Image: Benchmark examples


How to Access Benchmark

Both admins and members of a workspace can access benchmarks. Benchmark stats and an overview can be found on the homepage, while benchmark pages and other details are available under the Benchmark tab.

Accessing Benchmark Overview

  1. Go to the homepage.

  2. Click the “View” dropdown in the top right corner.

  3. Select “Benchmarks” to navigate to all charts relevant to benchmarks.

Image: Benchmark overview on homepage

Viewing Detailed Benchmarks and Benchmark Images

  1. Navigate to the benchmark tab via the navigation bar at the top.

  2. Alternatively, click “View all benchmarks” on the model performance chart, located on the homepage.

Image: Access benchmark on homepage

Search Benchmark Images by Image URL

To search for benchmark images using an image URL:

  1. Click the filter icon button.

  2. In the slider that appears, locate the "Image URL" field.

  3. Paste the URL into the "Image URL" field.

Gif: Search via URL on benchmark images


Understand Benchmark

Benchmark Overview

The Benchmark Overview shows the overall performance of the latest model version on benchmark images. Model performance is represented by the percentage of correct predictions out of all predictions made.

The Benchmark Overview is located on the homepage and can be accessed through the “View” dropdown. Learn more

Benchmarks Page

The Benchmarks Page offers a list of all benchmarks available in the current workspace, allowing you to easily browse and access them.

Image: Benchmarks page

Timestamp for Benchmark Result

The timestamp under the “Result at” column shows the last time the model made a prediction.

Benchmark Images

When you click on a benchmark, you’ll be navigated to a page that displays all of its benchmark images.

Image: Benchmark images in list view

Ground Truth

The ground truth is the decision, and the reason for that decision, determined by a manual human review. Both can be updated on the platform. See how

Acceptable Reasons

An image may be classified under different decisions during a manual human review. In such cases, you can add multiple acceptable reasons. If the prediction matches any of these reasons, it is considered correct. The ground truth is the most valued acceptable reason.

The acceptable reasons for each benchmark image can be edited. See how

Decision Match

  • Decision Match: If the predicted decision aligns with the ground truth, it counts as a decision match.

  • Decision Mismatch: If the predicted decision does not align with the ground truth, it counts as a decision mismatch.

Reason Match

  • Reason Match: If the predicted reason aligns with the ground truth or any other acceptable reasons, it counts as a reason match.

  • Reason Mismatch: If the predicted reason does not align with the ground truth or any other acceptable reasons, it counts as a reason mismatch.

Precision is calculated as the percentage of reason matches out of all benchmark images.
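
As a rough sketch of this matching logic (not the platform's actual implementation; the field names below are hypothetical), decision match, reason match, and precision can be thought of as follows:

```python
# Minimal sketch of decision/reason matching; each benchmark image is a
# dict with hypothetical field names.

def is_decision_match(image):
    # Decision match: the predicted decision aligns with the ground truth decision.
    return image["predicted_decision"] == image["ground_truth_decision"]

def is_reason_match(image):
    # Reason match: the predicted reason equals the ground truth reason or any
    # other acceptable reason (acceptable_reasons includes the ground truth).
    return image["predicted_reason"] in image["acceptable_reasons"]

def precision(images):
    # Percentage of reason matches out of all benchmark images.
    matches = sum(1 for img in images if is_reason_match(img))
    return 100 * matches / len(images)
```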

Grid View

Grid View provides a convenient way to quickly glance through all benchmark images. Click on the Grid View button above the filter bar to switch views.

Image: Grid view switch

Download Benchmark Results as a CSV

While viewing benchmark images, you can download the benchmark results by clicking the “Download CSV” button in the top right corner.

Image: Download CSV button

Confusion Matrix

A Confusion Matrix helps you understand how well the model is making predictions by comparing what the model predicted to what actually happened.

The Confusion Matrix is available for location benchmarks.

Image: Confusion Matrix for benchmarks

Accessing the Confusion Matrix

  1. Via Benchmark Tab:

    • Go to the benchmark tab to see a list of benchmarks for this workspace.

    • Find the location benchmark with the confusion matrix icon under the "Action" column.

    • Click the confusion matrix button to open the confusion matrix page.

  2. Via Location Benchmark Page:

    • Navigate to the location benchmark page.

    • Click the “Ground Truth Matrix” button.

Gif: Access confusion matrix

Understand the Confusion Matrix

Precision and Recall

Precision: This shows how many of the predicted cases were actually correct. It's the percentage of correct predictions out of all predictions made for a specific category.

Recall: This shows how many of the actual cases were correctly identified by the model. It's the percentage of relevant instances found out of all actual instances.

Image: Precision and recall for confusion matrix
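
As an illustration only (a sketch with made-up numbers, not the platform's implementation), precision and recall can be computed from a confusion matrix laid out with ground-truth labels as rows and predicted labels as columns:

```python
# Hypothetical confusion matrix: matrix[ground_truth][predicted] = image count.
matrix = {
    "compliant":     {"compliant": 80, "non-compliant": 5},
    "non-compliant": {"compliant": 10, "non-compliant": 25},
}

labels = list(matrix)

def precision(label):
    # Correct predictions for this category out of all predictions made for it.
    predicted = sum(matrix[gt][label] for gt in labels)
    return 100 * matrix[label][label] / predicted if predicted else 0.0

def recall(label):
    # Correctly identified instances out of all actual instances of this category.
    actual = sum(matrix[label][p] for p in labels)
    return 100 * matrix[label][label] / actual if actual else 0.0

for label in labels:
    print(label, round(precision(label), 1), round(recall(label), 1))
```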

Accuracy

Accuracy represents how often the model's predictions are correct compared to the actual results. It is calculated by dividing the number of correct predictions by the total number of benchmark images. This percentage helps you understand the model's reliability.

Image: Accuracy for confusion matrix

Risks

Fraud Risk: Fraud risk is the chance that someone might try to cheat or misuse the system.

In the Confusion Matrix, this occurs when the ground truth is non-compliant, but the model predicts it as compliant. This discrepancy indicates a potential fraud risk. (Orange cells)

User Friction Risk: User friction risk is the chance that users might find the system hard to use.

In the Confusion Matrix, this happens when the ground truth is compliant, but the model predicts it as non-compliant. This misalignment can lead to user frustration and dissatisfaction, indicating a user friction risk. (Pink cells)
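
As a rough sketch (not the platform's code), the two risk categories correspond to the two off-diagonal cases in the matrix:

```python
# Hypothetical helper mapping a confusion-matrix cell to a risk category.
def risk_category(ground_truth, predicted):
    if ground_truth == "non-compliant" and predicted == "compliant":
        return "fraud risk"          # orange cells
    if ground_truth == "compliant" and predicted == "non-compliant":
        return "user friction risk"  # pink cells
    return None  # diagonal cells: the prediction matches the ground truth
```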

Image: Risks view for confusion matrix

Acceptable Reasons Threshold

The Acceptable Reasons Threshold feature allows users to set a threshold for how many acceptable reasons will be considered when evaluating model performance using benchmarks.

By changing the selection in the "Acceptable Reasons Threshold" dropdown (e.g., top 2, top 3), users can indicate how many top reasons should be factored into performance assessments.

A higher threshold makes the evaluation less strict by considering more acceptable reasons.
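
For illustration, a top-N threshold can be thought of as limiting the reason match check to the first N acceptable reasons, ordered from most to least valued. The sketch below is hypothetical and not the actual implementation:

```python
# Sketch of the effect of an acceptable-reasons threshold (hypothetical logic).
def is_reason_match(predicted_reason, acceptable_reasons, threshold):
    # acceptable_reasons is ordered from most valued (the ground truth)
    # to least valued; only the top `threshold` reasons are considered.
    return predicted_reason in acceptable_reasons[:threshold]

# With a higher threshold, more reasons count, so evaluation is less strict:
reasons = ["blurry", "obstructed", "poor lighting"]
print(is_reason_match("poor lighting", reasons, threshold=2))  # False
print(is_reason_match("poor lighting", reasons, threshold=3))  # True
```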

Image: Acceptable Reasons Threshold

Inspect Images through the Confusion Matrix

You can easily inspect the images used in the confusion matrix calculation. Simply click on the number of images in each cell to view the filtered set of images corresponding to that category.

Gif: Inspect Images through the Confusion Matrix


Create Benchmark

When a workspace is created, a location benchmark is automatically generated for it.

Each location has exactly one location benchmark, but multiple special benchmarks can be created for a single location.

Create Special Benchmarks

Special benchmarks can be created from the Control Center. There are two ways to access the page to add benchmarks:

  1. Via Navigation Bar:

    • Click on the Control Center in the navigation bar at the top.

    • Select “Add Benchmark” from the side menu to navigate to the Add Benchmark page.

  2. Via Benchmark Page:

    • On the Benchmark page, click on “Add Benchmark” to be navigated to the Add Benchmark page.

Gif: Navigation for adding benchmark

Add images to Special Benchmark

  1. Enter the benchmark name.

  2. Select the correct location and asset type.

  3. Upload the benchmark images in CSV format.

Gif: Add a special benchmark

Download Benchmark Template

On the Add Benchmark page, you can download a template for adding a new benchmark by clicking the “Download template CSV” button.

The template includes two essential columns: imageUrl and acceptableReasons.

You can add additional columns such as image reference, address, and image ID.

If you add additional columns, ensure the column headers are in camel case (for example, imageReference).

Information in additional columns may appear as “other metadata” for the benchmark.
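
For reference, an upload file following the template might look like the rows below. The URLs, reasons, and the imageReference column are made-up examples, and the semicolons separating multiple acceptable reasons are illustrative only; follow the downloaded template for the exact expected format.

```csv
imageUrl,acceptableReasons,imageReference
https://example.com/images/site-a-001.jpg,"compliant;clear view",SITE-A-001
https://example.com/images/site-a-002.jpg,"non-compliant;obstructed",SITE-A-002
```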

Image: Download benchmark template CSV from dashboard


Update Benchmark

Add Benchmark Image from Control Center

Benchmark images can also be added via CSV upload.

  • When viewing a benchmark, click the “Add Image” button in the top right corner.

  • You’ll be navigated to a page to add a benchmark.

  • Select the benchmark you want to update and upload your CSV file.

On the Add Benchmark page, you can download a template for adding a new benchmark by clicking the “Download template CSV” button. Learn more about CSV template.

For optimal performance, each benchmark should include a minimum of 30 images and a maximum of 1000 images for each scenario you want to cover.

Gif: Add benchmark images via Control Center

Update Acceptable Reasons

Ground truth and acceptable reasons can be updated on the dashboard.

There are two ways to update acceptable reasons:

1. Via Edit Acceptable Reason Page

- Click the “Details” icon button to navigate to the page for updating acceptable reasons.

- Adjust the order of acceptable reasons by dragging them. Ensure the most valued acceptable reason is at the top of the list.

Gif: Update acceptable reasons and ground truth via edit page

2. Via Pop-up Component

- In the list view or grid view, click the "Edit" icon button to see a pop-up component.

- Use the pop-up to update the acceptable reasons directly.

- Once done, click outside of the pop-up to save the change.

These methods provide flexibility for keeping your benchmarks accurate and up-to-date.

Gif: Update acceptable reasons and ground truth via list pop-up

Gif: Update acceptable reasons and ground truth via grid pop-up

Edit Benchmark Name

Admin Only: Only admins can edit the benchmark name.

To edit the benchmark name:

  1. Click the “Edit Benchmark” button in the top right corner.

  2. Make your changes in the modal that appears.

  3. Click "Save" to apply the changes.

Image: Edit benchmark name

Delete Entire Benchmark

Important Note: Only special benchmarks can be deleted. Once deleted, benchmark images cannot be restored.

If a benchmark is no longer needed, you can delete it from the dashboard.

There are two ways to delete benchmarks:

Via Edit Benchmark Modal

  1. Click the “Edit Benchmark” button in the top right corner.

  2. Click the “Delete” icon button at the bottom of the modal.

  3. Confirm the deletion in the modal.

Image: Edit benchmark button can take you to pop-up modal

Image: Modal where there's a button to delete benchmark

Via Benchmarks Page

  1. On the Benchmarks page, click the delete button under the “Action” column.

  2. Confirm the deletion in the modal.

Image: Delete button on benchmark page

Image: Confirm the deletion in the modal.

Delete Benchmark Image

Important Note: Once deleted, benchmark images cannot be restored.

Outdated or redundant benchmark images can be deleted via the benchmark image page.

To delete a benchmark image:

  1. Select a benchmark.

  2. Click the “Delete” icon button in the list view or grid view.

  3. Confirm the deletion in the modal.

Gif: Delete benchmark images

Gif: Delete benchmark images with Grid view


FAQ

1. Is a Benchmark Image the Same as a Record Image?

No, benchmark images and record images are different.

  • Benchmark Images: A separate set of images selected specifically to test the model's performance.

  • Record Images: Typically images from end users, used in regular operations rather than for testing.

Record images can be added as benchmark images. See how

2. Is Snapshot the Same as Benchmark?

No, a snapshot is different from a benchmark.

  • Snapshot: This captures model performance based on records submitted from the SDK or real-time capture. It reflects the current state of the system using real user data.

  • Benchmark: This evaluates model performance and ensures policy alignment using a selected set of images. Benchmark stats are derived from benchmark images, which can only be uploaded via CSV and are randomly sampled.
