Confidence Score Explained

What is a confidence score?

A confidence score is the level of certainty or reliability associated with an extracted value. It indicates how accurate the extracted information is likely to be. By using the confidence details provided by Veryfi, you can assess each data extraction prediction and make informed decisions about how to handle it.

📍It is important to note that confidence details are not absolute measures of accuracy but serve as indicators or probabilities of reliability for relevant fields.

Confidence details are supported on the following Veryfi OCR APIs:

Receipts / Invoices OCR API

API Docs https://api.veryfi.com/api/v8/partner/documents

W-9 Forms OCR API

API Docs https://api.veryfi.com/api/v8/partner/w9s

Bank Statements OCR API

API Docs https://api.veryfi.com/api/v8/partner/bank-statements

Bank Checks OCR API

API Docs https://api.veryfi.com/api/v8/partner/checks

What confidence details does Veryfi return?

confidence_details is a request parameter. By default, it is set to False. If you set it to True, the API response includes two additional keys for extracted values: "ocr_score" and "score".

"ocr_score" - the OCR confidence score, a measure of how confident the Veryfi OCR system is that the text was recognized correctly. Each character recognized by the Veryfi OCR engine is assigned a confidence score, and ocr_score indicates the system's overall level of certainty in the recognition.
"score" - the mapping confidence score, which represents how confident the system is that an extracted value belongs to a particular JSON field.

📍The JSON response structure changes when you enable confidence details. If your current implementation does not handle the new structure, you may need to adjust it before using confidence details in production. Please refer to the API Docs for more details.
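As a rough illustration of opting in, the sketch below builds a request body with confidence_details enabled for the Receipts / Invoices endpoint listed above. The file_url key and the build_request_body helper are assumptions made for this example, and authentication headers are left out; check the API Docs for the exact parameters and headers your account requires.

```python
import json

# Endpoint taken from the list of supported APIs above.
DOCUMENTS_URL = "https://api.veryfi.com/api/v8/partner/documents"


def build_request_body(file_url: str, confidence_details: bool = True) -> str:
    """Serialize a request body that opts in to confidence details.

    `file_url` is an assumed way of pointing at the document; this helper
    is hypothetical and only prepares the JSON, it does not send anything.
    """
    payload = {
        "file_url": file_url,
        "confidence_details": confidence_details,
    }
    return json.dumps(payload)


body = build_request_body("https://example.com/receipt.jpg")
print(body)
```

The resulting string would be sent as the POST body to DOCUMENTS_URL with whichever HTTP client you already use.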

How to interpret the score

Let's take a look at the total field in the Receipts / Invoices API:

"total": {
    "ocr_score": 1.0,
    "score": 0.94,
    "value": 371.56
},

ocr_score - The probability that the value 371.56 is recognized correctly from the image/document, and not 377.56 or 871.56, for example

score - The probability that the value 371.56 corresponds to the total field, and not subtotal or tax, for example

value - The extracted value itself, 371.56

Or the date field in the Bank Checks API:

"date": {
    "ocr_score": 0.99,
    "score": 0.97,
    "value": "2024-12-18"
}

Both scores range from 0 to 1, with higher values indicating greater confidence. A score can be read as a percentage:

ocr_score: 1.0 -> 1 = 100%

score: 0.74 -> 0.74 = 74%
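For logging or review screens, the same conversion can be done in code. This is a minimal sketch using the total field from the example above; as_percent is a hypothetical helper name, not part of any Veryfi SDK.

```python
def as_percent(score: float) -> str:
    """Format a 0-to-1 confidence score as a whole-number percentage."""
    return f"{score * 100:.0f}%"


# The "total" field from the Receipts / Invoices example above.
total = {"ocr_score": 1.0, "score": 0.94, "value": 371.56}

for key in ("ocr_score", "score"):
    print(f"{key}: {total[key]} -> {as_percent(total[key])}")
```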

💡 Pro Tip: Use scores to build business validation logic for data handling. You can use either Veryfi Business Rules or any rules engine you have in-house. The minimum recommended threshold is 0.7, but don't rely on it blindly: different use cases can call for different thresholds, depending on data quality and the endpoint you use. We recommend making informed decisions backed up by data.
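A threshold rule of this kind might look like the sketch below. The needs_review helper and the second sample field are hypothetical; the 0.7 default is the minimum recommended threshold mentioned in the tip and should be tuned for your own use case.

```python
MIN_SCORE = 0.7  # minimum recommended threshold from the tip; tune per use case


def needs_review(field: dict, min_score: float = MIN_SCORE) -> bool:
    """Return True if any score present on the field is below the threshold."""
    scores = [field[key] for key in ("ocr_score", "score") if key in field]
    return any(score < min_score for score in scores)


# The "total" field from the example above.
total = {"ocr_score": 1.0, "score": 0.94, "value": 371.56}
# Hypothetical field read from a low-quality image.
blurry_total = {"ocr_score": 0.52, "score": 0.94, "value": 377.56}

print(needs_review(total))
print(needs_review(blurry_total))
```

Checking both scores separately keeps the two failure modes distinct: a low ocr_score suggests the text may have been misread, while a low score suggests the value may belong to a different field.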

Things to note

When working with Veryfi APIs, you may observe different patterns in how confidence scores are returned. The system provides different types of confidence scores depending on how the data was obtained. It is important to understand these variations and take them into account during implementation.

1. Fields Directly Extracted from Document / Image

When information is explicitly visible in the document itself, the system returns both OCR and mapping confidence scores.

2. Fields Inferred from Context & Post-Processed Values

For fields derived through inference rather than direct extraction, only the "score" value appears. This applies to fields such as categories, currency codes, and document types, which are determined through contextual analysis and post-processing logic.

3. Unsupported or Empty Fields

No confidence scores appear in two scenarios:

When a field type doesn't support confidence scoring:
```
"barcodes": []
```
When a supported field isn't found in the document:
```
"due_date": ""
```
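Because a field can arrive in any of these shapes, client code usually needs to check which one it received before reading scores. The sketch below is one way to do that; score_kind is a hypothetical helper, and the currency_code values are made up to illustrate a score-only field.

```python
def score_kind(field) -> str:
    """Classify a response field by which confidence scores it carries."""
    if not isinstance(field, dict):
        # Plain values such as [] or "" carry no confidence details.
        return "no_scores"
    if "ocr_score" in field and "score" in field:
        return "extracted"  # read directly from the document
    if "score" in field:
        return "inferred"   # derived from context or post-processing
    return "no_scores"


# Illustrative response fragment; the currency_code values are assumed.
response = {
    "total": {"ocr_score": 1.0, "score": 0.94, "value": 371.56},
    "currency_code": {"score": 0.98, "value": "USD"},
    "barcodes": [],
    "due_date": "",
}

for name, field in response.items():
    print(name, "->", score_kind(field))
```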

💡 Pro tip: Get confidence details for already processed documents

4. line_items and tax_lines

line_items and tax_lines are always returned without scores. For ease of use, the scores for these objects are returned in separate objects, which appear only when confidence_details is enabled: line_items_with_scores for line items and tax_lines_with_scores for tax lines.
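The sketch below shows one way to scan line_items_with_scores for low-confidence values. It assumes each entry is an object whose fields use the same value / score / ocr_score shape as the top-level fields; that shape, the sample data, and the low_confidence_cells helper are assumptions for illustration, so confirm the actual structure in the API Docs.

```python
def low_confidence_cells(items_with_scores: list, min_score: float = 0.7) -> list:
    """Return (item index, field name, score) for each field below min_score."""
    flagged = []
    for index, item in enumerate(items_with_scores):
        for name, field in item.items():
            # Skip plain values and fields that carry no mapping score.
            if isinstance(field, dict) and "score" in field:
                if field["score"] < min_score:
                    flagged.append((index, name, field["score"]))
    return flagged


# Hypothetical sample shaped like the assumed line_items_with_scores object.
line_items_with_scores = [
    {
        "description": {"ocr_score": 0.99, "score": 0.95, "value": "Coffee"},
        "total": {"ocr_score": 0.98, "score": 0.91, "value": 4.5},
    },
    {
        "description": {"ocr_score": 0.80, "score": 0.62, "value": "Bage1"},
        "total": {"ocr_score": 0.97, "score": 0.88, "value": 3.25},
    },
]

print(low_confidence_cells(line_items_with_scores))
```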

Have any questions? Please contact us at support@veryfi.com.
