---
Source: https://docs.microblink.com/verify/evaluation
Title: Evaluate Verify
Description: Evaluate and test BlinkID Verify performance before production deployment
---

# Evaluate Verify

Before going to production, you may want to evaluate Verify on your own data.
The difficulty of this evaluation depends on the type of data that you have.

## What to expect

On a global evaluation dataset of hundreds of thousands of real-world images of real and fake documents, Verify achieves highly reliable results with the default configuration.

When controlling for image quality, the false rejection rate ([FRR](./glossary.md#frr)) is low both globally and in the USA.

The evaluation also includes liveness false rejections, but those datasets don't have screen/photocopy presentation attacks.
On dedicated liveness datasets, screen/photocopy (and aggregate liveness) false acceptance rate and false rejection rate remain low.[^1]

## False acceptance and false rejection

If you have a sample set of real document images, you can measure Verify's false rejection rate.

But if you have only images of real documents, you can't measure the false acceptance rate.
Conversely, if you only have images of fake documents, you can't measure the false rejection rate.

The balance between FRR and FAR is important because any system can achieve a 0% false rejection rate by accepting everything (including fraudulent documents), or a 0% false acceptance rate by rejecting everything (including real documents).

:::tip[Expired documents]
Be careful when testing with expired documents!
By default, Verify will treat expired documents as fraud attempts.
However, if your dataset contains expired documents you want to treat as genuine, you must set the [`TreatExpirationAsFraud`](./configuration.md#expiration) option to `false`.
:::

In theory, you can simply disable all checks and get 0% FRR.
But without fake documents in your dataset, you won't know that you're not catching any fraud.
Similarly, if you compare two products and one has 1.5% FRR while the other has 1.75% FRR, you might conclude that the first one is better.
But what if it catches 10 times less fraud than the second one?

We recommend acquiring a quality dataset of synthetic fake documents, like the Department of Homeland Security's [IDNet dataset](https://www.researchgate.net/publication/382884673_IDNet_A_Novel_Dataset_for_Identity_Document_Analysis_and_Fraud_Detection).

## Recommended sample size for evaluation

To ensure that the evaluation results accurately reflect BlinkID Verify's performance, it is important to use a large enough and diverse sample set.
Evaluations based on a very small number of images (for example, 5 examples per document type) can lead to misleading conclusions due to lack of representative data.
We recommend the following sample sizes for reliable testing:

Minimum setup:

- 20 to 30 real document examples per document type
- 20 to 30 fake document examples per document type

Recommended setup:

- around 100 real and 100 fake examples per document type

This way, your evaluation results will more accurately reflect real-world performance.
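
To illustrate why small samples mislead, the sketch below computes an approximate 95% Wilson score interval for an observed error rate. It is a generic statistical aid, not part of Verify; the function name `wilson_interval` and the sample counts are assumptions chosen for illustration.

```python
import math


def wilson_interval(errors: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for an error rate (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = errors / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Observing zero errors means very different things at different sample sizes.
for n in (5, 30, 100):
    low, high = wilson_interval(0, n)
    print(f"0 errors in {n:>3} samples: plausible error rate up to {high:.1%}")
```

With only 5 examples per document type, even a perfect run leaves a very wide range of plausible error rates, which is why the larger sample sizes above give more trustworthy results.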

## Trade-offs

FRR and FAR exist in tension.

If you want to increase the number of accepted real documents, the trade-off is that you will also increase the number of accepted fake documents.

If you want to increase the number of rejected fake documents, the trade-off is that you will also increase the number of rejected real documents.

It's important to identify where you want to be on this scale, and [tune the solution](./configuration.md) to get there.
This is typically done by fixing one of the two metrics.

For example, you might know you don't want to reject more than 0.5% of your real users, so you aim for the best possible rate of fraud detection without going over 0.5% FRR (all else being equal).
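
As a minimal sketch of fixing one metric, suppose you have already evaluated several configurations on your own dataset and recorded FRR and FAR for each. The configuration names and numbers below are hypothetical, and `ConfigResult` and `best_under_frr_budget` are illustrative helpers, not part of any Verify API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConfigResult:
    name: str   # hypothetical label for a configuration you evaluated
    frr: float  # false rejection rate measured on your real documents
    far: float  # false acceptance rate measured on your fake documents


def best_under_frr_budget(
    results: Sequence[ConfigResult], max_frr: float
) -> Optional[ConfigResult]:
    """Return the lowest-FAR configuration whose FRR stays within the budget."""
    eligible = [r for r in results if r.frr <= max_frr]
    if not eligible:
        return None  # no evaluated configuration meets the FRR target
    return min(eligible, key=lambda r: r.far)


# Hypothetical measurements, for illustration only.
results = [
    ConfigResult("lenient", frr=0.002, far=0.080),
    ConfigResult("balanced", frr=0.005, far=0.030),
    ConfigResult("strict", frr=0.015, far=0.010),
]

choice = best_under_frr_budget(results, max_frr=0.005)
print(choice)
```

The same approach works in the other direction: fix a maximum FAR and pick the configuration with the lowest FRR.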

Here's an example visualization of this relationship using a hypothetical dataset:


<!-- interactive component omitted -->


The performance of Verify can be configured based on your FAR/FRR requirements.
For details, see the [configuration guide](./configuration.md).

## Evaluating the recommended outcome

The `RecommendedOutcome` can return five values for each document: `Accept`, `Reject`, `ManualReview`, `Undeterminable`, and `Retry`.
You don't need to filter your dataset before evaluating, because every document falls into one of these buckets.

FRR is the share of real documents that received a `Reject` outcome.
FAR is the share of fake documents that received an `Accept` outcome.

Documents that came back `Undeterminable` or `Retry` were not conclusively classified, so track these separately as your unprocessed rate.

If you have [`ManualReview`](./configuration.md#use-case-parameters) enabled, also measure what share of real and fake documents were flagged for human review.
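
The definitions above can be sketched as a small tally over collected outcomes. This assumes you have already gathered the `RecommendedOutcome` string for every document and split the results by ground truth; the helper names and sample counts are hypothetical, and computing the unprocessed rate over the whole dataset is one reasonable choice rather than a prescribed one.

```python
from collections import Counter
from typing import Dict, Sequence


def share(outcomes: Sequence[str], *wanted: str) -> float:
    """Fraction of outcomes that match any of the wanted values."""
    if not outcomes:
        return 0.0
    counts = Counter(outcomes)
    return sum(counts[w] for w in wanted) / len(outcomes)


def evaluate_recommended_outcome(
    real_outcomes: Sequence[str], fake_outcomes: Sequence[str]
) -> Dict[str, float]:
    everything = list(real_outcomes) + list(fake_outcomes)
    return {
        "frr": share(real_outcomes, "Reject"),   # real documents rejected
        "far": share(fake_outcomes, "Accept"),   # fake documents accepted
        # Assumption: unprocessed rate is measured over the whole dataset.
        "unprocessed_rate": share(everything, "Undeterminable", "Retry"),
        "manual_review_real": share(real_outcomes, "ManualReview"),
        "manual_review_fake": share(fake_outcomes, "ManualReview"),
    }


# Hypothetical outcome counts, for illustration only.
real = ["Accept"] * 90 + ["Reject"] * 2 + ["ManualReview"] * 5 + ["Retry"] * 3
fake = ["Reject"] * 85 + ["Accept"] * 4 + ["ManualReview"] * 8 + ["Undeterminable"] * 3

metrics = evaluate_recommended_outcome(real, fake)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```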




### OverallFraudCheck evaluation

For strictly binary `Pass`/`Fail` outputs, filter your dataset first: exclude unsupported documents and low-quality images that the [client-side SDK](./sdk.md) rejects before they reach the API in production.

With that filtered set, use [True Acceptance Rate](./glossary.md#tar) and [True Rejection Rate](./glossary.md#trr) instead of FAR and FRR.
TAR is the share of real documents that received a `Pass` verdict.
TRR is the share of fake documents that received a `Fail` verdict.
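
A minimal sketch of this filtered evaluation follows. The record fields `is_real`, `verdict`, `supported`, and `sdk_accepted` are hypothetical names for your own bookkeeping, not fields returned by Verify; you would fill them in from your dataset labels and your own pre-filtering step.

```python
from typing import Dict, List, Optional, Tuple


def tar_trr(records: List[Dict]) -> Tuple[Optional[float], Optional[float]]:
    """Compute (TAR, TRR) after dropping unsupported and SDK-rejected samples."""
    kept = [r for r in records if r["supported"] and r["sdk_accepted"]]
    real = [r["verdict"] for r in kept if r["is_real"]]
    fake = [r["verdict"] for r in kept if not r["is_real"]]
    # A rate is undefined (None) when the filtered set has no samples of that kind.
    tar = sum(v == "Pass" for v in real) / len(real) if real else None
    trr = sum(v == "Fail" for v in fake) / len(fake) if fake else None
    return tar, trr


# Hypothetical records, for illustration only.
records = [
    {"is_real": True, "verdict": "Pass", "supported": True, "sdk_accepted": True},
    {"is_real": True, "verdict": "Fail", "supported": True, "sdk_accepted": True},
    # Low-quality image the client-side SDK would have rejected: filtered out.
    {"is_real": True, "verdict": "Fail", "supported": True, "sdk_accepted": False},
    {"is_real": False, "verdict": "Fail", "supported": True, "sdk_accepted": True},
    # Unsupported document type: filtered out.
    {"is_real": False, "verdict": "Pass", "supported": False, "sdk_accepted": True},
]

tar, trr = tar_trr(records)
print(f"TAR: {tar:.1%}, TRR: {trr:.1%}")
```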




---

[^1]: See how Verify performs on the [DHS IDNet dataset](https://microblink.com/about/newsroom/microblink-identified-as-only-vendor-to-meet-all-performance-thresholds-in-u-s-department-of-homeland-security-identity-verification-evaluation/).


Last updated on Jun 18, 2026