
Fuzzy Logic

Fuzziness enables you to return matches when there are small variations in the spelling.

Written by Yuri Beckers
Updated over a month ago

Fuzziness is a matching technique that allows us to return results even when there are minor spelling differences between a customer’s name and the names found in risk lists or media sources.

The technique is based on edit distance, also known as Levenshtein distance. Our matching algorithm applies fuzziness alongside a range of other techniques when comparing names.


Fuzziness identifies matches where a name contains an inserted, omitted, or substituted character compared to the name being screened. It allows for up to one character difference per word in the search term.
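As an illustration of the idea, here is a minimal Python sketch of Levenshtein distance and a "one character difference" check. The function names `levenshtein` and `within_one_edit` are hypothetical, and lower-casing before comparison is an assumption. This is not the product's actual implementation, which combines fuzziness with other techniques.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # At the start of row i, previous[j] is the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def within_one_edit(a: str, b: str) -> bool:
    """True if the two words are identical or one edit apart (case-insensitive by assumption)."""
    return levenshtein(a.lower(), b.lower()) <= 1
```

For instance, `within_one_edit("Smith", "Smyth")` is true (one substitution), while `within_one_edit("John", "Jane")` is false (two substitutions).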

Fuzziness is particularly useful in two scenarios:

  1. When a customer’s name is manually entered and may include typographical or spelling errors.

  2. When a customer’s name has been transliterated into Latin script from a non-Latin script, or is written in a different script than the name appearing in a risk list or media article.

By default, our matching algorithm applies fuzziness only to the second scenario, that is, to matching transliteration variants of names. As a result, fuzzy matches are returned by default only when the searched and matched name words are broadly phonetically equivalent, based on an industry-standard phonetic algorithm.

Configuration options are available to modify this default behavior, allowing fuzzy matches to also be generated for typographical and spelling errors.

Understanding Fuzziness Percentages

Fuzziness always permits up to one character difference per word in the search term. The fuzziness percentage determines how long a word must be before such a difference is tolerated.

This distinction is important because a single character difference is more likely to indicate the same entity in longer names than in shorter ones. For example, Leederheimer and Lexderheimer are much more likely to be misspellings of the same name than Lee and Lex.

The appropriate fuzziness percentage depends on your risk-based approach and your confidence in the accuracy of the names being searched. For instance, names taken directly from customers’ identity documents are typically more reliable than names entered manually by customers, which are more susceptible to errors.

Fuzziness Setting    Minimum word length to allow fuzziness
0%                   None (no fuzziness allowed)
10%                  25
20%                  13
30%                  9
40%                  7
50%                  5
60%                  5
70%                  4
80%                  4
90%                  3
100%                 3
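The table can be read as a simple lookup from fuzziness setting to minimum word length. The sketch below is an illustration only: the names `MIN_WORD_LENGTH` and `fuzzy_eligible_words` are hypothetical, and splitting the name on whitespace is an assumption about how words are delimited.

```python
from __future__ import annotations

# Minimum word length per fuzziness setting, copied from the table above.
# None means fuzziness is not allowed at all.
MIN_WORD_LENGTH: dict[int, int | None] = {
    0: None, 10: 25, 20: 13, 30: 9, 40: 7,
    50: 5, 60: 5, 70: 4, 80: 4, 90: 3, 100: 3,
}


def fuzzy_eligible_words(name: str, fuzziness_percent: int) -> list[str]:
    """Return the words of `name` that are long enough for fuzziness to apply."""
    min_len = MIN_WORD_LENGTH[fuzziness_percent]
    if min_len is None:
        return []
    # Assumption: words are separated by whitespace.
    return [word for word in name.split() if len(word) >= min_len]
```

With this sketch, `fuzzy_eligible_words("John Smith", 50)` returns only `"Smith"`, while a setting of 70 returns both words.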

What's the difference between 0% fuzziness and exact match?

There are several differences between 0% fuzziness and an exact match. Setting fuzziness to 0% only affects the edit distance matching behaviour described above.

Exact match also affects the following, in addition to setting fuzziness to 0%:

● Disables all pre-processing. For example, honorifics or suffixes like Mr, Ms, Dr or PhD are matched exactly.

● Disables all other forms of inexact name word matching, such as equivalent names and phonetic matching, except for word order variations and also-known-as (AKA) matching.

● Does not allow for extra words to be added. 'John Smith' won't match 'John Williams Smith'.

● Disables fuzziness for year of birth. When fuzziness is between 10% and 100%, we allow a one-year difference in year of birth.

All of the above matching behaviours are enabled by default and are disabled by using the exact match setting.

What is the impact on false positives?

Name matching is inherently probabilistic and the options described above enable you to trade off greater aversion to the risk of missing inexact name matches against the operational impacts of a higher number of false positives.

To optimise this trade-off, we have capped general edit distance matching at one character per name word. This allows for the overwhelming majority of spelling errors and typos while controlling the number of false positives. This does not mean that only single edit distance variations are considered matches. As noted above, we additionally use many other methods to match equivalent names, phonetically equivalent words, abbreviations, hypocorisms and more.

The matching algorithm has been tested extensively (both internally and by independent consultants) across different names and name variations in our database.

Higher fuzziness = Broader match

Fuzziness is a matching technique that allows for small variations in spelling between a search term and the entities returned in the search results. It allows one typo per word in the search term.

A typo can be an added, removed or replaced character.

The percentage determines the minimum word length (number of characters) to allow fuzziness.

For example, if I search for the name “John Smith” and I set the fuzziness at 50%, this means the minimum word length is 5. As such, we would enable fuzziness on the word “Smith” but not “John” since it’s only 4 characters. So you may get matches like “John Smyth” or “John Smith”.

If I search for the name “John Smith” and I set the fuzziness at 70%, this means the minimum word length is 4. As such, we would enable fuzziness on both words. So you may get matches like “John Smyth” or “Jon Smieth”.
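The two examples above can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the helper names `one_edit_apart` and `name_matches` are hypothetical, words are compared position by position, and the phonetic, equivalent-name, word-order and extra-word behaviours described elsewhere in this article are ignored. The minimum lengths 5 and 4 are the values for 50% and 70% fuzziness.

```python
def one_edit_apart(a: str, b: str) -> bool:
    """True if a and b are equal or differ by one added, removed or replaced character."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) word
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # replaced character: advance in both words
        j += 1      # otherwise skip the extra character in the longer word
    return True


def name_matches(search: str, candidate: str, min_len: int) -> bool:
    """Compare names word by word, tolerating one typo in words of at least min_len characters."""
    search_words = search.lower().split()
    candidate_words = candidate.lower().split()
    if len(search_words) != len(candidate_words):
        return False  # assumption: no extra or missing words in this sketch
    for s, c in zip(search_words, candidate_words):
        if s == c:
            continue
        # A differing word is tolerated only if the search word is long enough.
        if len(s) < min_len or not one_edit_apart(s, c):
            return False
    return True
```

Under these assumptions, `name_matches("John Smith", "Jon Smieth", 5)` is false because “John” is too short for fuzziness at 50%, whereas the same call with a minimum length of 4 (70%) is true.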
