Entity Resolution in Cross-Border Data Sharing


August 28, 2022

Entity Resolution is one of the key steps that precedes data sharing and determines its quality and accuracy. It aims to identify and link together different data records that refer to the same real-world entity. However, a “real-world entity” may have different representations and may be defined by a single identifier (weak) or by multiple identifiers (strong).

For example, a man named Peter Smith may be identified by his name. But there are other Peter Smiths in the world, so the name alone is not enough to resolve the entity.

But once we can cross-reference Peter Smith’s name with his address, phone number, e-mail address, or any other identifier, the entity can be resolved accurately.
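The Peter Smith example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Record` type, the `same_entity` rule, and the sample data are hypothetical, and real systems use far more tolerant comparisons than exact equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A hypothetical, simplified data record."""
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None


def same_entity(a: Record, b: Record) -> bool:
    """Link two records only if the names agree AND at least one
    additional identifier (phone or e-mail) agrees as well."""
    if a.name.casefold() != b.name.casefold():
        return False
    shared_phone = a.phone is not None and a.phone == b.phone
    shared_email = (
        a.email is not None
        and b.email is not None
        and a.email.casefold() == b.email.casefold()
    )
    return shared_phone or shared_email


bank_record = Record("Peter Smith", phone="+1-555-0100")
crm_record = Record("peter smith", phone="+1-555-0100", email="p.smith@example.com")
other_peter = Record("Peter Smith", email="peter@example.org")

print(same_entity(bank_record, crm_record))   # True: name and phone agree
print(same_entity(bank_record, other_peter))  # False: the name alone is not enough
```

The name on its own never links two records here; it only does so together with a second identifier.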

The task is complicated by the fact that these data records may come from multiple data sources that differ in structure and format. The records may include broken or incomplete data, spelling and structural variations, and so on. It becomes even more challenging when data is shared on a global scale.

Cross-border data sharing is paramount for detecting money laundering, terrorism financing, and other illegal financial operations. Criminal activity of this type is global by nature, and as such it poses serious challenges on several levels. Below are just some of them:

Complexity of database structures
Differences in Legislation (including privacy regulations and requirements)
Multilingual environment

Complexity of database structures

Big data is kept, processed, and updated differently even in different branches of the same organization. Incompatible data sources pose challenges even at the national level and become a major obstacle when cross-border data sharing and/or processing is required. Differences in data structures and formats mean that data must be processed and unified before it can be shared in any meaningful way. On top of that comes the variety of database systems (both advanced and legacy) used by institutions and organizations in the private and public sectors.
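As a rough illustration of what such unification involves, the sketch below maps records from two differently structured sources onto one common layout. The source names, field names, and date formats are all hypothetical; they are assumptions made for the example, not a description of any particular institution’s data.

```python
from datetime import datetime

# Hypothetical field mappings: source field name -> common field name.
FIELD_MAPS = {
    "branch_a": {"full_name": "name", "dob": "birth_date", "tel": "phone"},
    "branch_b": {"CustomerName": "name", "DateOfBirth": "birth_date", "Phone": "phone"},
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"branch_a": "%d/%m/%Y", "branch_b": "%Y-%m-%d"}


def unify(source: str, raw: dict) -> dict:
    """Rename fields to the common layout and normalize their values."""
    mapping = FIELD_MAPS[source]
    record = {common: raw.get(src) for src, common in mapping.items()}
    if record["birth_date"]:
        parsed = datetime.strptime(record["birth_date"], DATE_FORMATS[source])
        record["birth_date"] = parsed.date().isoformat()
    if record["phone"]:
        # Keep digits only, so formatting differences disappear.
        record["phone"] = "".join(ch for ch in record["phone"] if ch.isdigit())
    if record["name"]:
        # Collapse repeated whitespace.
        record["name"] = " ".join(record["name"].split())
    return record


a = unify("branch_a", {"full_name": "Peter  Smith", "dob": "03/02/1980", "tel": "+1 (555) 0100"})
b = unify("branch_b", {"CustomerName": "Peter Smith", "DateOfBirth": "1980-02-03", "Phone": "1-555-0100"})
print(a == b)  # True: both sources now yield the same unified record
```

Even this toy version needs a per-source mapping and per-source parsing rules, which hints at why unification across many institutions and legacy systems is costly.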

Big Data, Fintech, and Regtech service providers each offer solutions for data unification. Some of them are very advanced and promising. However, to be effective and to have real practical value, these solutions need to consider aspects beyond the variability of database structures and formats, such as privacy issues, processing time, and the variety of languages.

Differences in Legislation

The major problem of data sharing, both locally and globally, is privacy: how can privacy be ensured without compromising the value of data sharing? Privacy is usually discussed in the context of data breaches and the exposure of personal information. However, the very idea of data sharing is the exchange of information, in most cases personal information. So the question is how to allow this exchange without violating privacy. The most common approach is to encode the data. But to make the shared information meaningful, the data then needs to be decoded, which creates a vicious circle of privacy breaches.

The problem is aggravated further by privacy legislation that differs dramatically from country to country. The US and EU have long tried to establish a baseline for privacy regulation that would enable meaningful data sharing. Until recently, the Privacy Shield Framework was the main privacy protection mechanism governing US-EU cross-border data exchange. However, stricter EU privacy policy led to the invalidation of the Privacy Shield due to “invasive US surveillance programmes, thereby making transfers of personal data on the basis of the Privacy Shield Decision illegal” (the CJEU judgment in the Schrems II case).

Only on March 25, 2022, did the US and EU come to an agreement in principle on a new set of data privacy rules, summarized in the Trans-Atlantic Data Privacy Framework. However, the provisions of the agreement come with considerable limitations as to what types of information may be transferred and to what extent.

Considering the limitations of this pact, it becomes evident that no meaningful, wide-scale data sharing aimed at fighting money laundering and terrorism financing is possible under it. Whatever sharing does take place will be neither timely nor efficient.

And this is before considering the other players in the global marketplace. Asia, the Middle East, Africa, South America, Australia and New Zealand, Russia, and many Eastern European countries are not covered at all, so no information may be shared with over half of the world without violating US and EU privacy regulations.

Privacy Enhancing Technologies (PETs) are being developed to address the challenge and enable cross-border data sharing. However, “137 out of 194 countries have put in place legislation to secure the protection of data and privacy – and each nation’s policy is unique” (from Privacy, regulations and cross-border data sharing in finance), which makes it all but impossible to detect financial crimes and “follow the money” on an international scale.
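One simple building block sometimes used in privacy-preserving matching is a keyed hash: two parties compare tokens derived from identifiers instead of the identifiers themselves, so nothing has to be decoded to find an overlap. The sketch below is a minimal illustration, not a compliance solution. The `blind` helper, the shared key, and the sample data are hypothetical, and the approach only finds exact matches after basic normalization. Whether it would satisfy any given country’s privacy legislation is a separate question.

```python
import hashlib
import hmac


def blind(identifier: str, shared_key: bytes) -> str:
    """Return a keyed SHA-256 token for an identifier (hypothetical helper)."""
    canonical = " ".join(identifier.casefold().split())
    return hmac.new(shared_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key that both institutions agreed on in advance.
SHARED_KEY = b"demo-key-agreed-out-of-band"

bank_a_tokens = {blind(e, SHARED_KEY) for e in ["p.smith@example.com", "j.doe@example.com"]}
bank_b_tokens = {blind(e, SHARED_KEY) for e in ["P.Smith@Example.com ", "a.lee@example.com"]}

# Only the tokens are compared; the raw e-mail addresses are never exchanged.
overlap = bank_a_tokens & bank_b_tokens
print(len(overlap))  # 1
```

Anyone holding the key can still test guesses against the tokens, which is one reason such schemes alone do not settle the legal questions described above.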

Multilingual Environment

Another challenge is sharing data that originates from, or is kept in, national data silos in different languages. One of the most typical and most challenging examples is the data sharing required by international anti-money laundering and counter-terrorism financing regulations.

To follow the money trail, one of the most important tasks is to identify and match the names of the individuals and/or companies that are parties to a transfer. The trails of both legitimate and fraudulent transfers may be long and complex, crossing the borders of several countries. Each of the parties, at different banks on different continents, must be screened and verified against various sanction lists, while each data source has not only its own structure and format but also its own language.

The problem is that very few existing sanction screening solutions can recognize names spelled in their original alphabets (with the exception of Latin, of course). And even when names are spelled in Latin characters, the spelling may differ depending on the pronunciation and phonetic rules of each language. The common practice is to use transliterated names, which leads to even greater challenges and inconsistencies, since there are no strict rules for spelling transliterated names. And what about shortened or simply misspelled names? Or unstructured names, where the three or four components of one name switch places?

Current systems struggle to resolve these issues. As a result, some sanctioned entities are missed, leading to multimillion-dollar fines for non-compliance, while many legitimate parties trigger false alerts, creating inconvenience for clients and reputational damage for the institution.
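To make two of these problems concrete (reordered name components and small spelling differences in Latin-script names), here is a minimal Python sketch that uses only the standard library. It is a generic illustration, not the method of any screening product: the function names are hypothetical, any match threshold would have to be tuned, and the sketch does nothing for names written in non-Latin alphabets.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Strip diacritics and punctuation, lowercase, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_marks.casefold())
    # Sorting the parts makes the comparison independent of their order.
    return " ".join(sorted(cleaned.split()))


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


# Reordered components and missing accents no longer matter.
print(name_similarity("José María García", "Garcia, Jose Maria"))  # 1.0

# Spelling variants of a transliterated name score high, but not 1.0.
print(name_similarity("Mohammed Al-Rashid", "Muhammad Al Rashid"))

# Unrelated names score low.
print(name_similarity("Peter Smith", "Maria Garcia"))
```

Character-level similarity of this kind is blind to how names are pronounced, so phonetically equivalent spellings that look different on paper can still be missed.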

———————–

Fincom has revolutionized cross-border data sharing by introducing its advanced Entity Resolution and AML Compliance solutions. By deploying 48 phonetic-linguistic and mathematical algorithms in conjunction with its core Phonetic Fingerprint technology, Fincom’s solutions make it possible to search, match, and process data in different languages, in their original alphabets, across data silos of different structures and formats without violating the strictest privacy regulations.

Read more about the Technology at the base of Fincom’s Solutions
