Individuals and organizations rely on data for analysis, forecasting, and decision making across a variety of fields, including scientific research, finance, banking, insurance, marketing, e-commerce, and healthcare.
However, in the imperfect world we happen to live in, data is often unreliable or unusable: database formats are incompatible and vary from one data source to another, and records are frequently broken, incomplete, duplicated, or simply dummy entries.
Entity Resolution, Record Matching, Fuzzy Matching, Merge Purge, Data Matching… these are different names for processes aimed at identifying and linking together data records that refer to the same real-world entity. These records may come from a single data source or be collected from multiple sources. It is important to note that every data source usually has its own structure and format. Therefore, in order to unify the data, it must first be standardized: a common format has to be created across all of these disparate data sources and for each and every data record.
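As a minimal sketch of what such standardization can look like, the snippet below maps records from two hypothetical sources (the source names, field names, and date formats are assumptions for illustration) onto one common schema:

```python
# Hypothetical sketch: unify records from two differently structured
# sources into one common format before matching.

def standardize(record: dict, source: str) -> dict:
    """Map a source-specific record to a common schema (illustrative only)."""
    if source == "crm":          # e.g. {"full_name": "smith, john", "dob": "03/04/1985"}
        last, _, first = record["full_name"].partition(",")
        name = f"{first.strip().title()} {last.strip().title()}"
        day, month, year = record["dob"].split("/")   # assumes DD/MM/YYYY
        born = f"{year}-{month}-{day}"
    elif source == "billing":    # e.g. {"first": "JOHN", "last": "SMITH", "born": "1985-04-03"}
        name = f"{record['first'].title()} {record['last'].title()}"
        born = record["born"]    # already ISO formatted
    else:
        raise ValueError(f"unknown source: {source}")
    return {"name": name, "born": born}

a = standardize({"full_name": "smith, john", "dob": "03/04/1985"}, "crm")
b = standardize({"first": "JOHN", "last": "SMITH", "born": "1985-04-03"}, "billing")
print(a == b)  # → True: both records now share one canonical form
```

Once every record is in this canonical form, comparison becomes a like-for-like operation instead of a guessing game across formats.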
Entity Resolution involves:
- Data identification and recognition
  - Data must be identified according to known parameters (e.g., names, dates, addresses, amounts in a given format, etc.).
  - The identified data entries then need to be standardized and enhanced: all names reduced to a common denominator (i.e., the same order and format of name components), all sets of numbers presented in the same format, and so on.
  - It is pivotal to ensure that the data is “healthy,” i.e., clean, standardized, and compliant with all rules.
- Data resolution
  - At this stage, the identified entries within a single data source or across multiple data sources are compared to determine which entries, if any, refer to the same or (when required) related real-world entities.
- Result generation
  - Depending on the original requirement and the primary purpose of resolving the entities, the matched entries are delivered in the required form.
Why is Entity Resolution important?
Almost every publication on data quality mentions the well-known 1-10-100 rule for data costs, formulated by George Labovitz and Yu Sang Chang:
$1 – cost of verifying a record at the point of entry,
$10 – cost of cleansing and deduplicating within the database, and
$100 – cost of doing nothing, i.e. cost of damage caused to the company by not resolving data entities.
Let’s take it a bit further. What are the costs and implications:
- of mistakes resulting from using faulty data,
- of having no data, or only fragmented data,
- of lost transactions, errors, non-compliance…
The list is long.
Fincom’s technology provides cost-effective automation and solutions for Entity Resolution for organizations, financial institutions, and governmental agencies, helping to solve all of the above issues. The technology is traceable and transparent, and enables better customer management, anonymous search, data integration, fraud prevention, and more.