We often need to find potential duplicates in the data and correct them so that clean, harmonized data can be transferred to the target system.
During the ETL process we may have to find and remove duplicate records to avoid data redundancy in the target system.
Data Services has two powerful transforms under Data Quality that can be used for many such scenarios: the Match and Associate transforms.
Used in combination, these two transforms can perform a lot of data quality analysis and take the required actions. In this part we will only see how to use the Match transform to identify duplicates in address data and eliminate them.
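The Match transform itself is configured in the Data Services Designer, not in code, so the following is only a conceptual sketch in Python of what "identify duplicates in address data and eliminate them" means. It is not the Match transform's algorithm or API. The function names, the sample addresses, and the 0.85 similarity threshold are all assumptions made for illustration, and the comparison uses Python's standard `difflib` rather than anything from Data Services.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower()
    )
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_match_groups(records: list[str], threshold: float = 0.85) -> list[list[int]]:
    """Group record indexes whose addresses look alike.

    The first index in each group plays the role of the driver record;
    every later index in the group is treated as a duplicate of it.
    The threshold is a hypothetical value, not a Data Services default.
    """
    groups: list[list[int]] = []
    for idx, record in enumerate(records):
        for group in groups:
            driver = records[group[0]]
            if similarity(record, driver) >= threshold:
                group.append(idx)
                break
        else:
            # No existing group matched, so this record starts a new group.
            groups.append([idx])
    return groups


def deduplicate(records: list[str], threshold: float = 0.85) -> list[str]:
    """Keep only the driver record of each match group."""
    return [records[group[0]] for group in find_match_groups(records, threshold)]


# Hypothetical sample data: three spellings of one address plus a distinct one.
addresses = [
    "12 Main Street, Springfield",
    "12 Main Street Springfield",
    "12 Main St., Springfield",
    "98 Oak Avenue, Portland",
]
unique_addresses = deduplicate(addresses)
```

Comparing each record only against the driver of a group keeps the sketch short; a real matching engine would normally standardize the address fields first and compare them field by field with configurable rules.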
In the next tutorial, we shall see how to post the correct data from a duplicate record back onto the original driver record.
The sample process that I used is demonstrated in the video below.