The Matching, Linking, and Merging functionality relies on four underlying components: Match Codes, a Matching Algorithm, an Event Processor, and an Inbound Integration Endpoint. Together, these components can evaluate a dataset for duplicate objects and initiate necessary maintenance actions. To accomplish this, match codes are generated for objects in a dataset and the matching algorithm evaluates those match codes for potential duplicates.
How those duplicates are found and handled depends on how the matching algorithm is configured. The three main methods of handling duplicates are Identify Duplicates, Link Golden Record, and Merge Golden Record (Match and Merge).
The event processor triggers events which can be acted upon by a business rule or an outbound integration endpoint.
A match code is essentially a string (i.e., a text) representing an object. The string is derived using either STEP functions or JavaScript (drawing upon the functionality exposed in the public Java API). As an example, a match code could simply be composed of the values of two attributes concatenated.
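As a minimal sketch of the concatenation example above, the following Python snippet builds a match code from two attribute values. It is illustrative only: in STEP the match code would be defined with STEP functions or JavaScript, and the attribute names (`last_name`, `postal_code`) and the normalization step are assumptions made for this example.

```python
def make_match_code(obj: dict, attributes=("last_name", "postal_code")) -> str:
    """Concatenate the normalized values of the given attributes.

    The attribute names are hypothetical; missing values become empty strings.
    """
    parts = []
    for name in attributes:
        value = str(obj.get(name) or "")
        # Normalize: uppercase and keep only letters and digits, so that
        # differences in case, spacing, or punctuation do not change the code.
        parts.append("".join(ch for ch in value.upper() if ch.isalnum()))
    return "".join(parts)


customer = {"last_name": "O'Neil", "postal_code": "8260 "}
print(make_match_code(customer))  # ONEIL8260
```

Placing the most discriminating attribute first matters in a sketch like this, because the codes are later sorted and only nearby codes are compared.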
Once generated, these match codes populate an alphabetically sorted table in the system. Rather than comparing every object with every other object in the dataset, only objects with match codes close to each other in the sorted list are compared. This dramatically limits the number of comparisons required, because the total grows linearly with the number of objects rather than with the number of possible object pairs. The number of comparisons made for an individual object is determined by the Window Size setting.
Using the data from the image above, with a window size of '3', every product object would be compared to the objects with the match codes immediately before and after it in the list. The exception is the two products for which identical match codes are generated ('Item-542145' and 'Item-548456'). As the window size setting applies to unique match codes, 'Item-542145' would be compared to 'Item-548751', 'Item-548456', and 'Item-548541', while 'Item-548456' would be compared to 'Item-548751', 'Item-542145', and 'Item-548541'.
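The windowing behavior described above can be sketched in Python as follows. This is not the product's implementation. It is a small model under two assumptions: the window counts unique match codes, and objects that share a match code are compared with each other. The object IDs and match codes are made up for the example, and the sort uses Python's default string ordering.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(match_codes: dict, window_size: int = 3) -> set:
    """Return the pairs of object IDs that would be compared.

    match_codes maps object ID -> match code. The window counts unique
    match codes, so objects sharing a code occupy one position.
    """
    groups = defaultdict(list)
    for object_id, code in match_codes.items():
        groups[code].append(object_id)

    codes = sorted(groups)          # stands in for the sorted match code table
    reach = (window_size - 1) // 2  # unique codes considered on each side
    pairs = set()
    for i, code in enumerate(codes):
        # Objects with identical match codes are compared with each other.
        pairs.update(frozenset(p) for p in combinations(groups[code], 2))
        # Pair with the following codes inside the window; pairs with the
        # preceding codes were added when those codes were visited.
        for neighbour in codes[i + 1 : i + 1 + reach]:
            pairs.update(
                frozenset((a, b)) for a in groups[code] for b in groups[neighbour]
            )
    return pairs


# Hypothetical data: P2 and P3 share a match code.
codes = {"P1": "ALPHA", "P2": "BRAVO", "P3": "BRAVO", "P4": "CHARLIE", "P5": "DELTA"}
pairs = candidate_pairs(codes, window_size=3)
print(len(pairs))  # 6
```

With this data, P2 is paired with P1, P3, and P4, which mirrors the three comparisons described for an object that shares its match code with another object. P1 and P4 are never paired, because their match codes are two positions apart.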
Disregarding that match code values can be identical, the function for calculating the number of comparisons with this approach can be approximated to:
Total comparisons required = ((Window Size - 1) / 2) * Number of objects
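To give a sense of scale, the short calculation below applies the approximation to an arbitrary example of one million objects and a window size of 3, and contrasts it with comparing every object to every other object. The function names are illustrative, and the object count is not taken from any real dataset.

```python
def windowed_comparisons(window_size: int, num_objects: int) -> float:
    """Approximation from the formula above (assumes unique match codes)."""
    return (window_size - 1) / 2 * num_objects


def all_pairs_comparisons(num_objects: int) -> int:
    """Comparisons needed if every object were compared with every other one."""
    return num_objects * (num_objects - 1) // 2


n = 1_000_000
print(windowed_comparisons(3, n))  # 1000000.0
print(all_pairs_comparisons(n))    # 499999500000
```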
With this approach, you have to be very careful when defining the match code. Objects with match codes that are alphabetically far from one another are not likely to be compared (unless the window size is set very high, which then defeats the purpose).
For more information on match codes, things to consider before configuring match codes, and how they are configured, see the Configuring Match Codes documentation here.
Match codes simply limit the number of comparisons that the matching algorithm has to make, and once the codes have been generated, the matching algorithm can compare objects in the dataset. Comparing two objects results in a number between 0% and 100%, indicating how similar the objects are to each other.
There are many ways to arrive at this metric. In some cases, you might be interested only in exact matches, and the algorithm would be fairly straightforward. For example, if the social security number for two customer objects is the same, then it is very likely that these are duplicates and the matching algorithm should return 100% (or 0% if the numbers are not the same).
In many cases, however, you cannot rely on exact matches and will instead have to deal with approximate matches, or a combination of exact and approximate matches. It could be that for the customer object mentioned above, you don't have a social security number available and will have to identify duplicates based on names, mail addresses, phone numbers, and street addresses. These pieces of data can vary, even between objects that represent the same real-world entity. Names and addresses could be spelled differently, middle names could be left out, abbreviations could be used in names and addresses, the customers could be registered with different phone numbers or mail addresses, and so on.
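One way to picture a combination of exact and approximate matching is the Python sketch below. It is not how STEP matching algorithms are configured. It assumes a rule where a social security number, when present on both records, decides the result, and otherwise falls back to a weighted average of string similarities computed with the standard library's `difflib.SequenceMatcher`. The field names and weights are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Approximate string similarity between 0.0 and 1.0 (case-insensitive)."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        # Design choice for this sketch: a missing value never counts as a match.
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(a: dict, b: dict) -> float:
    """Return a score between 0 and 100 for two customer records."""
    # Exact rule: when both records carry an SSN, it decides the outcome.
    if a.get("ssn") and b.get("ssn"):
        return 100.0 if a["ssn"] == b["ssn"] else 0.0

    # Approximate rule: weighted average of field similarities.
    # The fields and weights are assumptions chosen for illustration.
    weights = {"name": 0.5, "street": 0.3, "phone": 0.2}
    total = sum(
        weight * similarity(a.get(field, ""), b.get(field, ""))
        for field, weight in weights.items()
    )
    return round(100 * total, 1)


first = {"name": "Jonathan A. Smith", "street": "12 Main Street", "phone": "555-0100"}
second = {"name": "Jonathan Smith", "street": "12 Main St.", "phone": "555-0100"}
print(match_score(first, second))  # high, but below 100
```

The two sample records differ by a middle initial and an abbreviation, so the score is high without reaching 100, which is the kind of result an approximate comparison is meant to produce.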
For more information on matching algorithms and how they are configured, see the Configuring Matching Algorithms documentation here.
While the match code and matching algorithm define the data and handling that is needed, an event processor is required to launch the associated business rule or outbound integration endpoint (OIEP).
Once set up, an Event Processor can be configured to monitor the system for actionable events on specified objects and then regenerate match codes and/or run matching algorithms in response. For example, consider an object that is subject to a matching algorithm. When the match code assignment or data on that object is approved, the approval can trigger the event processor to regenerate the match code for that object and run the algorithm. Alternatively, events could be passed to the event processor via a republish business rule as part of a workflow or integration.
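The flow described above (an event arrives, the match code is regenerated, the algorithm is run, and the action is logged) can be modeled with a simple queue. The class below is a toy model rather than the STEP event processor: the class name, method names, event types, and callbacks are all hypothetical.

```python
from collections import deque


class MatchingEventProcessor:
    """Toy model: queues events, then regenerates match codes and reruns matching."""

    def __init__(self, make_match_code, run_matching):
        self.queue = deque()
        self.make_match_code = make_match_code  # callable: object -> match code
        self.run_matching = run_matching        # callable: (object ID, match code) -> None
        self.match_codes = {}                   # object ID -> current match code
        self.log = []                           # stand-in for a background process log

    def publish(self, event_type: str, object_id: str, obj: dict) -> None:
        """Queue an event, for example from an approval or a republish rule."""
        self.queue.append((event_type, object_id, obj))

    def process(self) -> None:
        """Handle queued events in arrival order."""
        while self.queue:
            event_type, object_id, obj = self.queue.popleft()
            code = self.make_match_code(obj)
            self.match_codes[object_id] = code
            self.run_matching(object_id, code)
            self.log.append(f"{event_type}: {object_id} -> {code}")


matched = []
processor = MatchingEventProcessor(
    make_match_code=lambda obj: obj["name"].upper(),
    run_matching=lambda object_id, code: matched.append(object_id),
)
processor.publish("approved", "C-1", {"name": "Smith"})
processor.process()
print(processor.log)  # ['approved: C-1 -> SMITH']
```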
Event processors keep a background process log, so you can determine when events were processed and what actions were taken in response. Additionally, event processor performance measurements are available on the Statistics tab for both Matching Algorithms and Match Code configurations.
For more information on matching event processors and how they are configured, see the Configuring Matching Event Processor documentation here.
For the Identify Duplicates and Link Golden Record solutions, inbound integration endpoints (IIEPs) are used to get data into the system. During the import process, the matching process can be invoked by a business rule.
For more information on configuring an IIEP for Identify Duplicates and Link Golden Record solutions, see the IIEP - Configure STEP Importer Processing Engine section of the Data Exchange documentation here.
In a Match and Merge configuration, the IIEP is responsible for importing source record values, matching the source records to existing golden records, and merging the surviving source record values into the golden record. Once configured, the IIEP responds to incoming data and evaluates the inbound records against any existing golden records in STEP. If matches are found, those records are merged, and any records that fall within the clerical review threshold are automatically initiated into a clerical review workflow (after being processed by the event processor).
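The decision logic described above can be sketched as a function that scores an inbound record against the existing golden records and picks an action. This is a simplified model, not the Match and Merge importer: the threshold values are placeholders rather than product defaults, the merge step uses a naive "non-empty source value wins" survivorship rule, and the handling of records with no match is left open because it is not described here.

```python
def process_inbound(record: dict, golden_records: list, score,
                    merge_threshold: float = 90.0, review_threshold: float = 70.0):
    """Decide what to do with one inbound source record.

    score is a callable returning 0-100 for two records. The threshold
    values are placeholders, not product defaults.
    """
    best, best_score = None, 0.0
    for golden in golden_records:
        s = score(record, golden)
        if s > best_score:
            best, best_score = golden, s

    if best is not None and best_score >= merge_threshold:
        # Naive survivorship: non-empty source values overwrite golden values.
        best.update({k: v for k, v in record.items() if v})
        return "merged", best
    if best is not None and best_score >= review_threshold:
        # Uncertain match: hand over to a (hypothetical) clerical review step.
        return "clerical_review", best
    return "no_match", None


def name_score(a: dict, b: dict) -> float:
    """Hypothetical scorer: 100 for equal names (ignoring case), else 0."""
    return 100.0 if a["name"].lower() == b["name"].lower() else 0.0


goldens = [{"name": "Ann Lee", "phone": ""}]
action, target = process_inbound({"name": "ann lee", "phone": "555-0100"}, goldens, name_score)
print(action)  # merged
```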
For more information on configuring an IIEP for Merge Golden Record solutions, see the IIEP - Configure Match and Merge Importer section of the Data Exchange documentation here.
2018, Stibo Systems