Match Criteria Configuration

To set up the match criteria for a match algorithm, the Matching Component Model must be populated. The user also needs to understand the data that is to be matched, and Data Profiling is one tool for that analysis.

Matching Component Model

Before match codes can be generated and matching algorithms applied, the Matching Component Model must be configured. The Component Model determines which objects, attributes, and references are relevant to the configuration and how they apply.

Note: Additional Component Models must be configured for certain Match Actions. For more information, see the Match Codes topic in this documentation.

All relevant Object Types, Attributes, and References must be created before they can be mapped to the component model.

The Matching Component Model defines all Object Types that are allowed to be matched.

  1. In System Setup, expand 'Component Models,' and click on the 'Matching' node.
  2. On the 'Component Model Configuration' tab, click the Edit link.
  3. Click the 'plus' button for the relevant component aspect to display the selection dialog, and then choose to add an object, attribute, or reference.

     Click the 'X' button to remove the relevant object, attribute, or reference from the component model. A green checkmark appears when the applicable row has a valid configuration.

  4. Click Save to save changes.

Note: If you need to navigate away from the configuration dialog and some of the rows are not yet valid (they have an 'X' instead of a checkmark), click Save pending to save your work.

Data Profile Analysis as Preparation for Match Criteria

Designing a deduplication strategy requires an intimate understanding of the data, and to that end, STEP Data Profiles can be of great assistance. Data profiles show the extent to which relevant attributes are populated and highlight the most frequent and rare values and patterns. For more information, see the Data Profiling documentation.
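To make the idea of a profile concrete, the following is a minimal sketch of the kind of summary a profile provides for one attribute: how many values are populated and which values occur most often. It is not the STEP Data Profile implementation; the function name `profile_attribute` and the sample values are hypothetical and used only for illustration.

```python
from collections import Counter

def profile_attribute(values):
    """Summarize one attribute: fill rate, distinct count, and frequent values."""
    total = len(values)
    # Treat None and blank strings as missing values.
    populated = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(populated)
    return {
        "total": total,
        "populated": len(populated),
        "fill_rate": len(populated) / total if total else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }

# Hypothetical sample values, not taken from a real system.
oem_values = ["Craft Parts", "Craft parts", "Weller", "WELLER INC", None, "", "Weller"]
summary = profile_attribute(oem_values)
print(summary)
```

In this sample, two of the seven values are missing and the four distinct values collapse to two real manufacturers, which is the kind of finding that should shape the deduplication strategy.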

If a profile is generated from the 'External Products' node, it is possible to see that there are missing values for both OEM and OEM Part Number. These missing values should be accounted for in the deduplication strategy. Furthermore, the profile shows that the OEM values include obvious duplicates like 'Craft Parts' / 'Craft parts' and 'Weller' / 'WELLER INC', indicating that some form of normalization is required.
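One way such OEM normalization could look is sketched below: uppercase the value, replace punctuation with spaces, and drop trailing legal-form suffixes. The function name `normalize_oem` and the suffix list are assumptions made for this example and would need to be adapted to the actual data.

```python
import re

# Hypothetical list of legal-form suffixes to drop; extend as the data requires.
LEGAL_SUFFIXES = {"INC", "LTD", "LLC", "CORP", "CO"}

def normalize_oem(value):
    """Return a case- and punctuation-insensitive form of an OEM name."""
    if value is None:
        return ""
    # Uppercase, then turn anything that is not a letter, digit, or space into a space.
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", value.upper())
    tokens = cleaned.split()
    # Drop trailing legal suffixes, but never remove the only remaining token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

for raw in ["Craft Parts", "Craft parts", "Weller", "WELLER INC"]:
    print(repr(raw), "->", repr(normalize_oem(raw)))
```

With these rules, 'Craft Parts' and 'Craft parts' share one normalized value, as do 'Weller' and 'WELLER INC'.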

For OEM Part Number, there are more than one hundred distinct values, and thus, the profile does not provide exact statistics with the default settings. Still, it is possible to see that both uppercase and lowercase letters are used, and that punctuation is used in some values and not in others. Again, this indicates that normalization will be required.
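A minimal sketch of a part number normalization that addresses both observations (mixed case and inconsistent punctuation) is shown below. The function name `normalize_part_number` is hypothetical, and removing every non-alphanumeric character is an assumption that should be checked against the real values, since it could merge part numbers that are genuinely different.

```python
import re

def normalize_part_number(value):
    """Uppercase the value and keep only the letters A-Z and the digits 0-9."""
    if value is None:
        return ""
    return re.sub(r"[^A-Z0-9]", "", value.upper())

# Hypothetical sample values, not taken from a real system.
for raw in ["ab-12.3", "AB123", " wx 99/a "]:
    print(repr(raw), "->", repr(normalize_part_number(raw)))
```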

Notice that the frequent patterns information shows no clear, distinct patterns in the values.

With two 'matching' attributes, it would be possible to generate two match codes per object. In this case, that is likely not the best strategy: the number of distinct OEM values is quite low, especially once they are normalized, so comparing all items from the same OEM would result in too many comparisons.
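The cost of a coarse match code can be estimated with simple arithmetic: a group of n objects that share a match code produces n(n-1)/2 pairs. The sketch below assumes that every pair within a group is compared, which is a simplification and not a description of how STEP schedules comparisons. The function name and the data set sizes are hypothetical.

```python
from collections import Counter

def pairwise_comparisons(match_codes):
    """Count comparisons if every pair of objects sharing a match code is compared."""
    group_sizes = Counter(match_codes).values()
    return sum(n * (n - 1) // 2 for n in group_sizes)

# Hypothetical data set: 1,000 items spread evenly over 5 normalized OEM values.
oem_only_codes = [f"OEM{i % 5}" for i in range(1000)]
print(pairwise_comparisons(oem_only_codes))  # 5 groups of 200 -> 5 * 19,900 = 99,500
```

Because the count grows quadratically with group size, a match code with few distinct values quickly becomes expensive, while a more selective match code keeps the groups small.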

As there is a significant spread in OEM Part Number values, generating match codes based solely on these values could work. However, the OEM value should also be used as a basis for the match, since it cannot be assumed that an OEM Part Number value pattern is specific to one OEM. For example, a match on OEM Part Number is not necessarily a true match because these values are reused. Generating match codes from OEM Part Number alone would therefore require that the matching algorithm logic inspect the OEMs afterwards to determine whether there is a match.

A possible solution is to generate composite match codes that include information from both attributes. If the values are normalized during match code generation, the setup can be simplified so that identical match codes are automatically considered a match. This strategy can be achieved by working with a Window Size of one, so that only objects with the same match code are compared, and with matching algorithm logic that performs no further checks and simply indicates a match for each comparison.
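The composite strategy can be sketched as follows. This is an illustration under stated assumptions, not the STEP match code generator: the function names, the record layout, and the '|' separator are hypothetical, the normalization is deliberately simple (it does not remove suffixes such as 'INC'), and skipping objects with a missing OEM or OEM Part Number is only one possible way to handle missing values.

```python
import re
from collections import defaultdict

def _normalize(value):
    """Uppercase and keep only letters and digits; None becomes an empty string."""
    return re.sub(r"[^A-Z0-9]", "", (value or "").upper())

def composite_match_code(oem, part_number):
    """Build one match code from both attributes, or None if either is missing."""
    oem_key = _normalize(oem)
    part_key = _normalize(part_number)
    if not oem_key or not part_key:
        return None  # Assumption: objects with missing values get no match code.
    return f"{oem_key}|{part_key}"

def group_by_match_code(records):
    """Treat identical match codes as a match and return groups with 2+ objects."""
    groups = defaultdict(list)
    for rec in records:
        code = composite_match_code(rec.get("oem"), rec.get("part_number"))
        if code is not None:
            groups[code].append(rec["id"])
    return {code: ids for code, ids in groups.items() if len(ids) > 1}

# Hypothetical records, not taken from a real system.
records = [
    {"id": 1, "oem": "Craft Parts", "part_number": "ab-123"},
    {"id": 2, "oem": "Craft parts", "part_number": "AB123"},
    {"id": 3, "oem": "Weller", "part_number": "AB123"},
    {"id": 4, "oem": None, "part_number": "AB123"},
]
duplicates = group_by_match_code(records)
print(duplicates)
```

Records 1 and 2 end up with the same match code and are reported as duplicates. Record 3 reuses the part number under a different OEM and is kept apart, and record 4 is skipped because its OEM is missing.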
