Duplicate Check

Organization duplicate check works on the following criteria.

Both of these criteria must be met:

  • SOUNDEX similarity score greater than three on Organization Name. In other words, the organization name must be similar. See SOUNDEX for an explanation.

AND

  • Exact match on EITHER the Main Address City and StateOR the Billing Address City and State

SQL

From a technical perspective, the duplicate check works with Duplicate Check Child Form. The child form SQL code is below:

SELECT org_cst_key,
[Organization] = isnull(cxa.cxa_mailing_label_html,org_name)
FROM co_organization (nolock)
JOIN co_customer (nolock) ON cst_key = org_cst_key
LEFT JOIN co_customer_x_address (nolock) cxa ON cxa.cxa_key = cst_cxa_key
LEFT JOIN co_address (nolock) adr ON adr.adr_key = cxa.cxa_adr_key
LEFT JOIN co_customer_x_address (nolock) cx2 ON cx2.cxa_key = cst_cxa_billing_key
LEFT JOIN co_address (nolock) ad2 ON ad2.adr_key = cx2.cxa_adr_key
WHERE difference(org_name, {org_name} ) > 3 AND ((adr.adr_state {=adr_state}
AND adr.adr_city {=adr_city}) OR (ad2.adr_state {=ad2__adr_state}
AND ad2.adr_city {=ad2__adr_city}) )
AND org_cst_key {<>org_cst_key}
AND NULL {=org_cst_key}
AND cst_delete_flag=0

Clauses in where

Here is an analysis of the specific lines in the WHERE clause of this SQL:

--The difference clause below will check for all of the organizations in the database where the soundex of the organization names is 
----the same as the soundex of the new organization name.
 
difference(org_name, {org_name} ) > 3
 
--For a clearer understanding of the explanations below, we will insert the results of the query
----above into Table RESULT#1 (with only the statement above in the where clause.)
 
--The clause below reduces the results further to where the organizations in RESULT#1 have to have the same city and state as the ones we add.
 
AND ((adr.adr_state {=adr_state} AND adr.adr_city {=adr_city}) --This clause will compare the main address.
OR (ad2.adr_state {=ad2__adr_state} AND ad2.adr_city {=ad2__adr_city})) --This clause will compare the billing address.
 
--Notice that there is an OR in between the 2 clauses above which mean that if the new organization does not have a billing address,
----and one or more organizations in the RESULT#1 do not have a billing address and have different primary address, they will not be eliminated.
----The same idea applies for the primary address.
----For a clearer understanding of the explanations below, we will insert the results of the query above into Table RESULT#2 (with only the statements
----above in the where clause.)
 
--The statement below will check organizations in table RESULT#2 to make sure they do not have an org_cst_key that is equal to the new org_cst_key.
----100% of the time, the statement below turns out to be "and org_cst_key is not null" when you add an organization.
 
AND org_cst_key {<>org_cst_key}
 
--This check to make sure that the new org name does not have an org_cst_key. which mean that this whole SQL script only works when you add an organization.
 
AND NULL {=org_cst_key}
 
----For a clearer understanding of the explanations below, we will insert the results of the query above into Table RESULT#3 (with only the statements
----above in the where clause.)
 
--this will reduce the results in table RESULT#3 further and eliminate any organization that has been soft deleted.
 
AND cst_delete_flag=0

Adding Clauses to Improve Duplicate Check

The majority of the time the script above is not very accurate and we have a couple client asking on how they can improve it. So, I did a further investigation on this and below are a couple way you can improve the script above.

Soundex of the Second Word of the Organization Name

We can add another Difference clause just like the first one but this time we compare second words of the org name instead of their whole name.

AND DIFFERENCE (
SUBSTRING(SUBSTRING(org_name, CHARINDEX(' ', org_name) + 1,1000), CHARINDEX(' ', SUBSTRING(org_name, CHARINDEX
(' ', org_name) + 1,1000))+ 1,1000),
SUBSTRING(( SUBSTRING({org_name}, CHARINDEX(' ',{org_name}) + 1,1000)), CHARINDEX(' ',( SUBSTRING({org_name},
CHARINDEX(' ',{org_name}) + 1,1000))) + 1,1000)
) > 3

You can repeat this process for the third word and so on. You can also add up the difference of the values of the difference if you like.

Length of the Name of the Organization

We can also compare the length of the organizations to make sure their difference is not more than a certains amount of characters.

AND ABS(LEN(org_name) - LEN({org_name})) < 5

You are free to decide on the number characters. The smaller the number, the more accuracy the script will be but if it is too small, then the whole script may be useless.