Achieving a Complete View of your Customer – Part 1

Combining Complex Data Systems

« Back to Blog

Often, Corios is hired by clients to handle the challenge of combining data silos.  You might be asking, why our client wouldn’t do this themselves? They have all the raw data, so they should be able to combine it appropriately and have their own system in place. For one of our clients, that was true, they did in fact have a customering process (within a single system); however, this system was incomplete as there were a few products that were not integrated. The purpose of their customering process was focused solely on billing, and what they were interested in implementing was a system focused on marketing activities. Additionally, the existing customering process was manual, and something that only happened when a customer began a new relationship with the bank; with no assurances that items were accurate and up to date.

The central idea was to identify people across multiple systems who are in fact the same person. So what we needed was a ground up procedure that could ingest raw data from multiple sources, tease out which individuals are in this amalgamation, and which records we should use to identify the attributes these individuals have. It sounds pretty straightforward, but there are a lot of challenges involved in these steps, which required a few clever ideas, and multiple steps, to implement properly.

Step 1 – Getting the data.

This required working with the different owners of the data sources, to ensure the data is correct and that it is staged properly. Additionally, we needed to make sure that we understood the data available to us (e.g. what does ‘A1’ mean as a ‘Customer Product Code’? Does it mean the same thing across different systems?).

Step 2 – Generating Match Codes

As mentioned, this can be complicated. The solution we’ve come to is, as early as possible, make the data uniform. For example, if one data source treats the name as a single field but another data source treats the name as multiple fields, then we need to make them agree. Additionally, all data types have to match (e.g. some systems treat social security numbers as numbers, while others treat them as character strings). By making the systems combine as early as possible, we could dramatically reduce the amount of work required had we not (e.g. if code had to be tailored to each system.) For our client, this step was crucial to laying the groundwork for eventual customer matching. To achieve this we use SAS Data Quality functions to return match codes (stay tuned for more on Data Quality Functions in a future post.)

Step 3 – Combining Data Sources

After we generate all of these match codes, we can put the data into a uniform form and combine all the data sources. At this point, we have the data, and it’s relatively simple to identify individuals, however we still face challenges based on data quality.

For example, is ‘Mike Smith at 123 Fake Street with SSN 123456789’ the same individual as ‘Mike Smith at 123 Fake Street without an SSN’? Maybe, but maybe it’s a father and son at the same address. For another case, what about two records that mostly match, but there’s a transpose in the SSN in one of the records? If there are ten cases of one SSN and a single case where the one SSN has a transpose, then it’s easy to decide that there’s most likely a typo in the last record. However, if it’s 1 and 1 (one good, one transpose), how do you know which one is correct? In more likely cases, we have records of individuals who are quite obviously the same person, but live at different addresses. How do we decide which address to use? This is especially important, as the purpose of this whole project is for marketing.

After speaking with the subject matter experts, we determined that some systems were more trustworthy than others, for various reasons (maybe one gets updated more often, maybe one is historically known to be less accurate, etc). So we took this into account, assuming records from these systems were more reliable, and within these systems, more recently opened accounts were probably more accurate than older accounts.

Eventually, we found a single record to represent an individual, along with contact info, and this individual was assigned a key id.

In my next post I will discuss the next step in achieving a complete view of your customer – householding.

Learn more about “The Corios Way“, and how we can help by visiting our website, or reaching out to president@coriosgroup.com.


Robin Way

The Founder and President of Corios, Robin’s professional passion lies in democratizing and demystifying the science of applied analytics. An established thought leader fueled with 30 years’ experience in the design, development, execution and improvement of applied analytics models, Robin welcomes every opportunity to move the analytics conversation forward.

Connect with him on LinkedIn , or reach out to Corios to get in touch.