Achieving a Complete View of your Customer – Part 2

Combining Complex Data Systems

When we left off in “Achieving a Complete View of your Customer – Part 1,” we had completed the customering process for a client: finding a single record to represent each individual and assigning it a key ID. One task remained: householding.

Householding can be a more complicated procedure. Once we have contact information (i.e., an address), we can start to group customers based on physical address (this applies only to retail customers; businesses are treated differently; more on that in a moment). This way we can identify Ben Armstrong and Ben Armstrong Jr as living at the same address; if we want to target households instead of individuals, we can send a single marketing item to the head of household rather than to each individual. Like accounts and parties, households are given keys to identify them.
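The core of this grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the client's actual implementation (which runs in SAS); the names, fields, and the crude address normalization are all made up for the example.

```python
from itertools import count

def normalize_address(addr):
    """Crude address normalization: uppercase, drop punctuation, and
    collapse whitespace so trivially different spellings group together."""
    cleaned = "".join(ch for ch in addr.upper() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def assign_household_keys(customers):
    """Group retail customers by normalized address and give each
    group an arbitrary surrogate household key."""
    next_key = count(1)
    households = {}   # normalized address -> household key
    assignments = {}  # customer id -> household key
    for cust in customers:
        addr = normalize_address(cust["address"])
        if addr not in households:
            households[addr] = next(next_key)
        assignments[cust["id"]] = households[addr]
    return assignments

customers = [
    {"id": "C1", "name": "Ben Armstrong",    "address": "42 Oak St."},
    {"id": "C2", "name": "Ben Armstrong Jr", "address": "42 Oak St"},
    {"id": "C3", "name": "Mary Jones",       "address": "7 Elm Ave"},
]
keys = assign_household_keys(customers)
# C1 and C2 land in the same household; C3 gets her own key.
```

Note that the household key is purely arbitrary, which is exactly why the "who keeps the key after a move?" questions discussed next are business decisions rather than technical ones.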

While doing this initially is pretty straightforward, it gets complicated when new data arrives. For example, suppose John Smith and Mary live together in our system. If they move to a new address, do they get a new key, or do they keep their existing one? From a marketing point of view, it makes sense for them to keep their key, since the key is really just an arbitrary ID used to recognize that they live together. However, what if they move into an address that already belongs to someone else? What if John and Mary move apart; who should keep the key? Either one? Neither? We resolved these issues by keeping the end use in mind: our client wanted to use this new system for marketing, so we continually focused on how best to set up the system to provide the most value to marketing.

Getting back to householding for businesses, that was more straightforward: each business was treated as its own household.

This process also incorporated new data: updating contact information, adding new customers, and removing customers who left. This was more complicated, but it ultimately boiled down to the original ideas plus rules for managing changes (e.g., how should a particular record be handled when it changes?).

With all that we have achieved for our client, the process now runs on a weekly basis, picking up each new source system as it becomes available and running it through the pipeline. At the moment, the process is largely automated but still requires human intervention to kick off. This intervention involves multiple people in (usually) brief interactions, some of them working from different time zones; while each interaction is minor, they add up and take productivity away from other areas. Therefore, we're currently working on completely automating the process.

Automation will be a multi-technology solution: SAS will do most of the actual householding, while Python and Bash shell scripting will completely automate the procedure by monitoring the state of the data and kicking off the process when it's appropriate to execute.
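The "monitor and kick off" piece can be sketched as a small Python watcher. The file names, landing directory, and trigger command below are hypothetical stand-ins; in the real deployment the command would launch the SAS householding job.

```python
import subprocess
from pathlib import Path

# Hypothetical source-system extracts; the real list would come
# from the client environment.
EXPECTED_SOURCES = {"deposits.csv", "loans.csv", "cards.csv"}

def sources_ready(landing_dir):
    """True once every expected source file has arrived in the landing zone."""
    present = {p.name for p in Path(landing_dir).glob("*.csv")}
    return EXPECTED_SOURCES <= present

def kick_off_if_ready(landing_dir, command=("echo", "run householding")):
    """Launch the downstream job (a placeholder command here) once
    all inputs are staged; otherwise do nothing and report False."""
    if sources_ready(landing_dir):
        subprocess.run(command, check=True)
        return True
    return False
```

In practice a script like this would run on a schedule (e.g., from cron), so the process fires as soon as the last source system delivers, with no one needing to watch for it.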

Once this is done, we'll have a marketing database that requires very little human interaction to maintain: a continuously updated database with the latest information always added to it, which is pretty exciting.

Learn more about “The Corios Way”, or reach out to find out how Corios can help.

Achieving a Complete View of your Customer – Part 1

Combining Complex Data Systems

Often, Corios is hired by clients to handle the challenge of combining data silos. You might be asking: why wouldn't our clients do this themselves? They have all the raw data, so they should be able to combine it appropriately and have their own system in place. For one of our clients, that was partly true: they did in fact have a customering process (within a single system); however, it was incomplete, as a few products were not integrated. Their customering process was focused solely on billing, whereas what they wanted to implement was a system focused on marketing activities. Additionally, the existing process was manual and happened only when a customer began a new relationship with the bank, with no assurances that records were accurate and up to date.

The central idea was to identify people across multiple systems who are in fact the same person. What we needed was a ground-up procedure that could ingest raw data from multiple sources, tease out which individuals exist in this amalgamation, and determine which records we should use to identify each individual's attributes. It sounds pretty straightforward, but these steps involve a lot of challenges, which required a few clever ideas to implement properly.

Step 1 – Getting the Data

This required working with the different owners of the data sources to ensure the data was correct and staged properly. Additionally, we needed to make sure we understood the data available to us (e.g., what does ‘A1’ mean as a ‘Customer Product Code’? Does it mean the same thing across different systems?).

Step 2 – Generating Match Codes

As mentioned, this can be complicated. The solution we've come to is to make the data uniform as early as possible. For example, if one data source treats the name as a single field but another treats it as multiple fields, we need to make them agree. Additionally, all data types have to match (e.g., some systems treat Social Security numbers as numbers, while others treat them as character strings). By unifying the systems as early as possible, we dramatically reduced the amount of work that would otherwise have been required (e.g., if code had to be tailored to each system). For our client, this step was crucial to laying the groundwork for eventual customer matching. To achieve this we use SAS Data Quality functions to return match codes (stay tuned for more on Data Quality functions in a future post).
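To make the idea concrete, here is a toy Python sketch of the two moves described above: forcing every source into one layout, then deriving a simple match code. This is only an illustration of the concept; the real work uses SAS Data Quality functions, and the field names and match-code recipe here are invented for the example.

```python
import re

def standardize(record):
    """Force one uniform layout regardless of source: split a single
    full-name field into first/last parts, and carry the SSN as a
    zero-padded string rather than a number."""
    rec = dict(record)
    if "full_name" in rec:
        first, _, last = rec.pop("full_name").partition(" ")
        rec["first_name"], rec["last_name"] = first, last
    if isinstance(rec.get("ssn"), int):
        rec["ssn"] = f"{rec['ssn']:09d}"
    return rec

def match_code(rec):
    """Toy stand-in for a Data Quality match code: uppercase, strip
    non-letters, keep the leading characters of each name part plus
    the last four SSN digits, so near-identical records collide."""
    def key(s):
        return re.sub(r"[^A-Z]", "", s.upper())[:4]
    return f"{key(rec['last_name'])}-{key(rec['first_name'])}-{rec['ssn'][-4:]}"

# Two sources, two layouts, one person:
a = standardize({"full_name": "Mike Smith", "ssn": 123456789})
b = standardize({"first_name": "MIKE", "last_name": "smith", "ssn": "123456789"})
# After standardization, both records yield the same match code.
```

Because both records collapse to the same code, a later join on match code brings them together with no source-specific logic, which is exactly the work savings described above.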

Step 3 – Combining Data Sources

After we generate all of these match codes, we can put the data into a uniform form and combine all the data sources. At this point we have the data, and it's relatively simple to identify individuals; however, we still face challenges arising from data quality.

For example, is ‘Mike Smith at 123 Fake Street with SSN 123456789’ the same individual as ‘Mike Smith at 123 Fake Street without an SSN’? Maybe, but maybe it's a father and son at the same address. For another case, what about two records that mostly match, but one has a transposition in the SSN? If there are ten instances of one SSN and a single instance with transposed digits, it's easy to decide that the odd record most likely contains a typo. However, if it's one and one (one good, one transposed), how do you know which is correct? More commonly, we have records of individuals who are quite obviously the same person but live at different addresses. How do we decide which address to use? This is especially important, as the purpose of this whole project is marketing.

After speaking with the subject matter experts, we determined that some systems were more trustworthy than others, for various reasons (maybe one gets updated more often, maybe one is historically known to be less accurate, etc.). So we took this into account, treating records from those systems as more reliable, and within each system, treating more recently opened accounts as probably more accurate than older ones.
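That tie-breaking rule, source trustworthiness first, then account recency, can be expressed as a simple sort key. The sketch below is illustrative only: the source names, their ranking, and the records are hypothetical, and the real logic lives in the SAS process.

```python
# Hypothetical source ranking: lower number = more trustworthy,
# per the subject-matter experts.
SOURCE_PRIORITY = {"core_banking": 0, "loan_system": 1, "legacy_crm": 2}

def survivor(records):
    """Pick the single record to represent an individual: prefer the
    most trusted source, then the most recently opened account
    within that source (hence the negated year)."""
    return min(
        records,
        key=lambda r: (SOURCE_PRIORITY[r["source"]], -r["open_year"]),
    )

# Three matched records for the same person, from different sources:
matches = [
    {"source": "legacy_crm",   "open_year": 2016, "address": "99 Old Rd"},
    {"source": "core_banking", "open_year": 2012, "address": "1 Main St"},
    {"source": "core_banking", "open_year": 2017, "address": "5 New Ln"},
]
best = survivor(matches)
# The 2017 core_banking record wins, and its address becomes
# the one used for marketing.
```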

Eventually, we arrived at a single record to represent each individual, along with contact info, and each individual was assigned a key ID.

In my next post I will discuss the next step in achieving a complete view of your customer – householding.

Learn more about “The Corios Way” and how we can help by visiting our website, or by reaching out to us.

Agile Development at Corios

A constantly evolving work in progress.

In traditional waterfall software development, the delivery team takes a deliberately linear approach: defining the problem, developing requirements, then design, development, testing, and finally deployment. Each phase is intended to be complete before the next phase kicks off. Stakeholders have a clear image of how the end product will look, as well as continual insight into the project budget and timeline.

However, this process often breaks down, exceeds budget, and misses schedule. This is due to discoveries made in later stages that feed back into preceding stages. For example, the development team completes the development phase and begins testing. During testing, the team uncovers unplanned functional and performance concerns that force it to return to the design and development phases. Another common cause of rework is that waterfall does not account for the changing needs of the stakeholders.

In an ideal world, it's fantastic to have a plan mapped from beginning to end. But for software development, it has become clear that this process is not very effective at delivering an on-time product within the planned budget.

As the landscape around software development began to expand in the 1990s, the trend towards structured agile development increased. Agile focuses on delivering a continuous stream of working software (an “increment”) in shorter delivery cycles. One common way to think of agile development is that each cycle delivers a tiny slice of the entire waterfall process.

Agile even has its own manifesto that prioritizes:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

Here at Corios we concentrate on two similar agile methodologies, Scrum and Kanban, in delivering value to our clients. In both methods, “user stories” are entered into a common backlog. Items can be added to the backlog as stakeholders, clients, or members of the sprint team add to the requirements. A story can also be removed from the backlog if it is no longer relevant or important to the project. The backlog is “groomed” at a regular frequency to stack-rank all user stories and ensure the highest-priority stories land at the top.

In Scrum, development is time-boxed at a pre-determined interval called a “sprint”, in most cases 2–4 weeks. To start each sprint, the team “plans” which user stories it will try to complete over the course of the time-box. For each story, the team identifies the tasks and dependencies required to complete it. The team also determines the success criteria for each story by defining a definition of “Done”, and sizes each story added to the sprint based on difficulty (“story points”). The team tries to size the collective sprint activities to the amount of work that can actually be accomplished during the sprint.

At the conclusion of the sprint, the team “reviews” its accomplishments by demonstrating and grading the sprint deliverables with stakeholder involvement, and closes out the sprint. Before starting the next sprint, the team takes part in a “retrospective” to review the Scrum process and identify areas that can be improved in subsequent sprints.

Scrum works best with a small, multi-functional group of participants who are dedicated to the project; each team member plays a specific role. Over the course of multiple sprints, the self-learning sprint team identifies its velocity: the average number of story points it is able to complete per sprint.

Kanban works a lot like Scrum, but without the Scrum ceremonies, including the time-box. In Kanban, the team continuously builds a “To Do” list based on priorities. Kanban strives to keep the amount of “work in process” at a lean level to ensure the team and individuals are focused on the right priorities without being overburdened. Prioritization of the backlog helps ensure the most important items are picked up as other items complete.

At Corios, we tailor our agile methodologies around client needs. We face a couple of challenges working with client teams: first, most of our clients aren't able to provide resources dedicated to the project; second, many of our partnering organizations have processes in place that aren't conducive to continual delivery, evolving requirements, and adaptability. At the same time, as a boutique consulting firm with a diverse, expanding client base, we are internally challenged to adhere strictly to one agile methodology as we balance priorities with continuous delivery across our client portfolio.

In line with the agile principles, Corios's agile development practice is a self-learning, constantly evolving work in progress. The pace and comfort level of the client dictate the customized approach we take with each engagement.

For more about how agile development is used at Corios, please send questions to Jason Doerflein.

AWS re:Invent 2017

Exponential Computing Capabilities to Solve New Problems

Along with 50,000 other people, I attended the Amazon Web Services’ re:Invent conference last week in Las Vegas. I was amazed at the energy, fresh thinking, professionalism, collegiality and thoughtfulness of the speakers and attendees. I was also amazed at the distances we walked, as the conference spanned at least 8 hotels along the Las Vegas Strip. One of my colleagues from Opus Interactive, a business partner of Corios, walked 16 miles in one day!

Like several other large tech conferences, AWS re:Invent included product announcements, inspiring speakers, light shows, and more than a few sponsored parties; though I have to admit I was in bed every night by 10pm, since it was a work week. Throughout the event, I was impressed with the cohesiveness of the AWS executives’ and product specialists’ message. This is similar to our experience working with AWS services: everything ties together, and it just works.

The energy and innovative spirit I encountered was a familiar sensation. It distinctly reminded me of the tech industry back in the 1980s and '90s, when computing was moving off the mainframe and onto the PC. We migrated cumbersome (if not enormous) workloads to desktop PCs and Macs, and took advantage of the freedom and openness of new ways of developing business solutions. The PC revolution meant that we could apply brand-new ways of thinking about data and analytics, which began to include visualization, rapid application development, user interfaces, and formal data models. It also meant we could move from the stuffy central office to somewhere more comfortable, as long as we had a phone line.

Comparing the cloud revolution to the PC revolution, it strikes me that there are some very similar themes that a massively distributed, services-oriented architecture now enables: we can solve problems on an exponential scale; we can reward the innovation of the analyst who has a brilliant idea; we can embed analytics into virtually every data stream with ever-reduced friction.

Here are some important statistics I took away from the executive keynote message of Andy Jassy, the CEO of AWS:

  • Despite all the talk about leading-edge data structures, traditional relational data structures are still incredibly important. 80% of all data in the cloud resides in relational databases.
  • We are just getting started. Only 10% of enterprise workloads and data have been moved to public clouds.
  • The growth of cloud architecture adoption is massive. AWS operates an $18 billion annual run rate with an annual growth rate of over 40%.
  • Despite all the investment in the cloud platform, there is still a ton of work to be done in order to operate our business processes on the cloud. For every 1 dollar invested in cloud architecture, there are probably 8 more dollars to be invested in design, migration, and operation, not to mention investment poured into new innovation.
  • The most important reference cases for cloud adoption thus far involve cost savings, but business value statements will soon emerge. References citing IT cost savings in compute and storage in the range of 20%, 30%, or 50% are common.
  • I mentioned that roughly 50,000 people attended the re:Invent conference at large, but of that universe, I’d roughly estimate that over 4,000 attendees were in the Global Partner Forum alone. Check out this video I took of the keynote session at the MGM Grand hotel.



My personal takeaway is that those of us focused on business value need to leverage the exponential computing capabilities of the cloud to solve new problems that produce bottom-line revenue increases and mitigate business risk through portfolio optimization.

Our Philanthropic Principle

“We give back because we can.”

At Corios we are extremely thankful for Emily. Emily's leading principle is “We give back because we can.” Leading with this no-nonsense and 100% correct philosophy, she combines common sense and empathy with a little bit of humor, and achieves results for those in our community (and the world) who need it most.

Today has been dubbed by the “powers that be” as “Giving Tuesday.” The general idea is that the most appropriate follow-up to the gluttony and commercialism of Thanksgiving, Black Friday, and Cyber Monday is a day to focus minds (and money) on helping those who need it most.

This morning, Emily greeted our team with a little comical “truth.”

“Giving Tuesday is definitely a hokey, phony, marketing-created event like Arbor Day or National Donut Day. But, if it gets people who normally wouldn’t think of donating to a charity to open their wallets, then I’m behind it.”

With helping firmly in mind, Emily's message promptly focused on a few direct calls to action. In addition to calling out several big-name charities such as NPR and the WWF, she also highlighted a few lesser-known opportunities to give back, such as GiveWell and GiveDirectly. (Looking for somewhere to give back today? It only takes a couple of minutes.)

So, with this none too subtle reminder in effect, I knew today was the perfect day to contribute to my team’s efforts to win Corios Penny Wars.  As mentioned in my previous blog post “Giving Thanks”, Penny Wars is our annual fundraiser for the Oregon Food Bank.  The idea is for each team to earn points through the contribution of pennies, while decreasing the points of other teams through the contribution of other coins and bills.

For Example:
100 Pennies = 100 points
1 Dollar Bill = -100 points

Obviously the end result = Lots of Money for Oregon Food Bank.

And so, off to the bank I went, in search of pennies (lots and lots of pennies).

Did you know $25 of pennies comes in a convenient carrying box?

For the record, I’m going to say I left the pennies in the box as a favor to Emily who counts the totals weekly.  However, in truth, I like the perceived image of my team “standing” on a first place podium.

Whether your company has sponsored a formal holiday giving project, or you just take a few minutes to visit GiveWell or GiveDirectly, be sure to seize today's opportunities to help out those who need it most.