A Tale of Data Most Terrifying

When good data goes bad.

It’s Halloween, and in honor of this spookiest of all holidays, Corios Consulting Director, Eric Flora, spins a tale of good data, gone bad.

Late at night thoughts tend to wander, especially when you are still at the office on Halloween.  I was surrounded by the mundane elements of the workplace – desks, phones, and computers.  But, on this night, something felt different despite the familiar setting.  I imagined the presence of a parallel reality that would physically manifest our thoughts and emotions, and pondered how on Halloween that shadow world seemed so much closer to our own.

I willed myself out of this reverie and back to the task at hand.  I had been given a jumble of customer contact information from a variety of data sources, and asked to make sense of it.  Column names were deceitful.  Formatting was chaotic.  Records from different systems either weren’t connected, or when they were, they didn’t agree.  It was like a giant living blob with many mouths, each keening a discordant note of helplessness, with no common theme except that the end will be soon.

Whoa, where did that come from?  Maybe I should call it a night.  But, two things kept me in my ergonomic chair.  First, pride in my ability to simplify the complex, and second my fascination at the wretched state of the data, and the challenge it represented.  The soft amber glow of the coffee maker calmed me, and I set to work.

The disconnection between the column names and the data they contained was the first bugbear to face.  Setting aside the spurious labeling, I wrote a program to identify the data represented by the values in each column, for each row.  This cell was a name, this one part of an address, another a phone number, then the rest of the address, and an email address.  The data types of each token, their number and sequence, and whether they appeared in a knowledge base all helped to identify and categorize each datum.  The blob began to divide, and slimy hobgoblins of different shapes and sizes emerged from the effluent, dancing to a shrill piping scale and tittering as they locked eyes with me.
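For readers who want to face the same bugbear, here is a minimal sketch of classifying a cell by its content rather than its column name. The patterns and category names are my own assumptions for illustration, not the program from that night, and a real version would also consult a knowledge base of known names and street types.

```python
import re

# Hypothetical patterns, chosen for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE = re.compile(r"^\+?[\d\s().-]{7,}$")
STREET = re.compile(r"^\d+\s+\w+")


def classify_cell(value: str) -> str:
    """Guess what kind of datum a cell holds, ignoring its column label."""
    text = value.strip()
    if not text:
        return "empty"
    if EMAIL.match(text):
        return "email"
    # Phone check comes before the address check so that
    # "503 555 0100" is not mistaken for a street address.
    digits = re.sub(r"\D", "", text)
    if PHONE.match(text) and 7 <= len(digits) <= 15:
        return "phone"
    if STREET.match(text):
        return "address"
    # Every token is alphabetic once apostrophes and hyphens are removed.
    tokens = text.split()
    if all(t.replace("'", "").replace("-", "").isalpha() for t in tokens):
        return "name"
    return "unknown"
```

Running `classify_cell` over every cell in a row gives a sequence of labels, and that sequence is what lets you relabel the columns with some confidence.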

I jerked to alertness and ran a hand down over my face.  Easy self, this is just customer data.  Ok, I’ve identified what these data represent, what’s next?  The image of the hobgoblins lingered in my mind.  Each of them was different; here three arms, here hopping on one thick leg, here no head with a face peering out of its chest.  Ah, I need to standardize these data.  Let’s see, proper case or upper case?  Should I use Street or St.?  Zip codes, 5 digits, no 9 digits, hyphen or no hyphen, 6 alphanumeric for Canada, ok, that is starting to look alright.  The hobgoblins have grown still and solemn and each now has the normal complement of limbs and neck and head.  A group of them have a whispered argument, and then one meekly steps towards me to request distribution of cups of coffee for him and his fellows.
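The standardization choices above can be sketched in a few lines. This is an illustration under my own assumptions: the function names, the tiny suffix table, and the decision to write ZIP+4 with a hyphen and Canadian codes with a space are examples of one possible house style, not a rule.

```python
import re

# Hypothetical suffix table; a real one would be far longer.
SUFFIXES = {"st": "Street", "st.": "Street", "ave": "Avenue", "ave.": "Avenue"}


def standardize_postal_code(raw: str) -> str:
    """Normalize US ZIP, ZIP+4, and Canadian postal codes to one format each."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"\d{5}", compact):
        return compact
    if re.fullmatch(r"\d{9}", compact):
        return f"{compact[:5]}-{compact[5:]}"
    if re.fullmatch(r"[A-Z]\d[A-Z]\d[A-Z]\d", compact):
        return f"{compact[:3]} {compact[3:]}"
    # Unrecognized shapes are returned untouched for manual review.
    return raw.strip()


def standardize_street(raw: str) -> str:
    """Proper-case a street line and spell out a trailing suffix."""
    words = raw.split()
    if not words:
        return ""
    cased = [w.capitalize() for w in words]
    # Only the last word is expanded, so a leading "St." (Saint) is left alone.
    cased[-1] = SUFFIXES.get(words[-1].lower(), cased[-1])
    return " ".join(cased)
```

Whichever conventions you pick matter less than applying them everywhere, so that later comparisons see `123 Main Street` in every source.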

I am standing at the coffee maker, pot in one hand and cup in the other, before I recall that I am alone in the office.  In an absurd, but somehow necessary, attempt to save face I pour and take a sip of the coffee as if this was my intention all along.  Returning to my desk, now that the data is in better shape, I can start to compare the data in aggregate from the different data sources.  Argh, Jim here and James there, the same person to my mind but multiple IDs within the same system, mis-keyed SSNs, Jr. and Sr. at the same address.  Robert and Roberta are different people (although if you asked them they would say they are soulmates).  The same person represented in different ways within and across the data sources reminded me of the mythological Hydra with many heads attached to one body, slithering out of the water with venom on its teeth and venom in its heart.

Ouch! I say aloud as my forehead impacts the computer monitor.  Ok, time to wind this up.  I can make groups of names that are essentially synonyms according to my knowledge base, a middle name and a corresponding middle initial can be equivalent, the edit distance will help identify misspellings, and then I can try soundex since it looks like some of these names were keyed in after being spoken over the phone.  I trust this data source more than the others, so I’ll use it if possible to represent the group, then the next most trustworthy, and so on.  Then, I assign a surrogate key to each of the groups and the hydra sinks back into the water, steaming at its defeat.  No more monsters, just a collection of ordered data that corresponds with real people in the real world.
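For the curious, the hydra-slaying steps can be sketched roughly as follows. Everything named here is an assumption for illustration: the nickname table and list of known names stand in for the knowledge base, the source names and their trust order are invented, and middle names and Jr./Sr. handling are left out to keep the sketch short.

```python
# Hypothetical knowledge base and source ranking.
NICKNAMES = {"jim": "james", "jimmy": "james", "bob": "robert"}
KNOWN_NAMES = {"james", "robert", "roberta"}
SOURCE_RANK = {"crm": 0, "billing": 1, "web": 2}  # most trusted first


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def soundex(name: str) -> str:
    """American Soundex code: first letter plus three digits."""
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (result + "000")[:4]


def same_first_name(a: str, b: str) -> bool:
    ca, cb = NICKNAMES.get(a.lower(), a.lower()), NICKNAMES.get(b.lower(), b.lower())
    if ca == cb:
        return True
    # Two distinct known names (Robert, Roberta) are never fuzzy-matched.
    if ca in KNOWN_NAMES and cb in KNOWN_NAMES:
        return False
    return edit_distance(ca, cb) <= 1 or soundex(ca) == soundex(cb)


def resolve(records: list[dict]) -> list[dict]:
    """Group matching records, pick a survivor, and assign surrogate keys."""
    groups: list[list[dict]] = []
    for rec in records:
        for group in groups:
            head = group[0]
            if (rec["last"].lower() == head["last"].lower()
                    and same_first_name(rec["first"], head["first"])):
                group.append(rec)
                break
        else:
            groups.append([rec])
    resolved = []
    for key, group in enumerate(groups, 1):
        # Survivorship: the record from the most trusted source represents the group.
        best = min(group, key=lambda r: SOURCE_RANK.get(r["source"], len(SOURCE_RANK)))
        resolved.append({"surrogate_key": key, "first": best["first"],
                         "last": best["last"], "members": len(group)})
    return resolved
```

Given a "Jim Doe" from the web source and a "James Doe" from the CRM, `resolve` returns a single group represented by the CRM record, while Robert and Roberta keep separate keys. Comparing each record only to the first member of a group is a shortcut; a production matcher would use blocking and compare against every member.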

I turn off my monitor, grab my jacket, and lock the door on my way out.  If I hurry, I can still make it to the costume party before the clouds clear and the full moon shines down with its transforming light.  I know I’ll be very hungry soon, and there will be plenty of food there.  Wait, where did that thought come from?

To our fellow Data Scientists, Happy Halloween!  May you slay your own data monsters, and live to tell the tale. And, to our fellow business owners, if your data resembles the monsters described here, shoot us an email. We can help.

Entertaining Machine Learning Failures

Proof that robots won't be taking over any time soon.

Unless you’ve been living under a rock, the question of “Humans vs Machines” is not new to you.  Although the volume may have recently increased due to technological advances, the story has been told over and over again, in a variety of different ways.

At an IIA conference, Jerry Kaplan of Stanford University, a leading faculty member in the domain of machine learning, suggested that the media often characterizes AI and machine learning as magic, which Kaplan holds couldn’t be further from the truth.  Machine learning is instead rooted in long-established techniques like neural networks and recommendation systems. I agree with Kaplan. Sometimes, the media is simply looking for a great story.

With that in mind, let’s jump start the weekend with a collection of AI failures, which make for really funny stories:

Judah vs the Machines:

Check out this series of short videos by comedian Judah Friedlander, as he represents “humanity” in various AI technology challenges.

https://techcrunch.com/2017/05/22/judah-vs-the-machines/

An AI invented a bunch of new paint colors that are hilariously wrong:

This is the perfect example of why the human mind is necessary in creative endeavors.

https://arstechnica.com/information-technology/2017/05/an-ai-invented-a-bunch-of-new-paint-colors-that-are-hilariously-wrong/

InspiroBot Generates Random Inspirational Images:

Inspiring? No. Entertaining? Most certainly!

https://techcrunch.com/2015/08/14/inspirobot-generates-random-inspirational-images-to-make-you-seem-deep-and-maybe-a-little-crazy/

“Silicon Valley” and “Well-Defined Domains”:

Although fictional, this instance is certainly hilarious, and 100% plausible. In the show Silicon Valley, the character Jian Yang develops a phone app for recognizing “hot dog; not hot dog”. Character Erlich Bachman is thrilled at what he perceives as Jian Yang’s invention of a recognition engine for sorting any type of food, when in fact the domain was much smaller and far more focused. For those who are not fans of the show, it turns out the application has a useful purpose after all. (Watch the show! It’s the second-to-last episode from this season. Caution… NSFW)

Good or evil, helpful or harmful, humanity’s savior or downfall. Regardless of what side of the AI fence you sit on, we can all agree, AI will be a source of entertainment for years to come.

Should Data Scientists be threatened by AlphaGo Zero?

Google unveils new AI that can "create knowledge itself"

The discussion amongst leading scientists concerning the cultural and social implications for artificial intelligence (AI) is in the daily news, resulting in a public, intellectual debate over whether AI will be a harmful, helpful, or benign influence on society. Notably among others, Stephen Hawking and Elon Musk represent the concerned thought leaders, and Mark Zuckerberg and Jeff Bezos advocate the benefits of continued AI investment.

In support of those advocates, humans face challenges in making efficient and rational decisions, especially when a risky decision is made by an individual expert. Though we wouldn’t have conceived of this possibility five years ago, it’s now not too extreme to consider whether AI-enhanced decision-making platforms might address these challenges, leading to better outcomes in risk-laden situations like medical surgery, litigation, psychotherapy or military job placement. Recommended reading: The Undoing Project by Michael Lewis; Never Split the Difference by Chris Voss.

I believe areas where machines are wonderful partners include:

  • 24-hour execution
  • Repetitive and tedious tasks
  • Tasks requiring rapid, detail-heavy calculations
  • Robust operational environments: because error adjustment is complicated
  • Detail orientation
  • Storage and retrieval
  • Collecting information from a large group of people in distributed manner

Areas where people are still necessary (and one might argue, always will be):

  • Making connections
  • Rendering and exercising judgment
  • Synthesis, interpretation, explanation, persuasion
  • Creating and being creative
  • Developing and nurturing relationships
  • Decision making and value assessments
  • Deciding when and when not to take action
  • Problem definition and resolution

With the above distinctions made, let’s take a look at whether it’s plausible for AI to replace decision scientists.  Below are the areas of decision science that require more than machine learning and AI to be successful:

  • Defining the process that generates the data
  • Designing and re-designing an analytics strategy
  • Building data integration and cleansing strategies (i.e., ETL and MDM), and specifically, making decisions about survivorship and business usage of recorded data
  • Integrating data sources and systems
  • Data cleansing, domain and rule definition
  • Optimizing the allocation of resources to customers based on analytic guidance
  • Changing the culture of the organization and how they use the result of data science to improve how the business performs and serves its customers

Despite recent advances, AI is not a new idea. With traditional and long-held concepts at its core, it’s my contention that what sets modern AI and machine learning apart is the dramatic expansion of certain data domains (e.g., speech, images, remote sensing), and perhaps most importantly, the successful adoption by some practitioners with a tightly focused investment, in a well-defined domain, to address a specific social or business challenge. Most recently, AlphaGo Zero and the OpenAI challenge in the Dota 2 game domain represent notable examples.

Regardless of much media hype, we believe the bottom line is, and always should be, how do new approaches and technologies lead to better action.  Otherwise, it’s all just an exercise in academic debate.

A Campaign Management Case Study

Credit Union takes advantage of member data to improve campaign effectiveness.

Businesses across the country are turning to data to gain a competitive advantage and improve profitability. But their success hinges on the ability to gain insights from that data – and from those insights, the ability to implement profitable change both strategically and technically across their organization.  The case of the credit union described below is the perfect example of data being collected, but not effectively utilized.

Challenge: The marketing division of this credit union had been wholly dependent on a marketing agency to run their campaigns, which as a result tended to be plain vanilla and treated all members as if their relationships with the credit union were equivalent. The credit union wanted to take advantage of their member data warehouse in segmenting and targeting campaigns, and in attributing responses to those campaigns in future efforts.

Solution: Corios automated the production of marketing campaigns for our client using their enterprise data warehouse as the primary data source augmented with data tables supplied by external data source providers, such as Experian. We replicated many of their campaigns in the new platform, trained them how to replicate and build new campaigns, how to modernize their campaigns using the new tool, and taught them how to use test-and-learn and analytics in their targeting strategy.

Result: This increased credit union usage by members, strengthened and deepened member access, provided real-time access to data, and increased cross-sell retention.

 

For analytics to be truly powerful, they must do more than simply process large amounts of data using a static set of statistical techniques. At Corios, we believe that powerful analytics should create new insights to be implemented and ultimately shape the decision making process.

Guided by “Competing on Analytics” – Corios’ Research on Analytics Maturity

“Competing on Analytics” was published 10 years ago and is as relevant to business success today as it was then.  At Corios, we wanted to build on the Davenport team’s important work by asking: if a business enterprise finds itself on a particular rung of the analytics maturity ladder, what should it do to climb to the next rung?

Using the team’s analytic maturity tiers, we conducted our own qualitative and quantitative research in the field, based on decades of our own analytics practitioner and management consulting experience, gained through client engagements. We believe this provides a unique perspective, versus traditional approaches like survey research or interviews.

In partnership with The International Institute of Analytics, we developed an analysis of analytics maturity. We scored client cases on a range of measures, both qualitative and quantitative, in order to determine the characteristic qualities of companies at each tier of analytics maturity.

From a universe of over 200 engagements, we selected 60 engagements across 57 unique client organizations around the world (including the US, Canada, Europe and Australia). Once rated, we used statistical methods to assign each client into a cluster. We then analyzed the recommendations we delivered to each of these client engagements, and synthesized the similarities of those recommendations, and have summarized them here.

A: Competitive

  • Increase speed of deployment for models using transaction-level detail;
  • Increase alignment of IT with the business and analytics teams;
  • Increase alignment of analytics with business management, and between business teams;
  • Increase tech savvy of business teams

B: Capable

  • Build a larger analytics team;
  • Build larger lists of more sophisticated models;
  • Develop a broader understanding of analytics among leadership, and a stronger alignment between analytics and decision making;
  • Develop the capability for real time scoring deployment

C: Aspiring

  • Drive cultural change to embrace analytics from the executive level down;
  • Repair the fractured relationships across business teams;
  • Move analytics from IT to the business, starting with a centralized team;
  • Develop a stronger alignment across customer strategy, and a more holistic use of analytics across products and strategies

D: Reactive

  • Increase analytics sophistication and capabilities, and grow the size of the analytics team, to maintain scale with business growth and demand for analytics;
  • Create stronger alignment between analytics and business, and among analytics teams throughout the enterprise;
  • Build a central analytics data repository;
  • Increase familiarity of business leadership with analytics capabilities

E: Aware

  • Build a customer analytics orientation from top-down;
  • Create visibility for the contribution of analytics towards revenue-contributing objectives;
  • Build an analytics data platform and company-wide initiatives to increase familiarity with analytics capabilities;
  • Modernize analytics skills among a centralized analytics team;
  • Increase IT agility in support of analytics and IT alignment with the business from the top down

To learn more, download our free book Skate Where the Puck is Headed.