This is a continuation of the marketing analytics case study we have been discussing over the last few articles. You can find the previous parts at the following links (Part 1, Part 2, and Part 3). In the last part, we discussed exploratory data analysis (EDA: Part 3). In this article we will talk about association analysis, a helpful technique for mining interesting patterns in customers’ transaction data. Association analysis is a handy tool for extended exploratory data analysis, and it is also the core of market basket analysis and sequence analysis. Later in the article, we will use association analysis in our case study example to design effective offer catalogs for campaigns and to improve the design of the online store (website).
I must have been 9 or 10 years old when we had our first craft lecture at school. In India, craft lectures are called SUPW, an abbreviation for ‘Socially Useful Productive Work’. As a part of the first lecture, each student was given an A4-sized sheet of colored paper and a pair of scissors. Excited kids with no direction soon discovered that they could cut a sheet in a virtually infinite number of ways. It was neither socially useful nor productive work, and it created a lot of wasted paper. A more apt long form of SUPW in this case is ‘Some Useful Paper Wasted’. Later, with a more directed effort, we discovered that there are many cool shapes hidden in a piece of paper as long as the scissors are used wisely.
This is precisely the experience many analysts have when they come across customers’ transaction data in companies. There is a wealth of information about customer behavior hidden in this data, but it is hard to figure out where to start. Transaction data can be sliced, diced and grouped in infinitely many ways, much like a piece of paper dissected with scissors. The key in both cases is direction.
Hollywood Image of Data Analysis
Let me describe a typical Hollywood visual for data analysis: a man standing in front of a giant screen with data (sequences of numbers) floating all over it. This man detects patterns in the data on the fly. It is a powerful image but completely untrue. Staring at data and hoping to find patterns is a reliable way to generate a lot of noise and very little signal. Even great code breakers like John Nash and Alan Turing would fail if they tried to find patterns in data using this Hollywood technique.
The point I am trying to drive at here is that data analysis is a highly planned activity. As an analyst, never touch your data before you have a proper plan of action (hypotheses etc.) in place. Having said this, there will always be times when you have to enter uncharted territories of data to find patterns. In these cases, I recommend that you rely on machine learning algorithms or create your own modified algorithms specific to your requirements. In my opinion, machines are better than us humans at this task any day. Association analysis powered by the Apriori algorithm is one such technique for mining transaction data. Let’s explore association analysis in the next part.
Association analysis, as you will discover soon, is primarily frequency analysis performed on a large dataset. Since datasets for most practical problems are large, you need clever algorithms like Apriori to manage association analysis. Let’s consider a much smaller transaction dataset to learn about association analysis. Here, each row (transaction number) represents one customer’s market basket. In the product columns, 1 means ‘bought the product in that transaction’, whereas 0 means ‘did not buy’.
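To make the 0/1 layout concrete, here is a minimal sketch in Python. The five rows and the three product names are assumptions made up for illustration; they are only chosen to agree with the counts quoted in this article (5 transactions, 4 with shirts, 3 with both shirts and ties).

```python
# Hypothetical 0/1 purchase table: one row per transaction, one column per product.
# These rows are illustrative assumptions, not the article's original table.
products = ["shirts", "trousers", "ties"]
rows = {
    1: [1, 1, 1],
    2: [1, 0, 1],
    3: [1, 1, 0],
    4: [0, 1, 0],
    5: [1, 1, 1],
}

# Turn each 0/1 row into a market basket: the set of products bought.
baskets = {
    txn: {product for product, flag in zip(products, flags) if flag == 1}
    for txn, flags in rows.items()
}

for txn, basket in baskets.items():
    print(txn, sorted(basket))
```

Working with baskets as sets keeps the later frequency counts simple: checking whether a basket contains a group of products is just a subset test.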
There are a few association analysis metrics (i.e. support, confidence, and lift) that are really helpful in deciphering information hidden in this kind of dataset. Let us explore these metrics and understand their usage. Support for the purchase of shirts and ties together is defined as:
Support = (number of transactions with both shirts and ties) / (total number of transactions)
For our data there are 3 transactions with both shirts and ties (shirts ∩ ties) out of a total of 5 transactions, so:
Support = 3 / 5 = 60%
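The same count can be sketched in a few lines of Python. The baskets below are assumed data, picked only so that 3 of the 5 transactions contain both shirts and ties; the function name `support` is likewise just an illustrative choice.

```python
def support(baskets, items):
    """Fraction of baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

# Assumed baskets (illustrative, not the article's original table).
baskets = [
    {"shirts", "trousers", "ties"},
    {"shirts", "ties"},
    {"shirts", "trousers"},
    {"trousers"},
    {"shirts", "trousers", "ties"},
]

shirts_and_ties = support(baskets, {"shirts", "ties"})
print(f"{shirts_and_ties:.0%}")  # 60%
```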
60% is a fairly high value for support, and you will rarely find such high values in real-world examples. For real-world problems with several product groups, a support of 1%, or at times even lower depending on the nature of your problem, can still be useful.
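A minimum support threshold like this is also what lets Apriori cope with large datasets: a product group can only be frequent if every smaller group inside it is frequent, so most combinations never need to be counted. The following is a rough, simplified sketch of that idea rather than a production implementation; the baskets and the 60% threshold are assumptions for illustration.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: single products that meet the threshold.
    all_items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in all_items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates that contain any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Assumed baskets (illustrative, not the article's original table).
baskets = [
    {"shirts", "trousers", "ties"},
    {"shirts", "ties"},
    {"shirts", "trousers"},
    {"trousers"},
    {"shirts", "trousers", "ties"},
]

frequent = apriori(baskets, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.0%}")
```

With these assumed baskets, trousers and ties appear together in only 2 of 5 transactions, so that pair is dropped and the three-product group is pruned without ever being counted.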
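Confidence for an association is calculated using the following formula:
Confidence = (number of transactions with both shirts and ties) / (number of transactions with shirts)
In our dataset, there are 3 transactions with both shirts and ties out of 4 transactions with shirts. The calculation of confidence for our dataset is:
Confidence = 3 / 4 = 75%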
Again, you will rarely find such a high value of confidence in most real-world problems unless there are appealing combo offers on the two products. A good value of confidence is again problem specific.
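Confidence is a conditional frequency: you first keep only the baskets that contain shirts, and then ask how many of those also contain ties. Here is a minimal sketch; the baskets and the function name `confidence` are assumptions for illustration.

```python
def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`.

    Assumes at least one basket contains the antecedent.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)

# Assumed baskets (illustrative, not the article's original table).
baskets = [
    {"shirts", "trousers", "ties"},
    {"shirts", "ties"},
    {"shirts", "trousers"},
    {"trousers"},
    {"shirts", "trousers", "ties"},
]

shirts_to_ties = confidence(baskets, {"shirts"}, {"ties"})
print(f"{shirts_to_ties:.0%}")  # 75%
```

Note that confidence is directional: with these assumed baskets, the confidence of ties given shirts is 75%, while the confidence of shirts given ties is 100%.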
A third useful metric for association analysis is lift; it is defined as:
Lift = confidence / expected confidence
Expected confidence in the above formula is the share of tie purchases in the overall dataset, i.e. there are 3 instances of tie purchases out of 5 (60%). With a confidence of 3 out of 4 (75%), lift = 75% / 60% = 125%.
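Putting the pieces together, lift compares the confidence among shirt buyers with the overall share of tie purchases. The sketch below assumes baskets in which ties appear in 3 of the 5 transactions, which is the count that yields a lift of 125% from a confidence of 75%; the data and helper names are illustrative only.

```python
def count(baskets, items):
    """Number of baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= b for b in baskets)

# Assumed baskets (illustrative, not the article's original table).
baskets = [
    {"shirts", "trousers", "ties"},
    {"shirts", "ties"},
    {"shirts", "trousers"},
    {"trousers"},
    {"shirts", "trousers", "ties"},
]

n = len(baskets)
conf = count(baskets, {"shirts", "ties"}) / count(baskets, {"shirts"})  # 3 / 4
expected_conf = count(baskets, {"ties"}) / n                            # 3 / 5
lift = conf / expected_conf

print(f"confidence={conf:.0%} expected={expected_conf:.0%} lift={lift:.0%}")
```

A lift of exactly 100% would mean that buying a shirt tells you nothing about whether the customer buys a tie; values below 100% would mean shirt buyers are less likely than average to buy ties.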
The value for lift, 125%, shows that customers are more likely to buy ties when they buy shirts. The question you are asking here is: if a customer buys a shirt, does the chance of that customer buying a tie go up, i.e. is the lift above 100%? Let us use our knowledge of association analysis in the case study example we have been working on.
Retail Case Study Example – Association Analysis
DresSMart Inc., where you are the Chief Analytics Officer & Business Strategy Head, is an online retail store for clothes and apparel. They showcase different products, brands, and styles. You know association analysis works best when performed separately on different customer segments (read about customer segmentation). However, you have decided to do a quick association analysis on the data available in your company.
For the formal shirts and ties we explored in the above example, your company data gave a support of 0.2% with a confidence of 12% and a lift of 509%. This implies that although only a small percentage of transactions contain both ties and shirts, once a customer buys a formal shirt, the chance of that customer buying a tie goes up about fivefold.
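These three figures also let you back out two numbers that are not quoted directly, assuming the usual definitions (confidence = support / share of shirt transactions, lift = confidence / share of tie transactions). The short sketch below is only arithmetic on the values above; the variable names are illustrative.

```python
# Figures quoted for formal shirts and ties in the case study.
support = 0.002    # 0.2% of transactions contain both shirts and ties
confidence = 0.12  # 12% of shirt transactions also contain a tie
lift = 5.09        # 509%

# Implied share of all transactions that contain a formal shirt.
shirt_share = support / confidence
# Implied share of all transactions that contain a tie (the expected confidence).
tie_share = confidence / lift

print(f"shirt transactions: {shirt_share:.2%}")  # 1.67%
print(f"tie transactions:   {tie_share:.2%}")    # 2.36%
```

In other words, under these assumptions only about 2.4% of all transactions include a tie, but among shirt buyers that figure jumps to 12%.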
DresSMart gives its customers the option to return an undamaged product within 30 days for a full refund. You investigated the customers who buy ties along with shirts and found that the return rate of ties in these transactions is 3 times higher than other return rates. This is an indicator that customers are struggling to choose matching ties when ordering them online along with shirts. There is a need to improve this process on the company’s website. The idea is to reduce the product return rate while exploiting the full opportunity for cross-selling ties with shirts.
You have found some good clues for improving the profitability of your company through exploratory data analysis tools. Now you want to get ready to address the original objective (Part 2): improving the profitability of campaign efforts. You will delve into serious modeling for this task next time around.
Hope you enjoy being Edward Scissorhands with your data! See you soon with the next part of this case study example where we will explore more about decision tree algorithms.