For the last couple of weeks, we have been working on a marketing analytics case study example (read Part 1 and Part 2). In the last part (Part 2), we defined a couple of advanced analytics objectives based on the business problem at an online retail company called DresSMart Inc. In this part, we will perform some exploratory data analysis as part of the same case study example. But before that, let’s explore the power of exploratory data analysis (EDA) to reveal hidden facts about the greatest game on the planet: soccer, or football.
Soccer – Exploratory Data Analysis
Soccer is undoubtedly the most popular game on the planet, with over 200 nations fielding official national teams. No other game has such universal appeal or so many millions of hardcore followers. Every detail of soccer is analyzed by the players, the coaches, and the support staff. Despite this, a careful exploratory data analysis of the game can still uncover match-winning secrets, as you will see in the next two example case studies.
Penalty Kicks
Let’s relive the first knockout (pre-quarterfinal) match of the 2014 Soccer World Cup, between Brazil and Chile. The score was level at 1-1 at the end of the allotted 90 minutes. Even the extra half an hour could not settle the match, with the scoreboard still reading 1-1, so the tie went to a penalty shootout. After the Brazilian player Neymar scored with the penultimate penalty kick, Brazil were 3-2 ahead in the shootout. Chile still had one penalty kick left, from Gonzalo Jara, and with it the chance to extend the tie; if he missed, Chile’s campaign in the competition would be over. What should Gonzalo Jara do to extend the tie?
On average, at this level around 75% of penalty kicks are converted into goals. The odds, by this measure, are strongly in Gonzalo Jara’s favor. Where should he kick the ball to improve his odds further? Fans, coaches, and players will all tell you to kick the ball into either corner, away from the goalkeeper standing in the center of the goal. They will also advise never to shoot at the dead center, straight at the goalkeeper. A group of researchers asked the same question and did an exploratory data analysis of penalty kicks at the elite level of soccer. Because the ball travels too fast to read, goalkeepers usually go by instinct: they jump either to their left (57% of the time) or to their right (41% of the time). This leaves them in the center just 2% of the time to stop a ball hit straight at them. Hence, a kick aimed at the dead center of the goal has a significantly higher chance of being converted than kicks into either corner at the same height.
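The arithmetic behind this argument can be sketched in a few lines of Python. This is a toy model, not the researchers’ method: it assumes the keeper’s dive is independent of where the ball goes and that he can only stop shots aimed at the zone he moves to, and it ignores misses, shot height, and reflex saves. Zones are labelled from the goalkeeper’s point of view, and the names `DIVE_PROB` and `prob_keeper_elsewhere` are hypothetical.

```python
# Goalkeeper dive frequencies quoted above, labelled from the keeper's side:
# left 57%, right 41%, stays in the center 2%.
DIVE_PROB = {"left": 0.57, "right": 0.41, "center": 0.02}


def prob_keeper_elsewhere(aim: str) -> float:
    """Chance the keeper is NOT in the zone the ball is aimed at.

    Toy assumption: the dive is independent of the kick and the keeper can
    only save a ball aimed at the zone he moves to. Misses, shot height and
    reflex saves are ignored, so this is not a conversion rate.
    """
    return 1.0 - DIVE_PROB[aim]


for zone in DIVE_PROB:
    print(f"{zone:>6}: keeper elsewhere {prob_keeper_elsewhere(zone):.0%}")

# The zone the keeper is least likely to be covering.
best = max(DIVE_PROB, key=prob_keeper_elsewhere)
print("least covered zone:", best)
```

Under these assumptions the center is left uncovered 98% of the time, against 43% and 59% for the two sides. A real analysis would also have to account for shots that miss the target, which this sketch deliberately leaves out.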
Back to Gonzalo Jara: he hit the ball towards his right, in the direction of the diving goalkeeper, as shown in the picture above. He missed: the ball hit the goal post and ricocheted away from the goal. As a result, Chile were knocked out of the World Cup and Brazil advanced to the next stage. In Gonzalo Jara’s defense, the conversion rate for crucial penalty kicks like this one (to avoid elimination) drops to 44%. Yes, pressure is another beast to which even the best succumb.
Corner Kicks
In another case, a few years ago Manchester City’s soccer team was struggling with corner kicks and decided to do some exploratory data analysis to distinguish effective corner kicks from ineffective ones. The team of analysts studied hundreds of videos of corner kicks from the Premier League and found that in-swinging kicks towards the goal were far more effective and dangerous than out-swinging kicks. They took their findings to Roberto Mancini, Manchester City’s manager at the time. Mancini, who has played and followed the game since his childhood, rejected the findings outright. He recalled all those memorable, picture-perfect headed goals from out-swingers; the scrappy goals from in-swingers, on the other hand, hardly left a lasting impression on spectators’ minds. Mancini, it turned out, was wrong. Not everything that looks great and memorable is optimal. This is a great example of how simple but sincere exploratory data analysis can challenge deeply ingrained beliefs developed over centuries (yes, soccer is a really old game).
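The comparison the analysts made boils down to computing an outcome rate for each delivery type. A minimal sketch follows; the records are made up for illustration (they are not Manchester City’s data), and counting a corner as “dangerous” if it produced a goal or a shot on target is an assumption. `rate_by_group` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical, hand-made corner-kick log: (delivery type, dangerous?).
# "Dangerous" is assumed to mean the corner led to a goal or shot on target.
# Toy values for illustration only; NOT Manchester City's data.
corners = [
    ("in-swinging", True), ("in-swinging", False),
    ("in-swinging", True), ("in-swinging", False),
    ("out-swinging", False), ("out-swinging", True),
    ("out-swinging", False), ("out-swinging", False),
]


def rate_by_group(records):
    """Return the fraction of dangerous outcomes for each delivery type."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, dangerous in records:
        totals[group] += 1
        hits[group] += dangerous  # a bool adds as 0 or 1
    return {g: hits[g] / totals[g] for g in totals}


print(rate_by_group(corners))  # {'in-swinging': 0.5, 'out-swinging': 0.25}
```

With only eight toy records the gap means little. The real analysis rested on hundreds of corners, and a sample of that size is what makes such a difference worth acting on.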
Exploratory Data Analysis – Retail Case Study Example
Back to our case study example (read Part 1 and Part 2), in which you are the chief analytics officer & business strategy head at an online shopping store called DresSMart Inc. You are helping the company’s CMO improve the results of its campaigns. For the last few days, you have been exploring the data as part of exploratory data analysis. The following is one of several interesting results and patterns you have noticed in the data. When you analyzed the distribution of customers by the number of product categories (men’s shirts, casual trousers, formal skirts, etc.) each customer purchased, you found the following pattern.
The above distribution looks more or less as expected. However, there is an interesting peak for customers purchasing more than 50 product categories. Who are these customers? Why are they buying so many product categories for their own use? You analyzed this small segment further and found that it is growing faster than the rest of your customer base. Since the company’s inception 7 years ago, the percentage of customers purchasing 50+ product categories in a year has risen exponentially (it currently stands at 2.1%). This segment also contributes about 23% of all sales at DresSMart Inc. The following graphs are part of this analysis.
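The two numbers that make this segment stand out, its share of customers and its share of sales, come from a simple segmentation of per-customer data. The sketch below assumes a hypothetical table of (customer ID, distinct categories bought in a year, annual sales); the ten toy customers are invented and do not reproduce DresSMart’s 2.1% and 23% figures. The function names and the bin width are illustrative choices.

```python
from collections import Counter

# Hypothetical toy data, NOT DresSMart's figures:
# (customer_id, distinct product categories bought in a year, annual sales)
customers = [
    ("c01", 1, 40.0), ("c02", 2, 65.0), ("c03", 2, 80.0), ("c04", 3, 95.0),
    ("c05", 3, 110.0), ("c06", 4, 150.0), ("c07", 5, 170.0), ("c08", 6, 210.0),
    ("c09", 58, 1900.0), ("c10", 64, 2180.0),
]

BULK_THRESHOLD = 50  # the "more than 50 product categories" cut-off from the text


def category_histogram(rows, bin_width=10):
    """Count customers per bin of distinct-category counts (0-9, 10-19, ...)."""
    return Counter((n // bin_width) * bin_width for _, n, _ in rows)


def segment_shares(rows, threshold=BULK_THRESHOLD):
    """Return (share of customers, share of sales) for customers above threshold."""
    bulk = [r for r in rows if r[1] > threshold]
    total_sales = sum(s for _, _, s in rows)
    bulk_sales = sum(s for _, _, s in bulk)
    return len(bulk) / len(rows), bulk_sales / total_sales


print(sorted(category_histogram(customers).items()))
cust_share, sales_share = segment_shares(customers)
print(f"{cust_share:.0%} of customers, {sales_share:.0%} of sales")
```

With these toy rows the two heavy buyers are 20% of customers but about 82% of sales. The same calculation run on the real customer table, year by year, is the kind of analysis that would produce figures like 2.1% of customers and 23% of sales.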
So, what is going on here? You dug further into the styles and sizes of clothes these customers are buying and noticed that they buy the same style in several different sizes. Aha! Now you know who they are: small neighborhood retailers using DresSMart Inc. as a wholesaler. The following is what you concluded from this analysis:
- There is no point in sending these retailers the same product catalog and campaigns as retail customers
- There is an opportunity to strengthen business ties with these mom-&-pop retailers and, in turn, improve your company’s profitability through a separate business program
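The “same style in different sizes” check that exposed these retailers can be sketched as a per-customer ratio. Everything here is an assumption for illustration: the (customer, style, size) order-line layout, the rule that a style counts as multi-size when bought in at least three sizes, the 50% cut-off for flagging a likely reseller, and the helper names `multi_size_ratio` and `likely_resellers`.

```python
from collections import defaultdict

# Hypothetical order lines: (customer_id, style_id, size). Toy data only.
orders = [
    ("c01", "shirt-101", "M"), ("c01", "trouser-7", "32"),
    ("c02", "dress-55", "S"), ("c02", "dress-55", "S"),  # repeat buy, same size
    ("c09", "shirt-101", "S"), ("c09", "shirt-101", "M"), ("c09", "shirt-101", "L"),
    ("c09", "skirt-12", "S"), ("c09", "skirt-12", "M"), ("c09", "skirt-12", "L"),
]


def multi_size_ratio(order_lines, min_sizes=3):
    """Per customer: fraction of styles bought in at least `min_sizes` sizes."""
    sizes = defaultdict(set)  # (customer, style) -> distinct sizes bought
    for cust, style, size in order_lines:
        sizes[(cust, style)].add(size)
    styles_per_cust = defaultdict(int)
    multi_per_cust = defaultdict(int)
    for (cust, _style), size_set in sizes.items():
        styles_per_cust[cust] += 1
        if len(size_set) >= min_sizes:
            multi_per_cust[cust] += 1
    return {c: multi_per_cust[c] / n for c, n in styles_per_cust.items()}


def likely_resellers(order_lines, min_ratio=0.5):
    """Customers whose multi-size ratio meets an assumed cut-off."""
    ratios = multi_size_ratio(order_lines)
    return sorted(c for c, r in ratios.items() if r >= min_ratio)


print(likely_resellers(orders))  # ['c09']
```

A flag like this is only a hypothesis generator; in practice you would combine it with the 50+ category signal and confirm a sample of flagged accounts before launching a separate program for them.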
Additionally, your further analysis revealed that order fulfillment and delivery patterns (delivery quantities, charges, etc.) for these retailers are similar to those of other customers. As a result, your company is incurring additional delivery costs for these customers. You could plan the overall supply chain much better by keeping these small retailers in the equation. This exploratory data analysis has given you ideas for more low-hanging fruit to improve the company’s profitability.
Sign-off Note
Exploratory data analysis is a powerful tool. A diligent EDA is an absolute must to point your advanced business analytics in the right direction. EDA provides a great opportunity to test your simple business hypotheses and hunches before jumping into rigorous model building. Coming back to soccer, we are approaching the final stages of the World Cup. Enjoy the last few games, and may the best team lift the prized trophy.
Roopam,
Excellent way to kick-start the core of the case study. Having spent a fair bit of time in Marketing Analytics (not core modeling, but a lot of EDA and A/B scenarios), I kind of have a hunch where this is going to go – excellent work.
But, just out of curiosity, where do you pick up these case studies from? I ask because the data, though very interesting, is very case-specific and may not apply to situations most of us encounter in real life. As you said, EDA is key before jumping right into analysis, and sometimes it’s very painful and tedious because there are no obvious trends or insights. I have sometimes spent hours slicing and dicing data before I could really form a hypothesis and test it (which, by the way, was less painful). Any tips or tricks for readers like me that could really save some time on EDA?
P.S. I have to admit, all of your case studies do seem real and may well be so, but I doubt any of us would find the same or similar trends directly in our own data (that would seem too good to be true ;-))
Thanks, Kisalay, for your kind words! All these cases, in some form or another, come from the work I have done at various stages of my career. Of course, I take a lot of creative liberty to completely modify information, trends, storylines, scenarios, and conclusions to protect confidentiality. I also try to make the cases easy for readers to understand. However, for most of these cases, the general principles of analysis and the logical flow are largely preserved.
I agree EDA is a tedious exercise, but it also makes one feel like a detective 🙂. Let me share my strategy for EDA: I never touch the data before having a plan of action. As in a detective investigation, you might destroy evidence if you go in without a plan. I usually prefer to have a mental map of my analysis and logical flow before I start slicing and dicing data. It makes me feel much more in control. I also prefer to have a reasonably well-defined hypothesis based on a business hunch before analysis. Also, when you get completely stuck, take a long break away from your computer; fresh air usually helps.
If you have to mine completely unknown data, use machine learning algorithms such as decision trees and Apriori to slice and dice it. At times, you may have to create your own modified algorithms specific to your requirements. Machines, I am completely sure, are better than us humans at this task any day.
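To make the Apriori suggestion concrete, here is a minimal, unoptimized sketch of the classic Apriori frequent-itemset search (level-wise candidate generation with subset pruning) on a handful of made-up shopping baskets. The `apriori` function, the toy baskets, and the 0.4 support threshold are illustrative choices; on real data you would normally use an established library implementation instead.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {i for b in baskets for i in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


# Hypothetical baskets of product categories (toy data).
baskets = [
    {"shirt", "trousers", "belt"},
    {"shirt", "trousers"},
    {"shirt", "tie"},
    {"skirt", "blouse"},
    {"shirt", "trousers", "tie"},
]

freq = apriori(baskets, min_support=0.4)
for itemset, sup in sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Here {shirt, trousers} appears in 3 of 5 baskets (support 0.6), while {shirt, trousers, tie} is discarded without being counted because its subset {trousers, tie} is not frequent. That pruning is what lets Apriori cope with many products.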
Hello,
I have gone through your blogs with keen interest, hoping to develop data analytics skills from scratch.
As you said in one of your articles, “The best way to develop analytics skill is to have a project in your existing job itself.”
I want to have a project in my existing job. I work in the sales department of a furniture manufacturing company, which manufactures household and office furniture at a massive scale in central India.
I am an Industrial Engineering graduate (class of 2015); my 10th-grade score in maths was 147/150.
I want to have a career in data science; please guide me on the learning path.
Thank you
Kapil, to create an analytics opportunity in your company, I suggest you answer these questions:
– Is there an analytics team in your company? If yes, what kinds of business questions does this team usually work on?
– If the answer to the above question is no, are there IT systems (ERP, MIS etc.) available in your company? What kind of data fields could be retrieved from these systems? Talk to your IT team to learn more about it.
– What are some of the quick business questions you could answer using the above data? Focus on important questions but simple analysis to begin with.
Since you are in sales with experience in industrial engineering, I suggest you build your analytical skills on top of your core skills. Supply chain analytics is a major growth area with lots of opportunities. If you can deliver a few simple yet successful projects in your company, it will make your CV really powerful. All the best.
Hi Sir,
So far there is no analytics team in the company, but it needs someone who can do analytics, and I want to fill this gap.
Following your guidance, I discussed with my IT team which systems are available and what data fields can be retrieved to answer some basic questions, but we are not trained enough to think strategically. This is where I want to do a simple analysis project.
For more clarity, I also prepared an Excel sheet with three columns titled IT Systems, Data Fields, and Quick Business Questions.
I would be very grateful if you could guide me on improving my analytical skills while doing a simple data analysis project at the same time.
Thank you
Hi,
Can you share the data for this case study so I can get practical exposure to this problem?
Your blogs are very intuitive and easy to understand.
Thanks
Hi,
I have some rough data with 20 columns and 5,000 rows. There are no details about what the data represents and no nomenclature for the columns.
It is an open-ended problem in which no problem has been defined.
So, can we do data analytics on such data?
Nice post on data analysis.