Welcome back to our digital marketing case study example. In the previous part, you had initiated an email donation campaign for your client Helping Hand, an NGO, with three different ads for the same cause to support the needy in Africa. The idea is to identify the ad that generates the maximum donation from the email recipients.
In this part, you will analyze the data from this campaign. You will use A/B testing, a widely used method in digital marketing, to test the effectiveness of different ads. You will also identify shortcomings of A/B testing. In the next parts of the case study example, you will devise a robust method to improve A/B testing using concepts from Bayesian Statistics and reinforcement learning (a method used in the development of artificial intelligence and intelligent systems).
Before we learn the technical subtleties of A/B testing, let’s explore how A/B testing is similar to Olympic events.
A/B Testing – Who Wins the Race?
At the Rio 2016 Olympic Games, the women’s 400-meter sprint final witnessed a dramatic finish. Allyson Felix of the USA was the favorite going into the race, with four Olympic gold medals already in her tally. Shaunae Miller from the Bahamas was her biggest challenge. At the beginning of the race, Miller took a sizable lead. Felix, being the champion she was, sprinted through the last leg and started inching ahead of Miller. Felix was all set to win another gold. This was when something incredible happened. Shaunae Miller, in her final effort to win the gold, dived across the finishing line. It was a photo finish between Felix and Miller. Initially, to the naked eye, it seemed that Felix had won. The results from the ultra-slow motion, however, showed that Miller beat Felix by less than a tenth of a second to win the gold.
Many playfully asked if Miller should get the gold for sprinting or diving. A statistician, however, will ask a different question: if you repeat the final moments of this race many times (say 100), how many times will Shaunae Miller win the race? Miller, to me, didn’t look completely in control of that dive. I will put my money on Allyson Felix to win more than 75 times out of 100 races. This number is a wild guess but the greater odds were in favor of Felix. The final results saw Miller winning the gold. This is an important lesson in statistical thinking: the favorite by the odds and the actual winner can be different.
A/B testing is very similar to an Olympic race, as we will see in our digital marketing case study example in the next segment.
Click Through Rate – Case Study Example
In this case study example, you are identifying the best ad/message to generate the maximum donation from an email campaign. Essentially, it is a race between these three advertisements to win the gold.
In an Olympic race, the performance measure is well-defined, i.e. fastest to finish. Similarly, we need to define the performance measure for these ads before making them compete in a race. To do so, let’s look at the life cycle and associated metrics for an email campaign.
The life cycle of email campaigns has several metrics to measure the performance of the campaign as displayed in the schematic.
Delivery and Bounce Rate
You are an expert in digital marketing and campaigns and are aware of the expected standards for these metrics for a campaign like yours. For instance, you expect a high bounce rate of 8-12% for the emails because these email IDs were collected on a paper form. This means if 1000 emails were sent out then approximately 100 of them will bounce (remember MAILER-DAEMON in your email box). This makes the email delivery rate:
Delivery rate = (1000 - 100) / 1000 = 90%
Now, some of the recipients of these delivered emails will open the mail to check its content. Other emails will either go into the spam folder or get deleted without being opened. You expect the open rate to be 8-15% for your campaign.
Moreover, each of these ads has a specific call-to-action in the form of a hyperlink for interested donors which says ‘know how you could help’. Some of the opened emails will get a click on the hyperlink. This initial interest of email recipients will be registered as the click rate. You expect the click rate to be between 10% and 20% for your campaign.
Finally, some of these clicks will generate the actual donation in the form of hard cash. This is the last mile for this campaign. You will notice that only a small fraction of the 1000 emails sent will generate an actual donation. This is registered as the conversion rate and the actual donation amount.
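The funnel arithmetic above can be sketched in a few lines of R. This is only an illustration: the 10% bounce rate is the rounded figure from the 1000-email example, the open and click ranges are the expected ranges quoted in the text, and the variable names are made up for this sketch.

```r
# Illustrative funnel for 1000 emails, using the expected rates quoted in the text
emails_sent <- 1000
bounce_rate <- 0.10                        # assumption: ~100 bounces per 1000 emails
open_rate   <- c(low = 0.08, high = 0.15)  # expected share of delivered emails opened
click_rate  <- c(low = 0.10, high = 0.20)  # expected share of opened emails clicked

delivered <- emails_sent * (1 - bounce_rate)  # emails that reach an inbox
opens     <- delivered * open_rate            # low and high estimate of opens
clicks    <- opens * click_rate               # low and high estimate of clicks

print(delivered)
print(opens)
print(clicks)
```

Under these assumptions, 1000 sent emails shrink to 900 delivered, 72 to 135 opens, and roughly 7 to 27 clicks, which is why only a small fraction of the emails can end in a donation.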
Performance Measure for A/B Testing – Case Study
In this case study example, you are interested in learning about the effectiveness of the three ads for donation. In the campaign lifecycle, the effectiveness of these ads shows up between the open and click stages. Hence, the right performance measure for the ads is the click rate. You kicked off the campaign with 27000 emails. It has been barely an hour since the emails were sent and you have some initial results from the campaign.
|Ads||Total Emails Sent||Clicks (1st hour)||Opens (1st hour)|
The first thing that your client notices when she sees these numbers is that you have not distributed the sent emails evenly across the 3 ads, i.e. 9000 per ad. Incidentally, ad-A has received just 20.4% of the emails vs 46.3% for ad-C. She is a bit confused about this difference but you assure her that you are using the principles of reinforcement learning for sampling the emails to maximize returns. She seems assured for now but would like to learn more about this seemingly odd sampling methodology later. Before she leaves for a meeting, you promise her that you will be more than happy to explain the details when you meet her after a few days.
Now that you are left alone and have some time in hand, you analyze the initial results from the campaign. You are aware that the first hour’s click rate is not a good representation of the campaign’s performance. The ideal time to measure an email campaign’s performance is after 7-10 days. But you do the analysis anyway to brush up on your A/B testing skills before you get the comprehensive data after a few days.
A/B Testing or Hypothesis Testing of the Campaign Results
Each opened email is a Bernoulli trial where only two outcomes are possible, i.e. clicked (1) and not clicked (0), and the click rate is the proportion of clicks. A/B testing is essentially a hypothesis test of proportions, i.e. the click rates. The null hypothesis or status quo is that all the ads are the same in terms of performance or click rate (π), which is represented as:
H0: π_A = π_B = π_C
A more interesting scenario will be if one ad outperforms the others and wins the gold. After all, a three-way tie is a boring race. We will identify the winner using hypothesis testing or A/B testing. There are certain underlying sample requirements that the experiment needs to satisfy for a scientific hypothesis test. We need to have at least 10 instances of both clicks and non-clicks. This requirement can be relaxed to 5 clicks/non-clicks with a few caveats but the count should never fall below 5. You notice that for ad-A you have just 4 clicks, which is not a sufficient sample size. Hence, you can only compare the performance of ad-B against ad-C. Let’s first find the click rate for ad-B:
Click rate (ad-B) = 14 / 101 = 13.86%
Similarly, the click rate for ad-C is:
Click rate (ad-C) = 23 / 129 = 17.83%
17.83% is much better than 13.86%, or is it? Remember when we hypothetically made Allyson Felix and Shaunae Miller run 100 races? Let’s do the same to ad-B and ad-C to test their performance.
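As a quick sanity check, the click rates and the sample-size rule of thumb can be computed in R. This is a minimal sketch: the click and open counts for ad-B and ad-C are the ones used in the prop.test() call later in this article, and the variable names are illustrative only.

```r
# First-hour counts for ad-B and ad-C (same numbers as in the prop.test() call)
clicks <- c(B = 14, C = 23)
opens  <- c(B = 101, C = 129)

non_clicks <- opens - clicks   # opened emails that did not get a click
click_rate <- clicks / opens   # proportion of opened emails that were clicked

# Rule of thumb from the text: at least 10 clicks and 10 non-clicks per ad
enough_data <- clicks >= 10 & non_clicks >= 10

print(round(100 * click_rate, 2))  # click rates in percent
print(enough_data)
```

Both ads pass the 10 clicks / 10 non-clicks check, so a test of proportions between ad-B and ad-C is reasonable.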
R code and Results
The best part is that a single line of R code can do this job for us.
prop.test(x = c(14, 23), n = c(101, 129), correct = FALSE)
R shows the following result on the console. You could test this code on this Online R compiler if you don’t have R installed on your system.
2-sample test for equality of proportions without continuity correction
data: c(14, 23) out of c(101, 129)
X-squared = 0.66075, df = 1, p-value = 0.4163
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.13404196 0.05468054
sample estimates:
   prop 1    prop 2
0.1386139 0.1782946
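For readers curious where these numbers come from, here is a rough sketch of the arithmetic, assuming the standard pooled two-proportion test that prop.test() reduces to for two groups when correct = FALSE. The variable names are illustrative and not part of the original analysis.

```r
x <- c(14, 23)    # clicks for ad-B and ad-C
n <- c(101, 129)  # opens for ad-B and ad-C

p_hat  <- x / n            # observed click rates
p_pool <- sum(x) / sum(n)  # common click rate under the null hypothesis

# Test statistic: standardized difference using the pooled standard error
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z       <- (p_hat[1] - p_hat[2]) / se_pool
chi_sq  <- z^2  # reported by prop.test() as X-squared
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# 95% confidence interval for the difference, using the unpooled standard error
se_diff <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci      <- (p_hat[1] - p_hat[2]) + c(-1, 1) * qnorm(0.975) * se_diff

print(c(chi_sq = chi_sq, p_value = p_value))
print(ci)
```

The squared z statistic and its p-value should agree with the X-squared and p-value lines in the output above, and the interval with the reported confidence interval.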
The most important part of this result is the p-value of 0.4163, or roughly 0.42. It means that if ad-B and ad-C truly had the same click rate, about 42 out of 100 repeated races would still end with a gap at least as large as the one observed here, purely by chance. That’s no good: a gap this common under a tie is weak evidence of a real winner. Statisticians conventionally look for a p-value below 0.05, i.e. a gap that chance alone would produce in fewer than 5 races out of 100 (often described as 95% confidence). Based on these results, ad-C is not significantly better than ad-B.
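One way to make the repeated-races idea concrete is to simulate many first-hour campaigns in which ad-B and ad-C truly share the same click rate, and count how often chance alone produces a gap at least as large as the observed one. The sketch below is an illustration, not part of the original analysis: the seed, the number of simulated races, and the variable names are arbitrary choices.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n_b <- 101  # opens for ad-B
n_c <- 129  # opens for ad-C
obs_diff <- abs(14 / n_b - 23 / n_c)    # observed gap in click rates
p_null   <- (14 + 23) / (n_b + n_c)     # shared click rate if the ads are tied

n_sims <- 10000  # number of simulated races (arbitrary)
sim_b <- rbinom(n_sims, size = n_b, prob = p_null) / n_b
sim_c <- rbinom(n_sims, size = n_c, prob = p_null) / n_c

# Share of simulated races whose gap is at least as large as the observed one
# (small tolerance guards against floating-point ties)
sim_p_value <- mean(abs(sim_b - sim_c) >= obs_diff - 1e-12)
print(sim_p_value)
```

The simulated share should land in the neighborhood of the p-value reported by prop.test(), though it will vary with the seed and the number of simulated races.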
Final Results of the Campaign and Questions from Your Client
After a week of rolling out the email campaign, your client shares the results of the campaign. These results could be considered final for the campaign because waiting for more days will not change these numbers much.
|Ads||Total Emails Sent||Clicks (5 days)||Opens (5 days)|
Now, you are on your own to analyze this data. You may find this Online R compiler useful. You need to answer these questions posed by your client.
Report your answers to your client in the comments section at the bottom of this post. Share your thoughts and questions as well.
If you remember, you had prior knowledge about the likely results of your campaign, i.e. a click rate of 10-20% based on your experience. You, however, did not use that prior knowledge in your analysis. A/B testing in its classical form (i.e. Fisherian statistics) has this problem: each analysis is performed in isolation, without linking it to prior knowledge, whereas the whole purpose of knowledge is to grow incrementally. In the next article, we will explore how Bayesian statistics could be used to improve A/B testing by making use of prior knowledge.