Welcome back to our digital marketing case study example. In the previous part, you initiated an email donation campaign for your client Helping Hand, an NGO, with three different ads for the same cause: supporting the needy in Africa. The idea is to identify the ad that generates the maximum donations from the email recipients.
In this part, you will analyze the data from this campaign. You will use A/B testing, a widely used method in digital marketing, to test the effectiveness of different ads. You will also identify shortcomings of A/B testing. In the next parts of the case study example, you will devise a robust method to improve A/B testing using concepts from Bayesian Statistics and reinforcement learning (a method used in the development of artificial intelligence and intelligent systems).
Before we learn the technical subtleties of A/B testing, let’s explore how A/B testing is similar to Olympic events.
A/B Testing – Who Wins the Race?
At the Rio 2016 Olympic Games, the women’s 400-meter sprint final witnessed a dramatic finish. Allyson Felix of the USA was the favorite going into the race, with four Olympic gold medals already in her tally. Shaunae Miller of the Bahamas was her biggest challenger. At the beginning of the race, Miller took a sizable lead. Felix, being the champion she was, sprinted through the last leg and started inching ahead of Miller. Felix was all set to win another gold. This was when something incredible happened. Shaunae Miller, in her final effort to win the gold, dived across the finish line. It was a photo finish between Felix and Miller. Initially, to the naked eye, it seemed that Felix had won. The ultra-slow-motion replay, however, showed that Miller beat Felix by less than a tenth of a second to win the gold.
Many playfully asked if Miller should get the gold for sprinting or for diving. A statistician, however, will ask a different question: if you repeat the final moments of this race many times (say 100), how many times would Shaunae Miller win? Miller, to me, didn’t look completely in control of that dive. I would put my money on Allyson Felix to win more than 75 out of 100 races. This number is a wild guess, but the greater odds were in favor of Felix. The final result saw Miller winning the gold. This is an important lesson in statistical thinking: the higher odds and the final outcome can be different.
A/B testing is very similar to an Olympic race, as we will see in our digital marketing case study example in the next segment.
Click Through Rate – Case Study Example
In this case study example, you are identifying the best ad/message to generate the maximum donation from an email campaign. Essentially, it is a race between these three advertisements to win the gold.
In an Olympic race, the performance measure is well-defined, i.e., the fastest to finish wins. Similarly, we need to define the performance measure for these banners before making them compete in a race. To do so, let’s look at the life cycle and associated metrics of an email campaign.
The life cycle of email campaigns has several metrics to measure the performance of the campaign as displayed in the schematic.
Delivery and Bounce Rate
You are an expert in digital marketing and campaigns and are aware of the expected standards for these metrics for a campaign like yours. For instance, you expect a high bounce rate of 8-12% for the emails because these email IDs were collected on a paper form. This means if 1000 emails were sent out, then approximately 100 of them will bounce (remember MAILER-DAEMON in your email box). This makes the email delivery rate:

Delivery rate = delivered emails / sent emails = (1000 − 100) / 1000 = 90%
Open Rate
Now, some of the recipients of these delivered emails will open the mail to check its content. The other emails will either go into the spam folder or get deleted without being opened. You expect this rate to be 8-15% for your campaign.
Click Rate
Moreover, each of these ads has a specific call-to-action in the form of a hyperlink for interested donors which says ‘know how you could help’. Some of these opened emails will register a click on the hyperlink. This initial interest of the email recipients is recorded as the click rate. You expect the click rate to be between 10-20% for your campaign.
Finally, some of these clicks will generate actual donations in the form of hard cash. This is the last mile for this campaign. You will notice that only a small fraction of the 1000 emails sent will generate an actual donation. This is registered as the conversion rate and the actual donation amount.
Performance Measure for A/B Testing – Case Study
In this case study example, you are interested in learning about the effectiveness of the three ads at generating donations. In the campaign life cycle, the effectiveness of these ads is measured between the open and click stages. Hence, the right performance measure for the ads is the click rate. You kicked off the campaign with 27,000 emails. It is barely an hour since the emails were sent, and you already have some initial results from the campaign.
Ads  Total Emails Sent  Clicks (1st hour)  Opens (1st hour) 
A  5500  4  58 
B  9000  14  101 
C  12500  23  129 
The first thing your client notices when she sees these numbers is that you have not distributed the emails evenly across the 3 ads, i.e., 9000 per ad. Incidentally, ad-A has received just 20.4% of the emails vs. 46.3% for ad-C. She is a bit confused about this distinction, but you assure her that you are using the principles of reinforcement learning to sample the emails and maximize returns. She seems assured for now but would like to learn more about this seemingly odd sampling methodology later. Before she leaves for a meeting, you promise her that you will be more than happy to explain the details when you meet her in a few days.
Now that you are left alone and have some time in hand, you analyze the initial results from the campaign. You are aware that the first hour’s click rate is not a good representation of the campaign’s performance. The ideal time to measure an email campaign’s performance is 7-10 days after rollout. But you do the analysis anyway to brush up your A/B testing skills before you get the comprehensive data in a few days.
A/B Testing or Hypothesis Testing of the Campaign Results
A click is a Bernoulli trial where only two outcomes are possible, i.e., clicked (1) and not clicked (0). A/B testing is essentially hypothesis testing of proportions, i.e., the click rates. The null hypothesis or status quo is that all the ads are the same in terms of performance or click rate (π), which is represented as:

H0: πA = πB = πC
A more interesting scenario would be if one ad outperforms the others and wins the gold. After all, a three-way tie is a boring race. We will identify the winner using hypothesis testing or A/B testing. There are certain underlying sample-size requirements that the experiment needs to satisfy for a scientific hypothesis test. We need to have at least 10 instances each of clicks and non-clicks. This requirement can be relaxed to 5 clicks/non-clicks with a few caveats, but the sample size should never fall below 5. You notice that for ad-A you have just 4 clicks, which is not a sufficient sample size. Hence, you can only compare the performance of ad-B against ad-C. Let’s first find the click rate for ad-B:

Click rate for ad-B = 14 clicks / 101 opens ≈ 13.86%
Similarly, the click rate for ad-C is:

Click rate for ad-C = 23 clicks / 129 opens ≈ 17.83%
17.83% is much better than 13.86%, or is it? Remember when we hypothetically made Allyson Felix and Shaunae Miller run 100 races? Let’s do the same to ad-B and ad-C to test their performance.
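Before running the formal test, the “100 races” idea can be illustrated with a quick simulation. This is a sketch of my own, not part of the original analysis: it treats the observed first-hour click rates as the true probabilities, re-draws the click counts for each ad 100 times, and counts how often ad-B’s simulated rate comes out ahead (the seed and race count are arbitrary choices).

```r
# Illustrative sketch: re-run the first-hour "race" 100 times by drawing
# click counts for ad-B (101 opens) and ad-C (129 opens) from binomial
# distributions whose success probabilities are the observed click rates.
set.seed(2016)                     # arbitrary seed, for reproducibility only
races  <- 100
rate_b <- rbinom(races, size = 101, prob = 14 / 101) / 101
rate_c <- rbinom(races, size = 129, prob = 23 / 129) / 129
sum(rate_b > rate_c)               # races where ad-B's simulated rate beats ad-C's
```

With samples this small, the two simulated click rates overlap heavily, so ad-B wins a substantial share of the races, which is exactly the kind of ambiguity the formal test quantifies.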
R code and Results
The best part is that a single line of R code can do this job for us.
prop.test(x = c(14, 23), n = c(101, 129), correct = FALSE)
R shows the following result on the console. You could test this code on this Online R compiler if you don’t have R installed on your system.
2-sample test for equality of proportions without continuity correction

data:  c(14, 23) out of c(101, 129)
X-squared = 0.66075, df = 1, p-value = 0.4163
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.13404196  0.05468054
sample estimates:
   prop 1    prop 2
0.1386139 0.1782946
The most important part of this result is the p-value of 0.4163 ≈ 0.42. This, on some level, means ad-B will beat ad-C in 42 races out of 100. On the other hand, ad-C will win just 58 races. That’s not good enough. Incidentally, statisticians are fans of a 95% confidence level, i.e., one competitor beating the other in more than 95 races out of 100. Based on these results, ad-C is not significantly better than ad-B.
Final Results of the Campaign and Questions from Your Client
After a week of rolling out the email campaign, your client shares the results. These results can be considered final for the campaign because waiting a few more days will not change these numbers much.
Ads  Total Emails Sent  Clicks (5 days)  Opens (5 days) 
A  5500  41  554 
B  9000  98  922 
C  12500  230  1235 
Now, you are on your own to analyze this data. You may find this Online R compiler useful. You need to answer these questions posed by your client.
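As a starting point for that analysis, the same one-line test from earlier can be applied to the final 5-day numbers for ad-B and ad-C (a sketch only; ad-A’s numbers would need the same sample-size check as before, and the interpretation is left to you):

```r
# Compare the final click rates of ad-B (98 of 922 opens) and
# ad-C (230 of 1235 opens) with the same two-proportion test
prop.test(x = c(98, 230), n = c(922, 1235), correct = FALSE)
```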

Report your answers to your client in the comments section at the bottom of this post. Share your thoughts and questions as well.
Signoff Note
If you remember, you had prior knowledge about your campaign’s results, i.e., a 10-20% click rate based on your experience. You, however, did not use that prior knowledge in your analysis. A/B testing in its classical form (i.e., Fisherian statistics) has this problem. The whole purpose of knowledge is to grow incrementally. Otherwise, each analysis is performed in isolation without linking it to prior knowledge. In the next article, we will explore how Bayesian statistics can be used to improve A/B testing through the use of prior knowledge.
Hi Roopam,
The results for B and C are as follows:
data:  c(98, 230) out of c(922, 1235)
X-squared = 26.166, df = 1, p-value = 3.133e-07
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.10939210 -0.05049619
sample estimates:
   prop 1    prop 2
0.1062907 0.1862348
From the result, p-value = 3.133e-07, which is a very, very low value.
It means that out of 10^7 attempts, B beats C only 3 times, which is very poor performance.
Can you please explain X-squared = 26.166, df = 1, p-value = 3.133e-07?
What is the math behind these parameters and how does one interpret them?
Thanks in advance…
The mathematical logic behind this is that the 98 clicks out of 922 opens are Bernoulli trials. This is similar to coin tosses. Bernoulli trials make a binomial distribution, which can be approximated as a bell curve or normal distribution.
Essentially, when you compare 98 successes (clicks) out of 922 opens/trials with 230 successes out of 1235 trials, you are comparing two bell curves. The mean of the first bell curve is 98/922 and its standard deviation is sqrt((98/922) × (1 − 98/922) / 922). You could find the mean and sd of the second curve in the same way.
Now, to compare these bell curves, imagine randomly picking values from both curves and comparing them with each other. When you say p-value = 3.133e-07, this loosely means that only about 3 values from the first curve will be greater than the values from the second curve out of ten million picks. This is the same conclusion you made. This is the only thing one needs to understand; the rest is all technical jargon.
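For readers who want to see the arithmetic, here is a sketch that reproduces prop.test’s output for the final numbers from first principles. It uses the standard pooled-proportion z-test for two proportions; with df = 1, the X-squared statistic is simply the square of the z-statistic.

```r
# Reproduce X-squared = 26.166 and p-value = 3.133e-07 by hand
p1 <- 98 / 922                        # final click rate for ad-B
p2 <- 230 / 1235                      # final click rate for ad-C
p_pool <- (98 + 230) / (922 + 1235)   # pooled click rate under the null hypothesis
se <- sqrt(p_pool * (1 - p_pool) * (1 / 922 + 1 / 1235))
z  <- (p1 - p2) / se                  # z-statistic for the difference in proportions
z^2                                   # X-squared (chi-squared statistic with df = 1)
2 * pnorm(-abs(z))                    # two-sided p-value
```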