
Bayesian Inference – Made Simple

Roopam Upadhyay · 9 Comments

In this article, we are going to discuss Bayesian inference and statistics. To build an understanding of Bayesian inference, we will use the experiment of tossing a coin. Hence, it is only appropriate to start with a quick discussion about the power of…

…Flip of the Coin

To begin with, let me pay my tribute to the master of randomness and the most flippant of them all – the act of tossing a coin.

O’ you epitome for fickleness
     O’ you undisputed lord of chance,
May no game ever begin without
    Your approval and mystical dance.


The Mystical Dance – by Roopam

When I was in high school, I always used to wonder why tossing coins was such an integral part of probability theory. Almost every basic, intermediate, or advanced book on probability has more than a few references to coin tosses. The reason, I figured out later, is that tossing a coin is one of the easiest and fastest experiments with random binary outcomes. The following are some practical events with random binary outcomes where one is trying to identify:

– Fraudulent transactions
– Customers for cross/up selling
– Malignant cancer cells
– Credit risk of a customer
– Spam emails
– Employees likely for attrition

All these events are in principle similar to tossing coins. The major difference between these events and coin tosses is that it may take several months to register their outcomes, unlike heads and tails. In the subsequent segments, we'll discuss Bayesian statistics through coin tosses.

Frequentist and Bayesian Inference

If one has to point out the most controversial issue in modern statistics, it has to be the debate between Frequentist and Bayesian statistics. Frequentist or Fisherian statistics (named after R. A. Fisher) is the approach employed in most scientific investigations, though enough doubts have been raised about Frequentist logic (read The Signal and the Noise by Nate Silver). Irrespective of the controversies, it is essential for a pragmatic analyst to understand the differences and applications of both these approaches.

The Frequentist approach is based on setting up a testable hypothesis and collecting samples to accept or reject it. Bayesian inference, on the other hand, modifies its output with each packet of new information. I have discussed Bayesian inference in a previous article about the O. J. Simpson case; you may want to read that article. As you may recall, setting a prior probability is one of the key aspects of Bayesian inference. Defining the prior probability also makes the analyst think carefully about the context of the problem, as this requires a decent understanding of the system. We will get first-hand experience of this in the example that follows.

Bayesian Inference

Let’s assume that there is a company that makes biased coins, i.e. they adjust the weight of the coins so that one side is more likely to come up than the other. For the time being, assume that they make just two types of coins. ‘Coin 1’ is 30% likely to produce heads (rather than 50% for an unbiased coin) and ‘Coin 2’ is 70% likely to produce heads. Now, you have got a coin from this factory and want to know whether it is ‘Coin 1’ or ‘Coin 2’. You have also inquired about the production system of the company and learned that it produces both types of coins in equal quantities. This helps you define your prior probability for the problem: your coin is equally likely to be either ‘Coin 1’ or ‘Coin 2’, a 50-50 chance.

After assigning the prior probability, you toss the coin 3 times and get heads in all three trials. The Frequentist approach will ask you to take more samples, since one cannot reject the null hypothesis with this sample size at the 95% confidence level. However, as we will see, with the Bayesian approach each packet of information, or trial, will modify the prior probability, i.e. your belief that the coin is ‘Coin 2’.
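To see why the Frequentist cannot reject the null hypothesis here, we can work out the one-sided p-value for 3 heads in 3 tosses of a fair coin. This is a minimal sketch of that calculation, not something from the original article:

```python
from math import comb

# One-sided p-value: probability of observing 3 or more heads in 3 tosses
# under the null hypothesis of a fair coin (p = 0.5).
n, k, p = 3, 3, 0.5
p_value = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(p_value)  # prints 0.125
```

Since 0.125 is larger than the 0.05 threshold implied by the 95% confidence level, the null hypothesis of a fair coin cannot be rejected with only three tosses.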

At this point, we know the original priors for the coins, i.e.

P(Coin1)=P(Coin2)=0.5\ (50\%) 

Additionally, we also know the conditional probabilities, i.e. the chances of heads for Coin 1 and Coin 2:

P(Heads|Coin1)= 30\% 
P(Heads|Coin2)= 70\% 

Now we have performed our first experiment, or trial, and got heads. This is new information for us, and it will help us calculate the posterior probability of each coin given that heads has occurred (recall Bayes’ theorem from our previous article):

P(Coin 1|Heads)=\frac{P(Heads|Coin 1)\times P(Coin1)}{P(Heads|Coin 1)\times P(Coin1)+P(Heads|Coin 2)\times P(Coin2)} 

If we insert the values into the above formula, the chances for Coin 1 go down:

P(Coin 1|Heads)=\frac{(0.3\times 0.5)}{(0.3\times 0.5)+(0.7\times 0.5)}=0.3 

Similarly, the chances for Coin 2 go up:

P(Coin 2|Heads)=\frac{(0.7\times 0.5)}{(0.3\times 0.5)+(0.7\times 0.5)}=0.7 
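The same update can be checked in a few lines of code. This is a minimal sketch using the priors and likelihoods defined above; the variable names are my own:

```python
# Priors and likelihoods from the example.
prior_c1, prior_c2 = 0.5, 0.5
p_heads_c1, p_heads_c2 = 0.3, 0.7

# Bayes' theorem: posterior = likelihood * prior / evidence,
# where the evidence is the total probability of seeing heads.
evidence = p_heads_c1 * prior_c1 + p_heads_c2 * prior_c2
post_c1 = p_heads_c1 * prior_c1 / evidence
post_c2 = p_heads_c2 * prior_c2 / evidence
print(post_c1, post_c2)  # approximately 0.3 and 0.7
```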

This same experiment is shown in the figure below:

[Figure: the prior probabilities for the two coins updated to posterior probabilities after the first toss]

As shown in the above figure, the posterior probabilities after the first toss, i.e. P(Coin 1|Heads) and P(Coin 2|Heads), become the prior probabilities for the subsequent experiment. In the next two experiments we get 2 more heads; this further modifies the priors for the fourth experiment, as shown below. Each packet of information improves your belief about the coin.

 

[Figure: priors and posteriors for the two coins updated after each of the three tosses]
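The sequential updating described above can be sketched in a short loop, with each posterior feeding back as the next prior. This is my own illustration of the three-heads example, not code from the article:

```python
# Likelihood of heads for each coin, and the initial 50-50 prior.
p_heads = {"Coin 1": 0.3, "Coin 2": 0.7}
prior = {"Coin 1": 0.5, "Coin 2": 0.5}

# Observe heads three times; after each toss the posterior
# becomes the prior for the next toss.
for toss in range(1, 4):
    evidence = sum(p_heads[c] * prior[c] for c in prior)
    prior = {c: p_heads[c] * prior[c] / evidence for c in prior}
    print(toss, prior)
```

After the third head, P(Coin 2 | data) works out to roughly 0.927, matching the "almost 93%" figure below.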

By the end of the 3rd experiment, you are almost 93% sure that you have a coin that resembles Coin 2. The above example is a special case, since we considered just two coins. For most practical problems, however, the factory will produce coins of all different biases (remember, the toss of a coin is a stand-in for the more practical problems we discussed above). In such cases the prior probabilities form a continuous distribution (here, a triangular distribution, as shown below). Notice how, with the results of each experiment, the bulge shifts toward coins that are more likely to produce heads. Keep in mind that the orange bar in the figure below represents the unbiased coin, i.e. 50/50 chances of heads and tails.

 

[Figure: a triangular prior distribution over coin biases, shifting toward heads-biased coins after each toss]
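The continuous case can be approximated with a discretized grid of possible head-probabilities. This is a rough sketch with an assumed nine-point grid and a triangular prior peaking at the fair coin; the exact grid and prior in the article's figure may differ:

```python
# Grid of possible head-probabilities: 0.1, 0.2, ..., 0.9.
biases = [i / 10 for i in range(1, 10)]

# Triangular prior peaking at the fair coin (bias 0.5), normalised.
prior = [1 - abs(b - 0.5) for b in biases]
total = sum(prior)
prior = [p / total for p in prior]

# Update after each of three observed heads: the likelihood of
# heads for a coin with bias b is simply b.
for toss in range(3):
    posterior = [b * p for b, p in zip(biases, prior)]
    evidence = sum(posterior)
    prior = [p / evidence for p in posterior]

# The probability mass has shifted toward heads-biased coins.
best = max(zip(biases, prior), key=lambda t: t[1])
print(best)
```

The same mechanics apply as in the two-coin case; there are simply more hypotheses competing for probability mass.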

Sign-off Note

As I learn more about Bayesian inference, I must say that as an analyst my affinity, or bias, is leaning towards Bayesian over Frequentist. In the same breath, however, I must also point out that setting Bayesian priors and performing Bayesian calculations are effort-intensive. Additionally, in the absence of a properly selected prior, Bayesian inference can produce results similar to, or worse than, Frequentist approaches. As mentioned, Bayesian approaches require a better understanding of the context and the system of interest. See you soon.


9 thoughts on “Bayesian Inference – Made Simple”

  1. Sesha Modukuru says:
    March 9, 2014 at 6:37 pm

    Roopam, you continue to amaze me with the simplicity with which you explain complex theories. Thanks

    Reply
    • Roopam Upadhyay says:
      March 10, 2014 at 8:42 am

      Thanks Sesha

      Reply
  2. Vishal Goyal says:
    March 10, 2014 at 12:51 pm

    Hi Roopam, this article was a wonderful read; I could follow everything perfectly till the part you discussed Bayesian inference for continuous distributions. It would be nice if you could explain how you modified the prior probabilities for a continuous plot. Thanks for posting these articles.

    Reply
    • Roopam Upadhyay says:
      March 11, 2014 at 7:40 am

      Thanks Vishal, I am glad you enjoyed the article. Let me try to explain the process of modification of continuous priors in my next post. The process is not too different from the discrete probability we discussed in this article.

      Reply
  3. Chetan Ahuja says:
    March 19, 2014 at 1:29 pm

    Enjoying your blog, appreciate your effort. A budding Data Scientist.

    Reply
  4. Debanjan says:
    March 23, 2014 at 3:22 am

    Superbly simplified explanation of Bayesian inference! keep it coming!

    Reply
  5. Akash Tripathi says:
    March 15, 2015 at 1:48 am

    Thanks Roopam for this wonderful article

    Reply
  6. Xuewen Liu says:
    November 17, 2017 at 10:44 am

    Wonderful post! Can you explain further “The Frequentist approach will ask you to take more samples since one cannot accept or refuse the null hypothesis with this sample size at 95% confidence interval. “. and give some formula to calculate the result.

    Reply
    • Roopam Upadhyay says:
      December 12, 2017 at 9:57 am

      I think you will find your answers in this case study example: http://ucanalytics.com/blogs/category/digital-marketing-case-study/. Read part 3 and 4 in particular.

      Reply
