
Decision Tree – Entropy – Retail Case Study Example (Part 6)

by Roopam Upadhyay | 14 Comments
Entropy: from order to disorder – by Roopam


This article is a continuation of the retail case study example we have been working on for the last few weeks. You can find the previous parts of the case study at the following links:

Part 1: Introduction
Part 2: Problem Definition
Part 3: EDA
Part 4: Association Analysis
Part 5: Decision Tree (CART)

If you recall from the previous article, the CART algorithm produces decision trees with only binary child nodes. In this article, we will learn another algorithm that produces decision trees with multiple child nodes. There are several ways to achieve this, such as CHAID (CHi-squared Automatic Interaction Detector). Here, however, we will learn about the C4.5 algorithm, since it uses a concept very close to my heart:

Entropy

In high school, we were taught the first law of thermodynamics, the law of conservation of energy. It says:

Energy can neither be created nor destroyed, or in other words the total energy for the Universe is constant.

The first reaction from most students after learning this fact was: why bother saving electricity and fuel? If the total energy of the Universe is constant and conserved, then we have an endless supply of energy that will never be destroyed.

The second law of thermodynamics, however, demolishes this comfort about wasting power. At the root of the second law is entropy: the measure of disorder or randomness in the Universe. The general direction of the Universe is from order to disorder, towards higher randomness. The second law states that:

Total entropy or overall disorder / randomness of the Universe is always increasing.

OK, let’s take an example to understand this better. When you use fuel to run your car, perfectly ordered petrol (compact energy) is converted and dissipated into disordered forms of energy such as heat, sound and vibrations. Work is generated in the process to run the engine of the car. The more disordered or random the energy, the harder (or even impossible) it is to extract purposeful work out of it. So I guess we care about work, not energy. In other words, the higher the entropy or randomness of a system, the harder it is to convert it into meaningful work. Physicists define the entropy of a system with the following formula:

Entropy = S = -\sum_{i} p_{i} \times \log_{2}(p_{i})

Entropy is also at the heart of information theory. It took the ingenuity of Claude Shannon, the father of information theory, to see the links between thermodynamics and information. He proposed the following definition of entropy to measure randomness within a given message.

Entropy = H = -\sum_{i} p_{i} \times \log_{2}(p_{i})

For instance, the entropy (randomness) of a fair coin, with an equal chance of heads and tails, is 1 bit (as calculated below). Notice that the unit of entropy in information theory is the bit, a term coined by Claude Shannon. The same unit is used as the fundamental unit of computer storage.

Entropy = H = -(50\% \times \log_{2}(50\%) + (1 - 50\%) \times \log_{2}(1 - 50\%)) = 1 \textup{ bit}

We will use the same formula for entropy to create decision trees and to decipher the information within the data.
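
As a quick check, the fair-coin calculation can be reproduced in a few lines of Python (a minimal sketch; the helper name shannon_entropy is my own choice, not something used elsewhere in this series):

    import math

    def shannon_entropy(probabilities):
        # H = -sum(p * log2(p)), skipping zero probabilities
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(shannon_entropy([0.5, 0.5]))    # fair coin: 1.0 bit
    print(shannon_entropy([0.95, 0.05]))  # biased coin is more predictable: ~0.29 bits

The more lopsided the probabilities, the lower the entropy; this is exactly the intuition we will exploit when splitting the customer data below.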

Retail Case Study Example – Decision Tree (Entropy: C4.5 Algorithm)

Back to our retail case study example, where you are the Chief Analytics Officer & Business Strategy Head at an online shopping store called DresSMart Inc. that specializes in apparel and clothing. Your goal here is to improve a future campaign’s performance. To meet this objective, you are analyzing data from an earlier campaign in which product catalogs were mailed to one hundred thousand customers from a complete customer base of over a couple of million customers. The overall response rate for this campaign was 4.2%.

You have divided the one hundred thousand solicited customers into three categories based on their activity in the 3 months before the campaign. This is the same table we used in the previous article to create the decision tree with the CART algorithm.

Activity in the Last Quarter | Number of Solicited Customers | Responded (r) | Not Responded (nr) | Success Rate
low | 40000 | 720 | 39280 | 1.8%
medium | 30000 | 1380 | 28620 | 4.6%
high | 30000 | 2100 | 27900 | 7.0%

 

The following is the binary-node tree we generated through CART in the previous article.

Decision Tree – The CART

Let us see if we can produce a better tree using entropy and the C4.5 algorithm. Since the C4.5 algorithm is capable of producing multi-node decision trees, we have one additional possible tree (with 3 nodes: low, medium and high), in addition to the binary trees explored in the previous article.

The way C4.5 works is that it compares the entropy of each possible tree with the entropy of the original (baseline) data. It then chooses the tree with the maximum information gain, i.e. the difference of the entropies:

\textup{Information Gain} = Entropy_{\textup{baseline}} - Entropy_{\textup{tree}}

Hence, we first need to calculate the baseline entropy of the data with its 4.2% conversion rate (4,200 conversions out of 100,000 solicited customers). Notice that in the second term, 95.8% (= 100% - 4.2%) is the percentage of non-converted customers.

Entropy_{\textup{baseline}} = -(4.2\% \times \log_{2}(4.2\%) + 95.8\% \times \log_{2}(95.8\%)) = 25.139\%
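
The same number can be verified with a short Python sketch (percentages written as fractions):

    import math

    p = 4200 / 100000                      # 4.2% conversion rate
    baseline = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    print(round(baseline, 5))              # 0.25139, i.e. 25.139%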

This is the same value we calculated for the total entropy in the bottom-most row of the following table.

Child Nodes | Number of customers | Number of customers responded | % responded | Individual Node Entropy | Total Entropy | Information Gain
low | 40000 | 720 | 1.8% | 0.130059 | 24.255% | 0.88%
medium | 30000 | 1380 | 4.6% | 0.269156 | |
high | 30000 | 2100 | 7.0% | 0.365924 | |
low | 40000 | 720 | 1.8% | 0.130059 | 24.370% | 0.77%
medium + high | 60000 | 3480 | 5.8% | 0.319454 | |
low + medium | 70000 | 2100 | 3.0% | 0.194392 | 24.585% | 0.55%
high | 30000 | 2100 | 7.0% | 0.365924 | |
low + high | 70000 | 2820 | 4.0% | 0.242292 | 25.127% | 0.01%
medium | 30000 | 1380 | 4.6% | 0.269156 | |
Total | 100000 | 4200 | 4.2% | 0.25139 | 25.139% |
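
If you want to reproduce this table, the sketch below recomputes the weighted entropy and information gain for every candidate split directly from the customer counts (the data structure and split labels are my own; the last decimal place may differ slightly from the table because the response rates shown there are rounded):

    import math

    def node_entropy(responded, total):
        # binary Shannon entropy (in bits) of a node with the given counts
        p = responded / total
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    # (responded, solicited) counts from the campaign table
    nodes = {"low": (720, 40000), "medium": (1380, 30000), "high": (2100, 30000)}

    splits = [
        ["low", "medium", "high"],      # the three-node tree
        ["low", "medium+high"],
        ["low+medium", "high"],
        ["low+high", "medium"],
    ]

    total_r = sum(r for r, n in nodes.values())   # 4200
    total_n = sum(n for r, n in nodes.values())   # 100000
    baseline = node_entropy(total_r, total_n)     # ~25.139%

    for split in splits:
        tree_entropy = 0.0
        for label in split:
            r = sum(nodes[part][0] for part in label.split("+"))
            n = sum(nodes[part][1] for part in label.split("+"))
            tree_entropy += (n / total_n) * node_entropy(r, n)
        print(" / ".join(split), f"-> entropy = {tree_entropy:.3%},",
              f"information gain = {baseline - tree_entropy:.2%}")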

 

Now, let us find the entropy of the tree by calculating the entropies of the individual components of the first tree (with 3 nodes: low, medium and high).

Entropy_{\textup{low}} = -(1.8\% \times \log_{2}(1.8\%) + 98.2\% \times \log_{2}(98.2\%)) = 13.0059\%

Entropy_{\textup{medium}} = -(4.6\% \times \log_{2}(4.6\%) + 95.4\% \times \log_{2}(95.4\%)) = 26.9156\%

Entropy_{\textup{high}} = -(7.0\% \times \log_{2}(7.0\%) + 93.0\% \times \log_{2}(93.0\%)) = 36.5924\%

Now, the total entropy of this tree is just the weighted sum of all its components. Here, the weights are the number of customers in a node divided by the total number of customers e.g. 40,000/100,000 = 0.4 for the first node.

Entropy_{\textup{tree}} = (0.4 \times 13.0059 + 0.3 \times 26.9156 + 0.3 \times 36.5924)\% = 24.255\%

Finally, we need to find the value of the information gain, i.e.

\textup{Information Gain} = 25.139\% - 24.255\% = 0.88\%
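
The same weighted-sum arithmetic, written out in Python for the three-node tree (numbers taken from the calculations above):

    # individual node entropies in percent, from the equations above
    e_low, e_medium, e_high = 13.0059, 26.9156, 36.5924

    # weights = customers in a node / total customers
    tree_entropy = 0.4 * e_low + 0.3 * e_medium + 0.3 * e_high   # ~24.255
    information_gain = 25.139 - tree_entropy                     # ~0.88
    print(tree_entropy, information_gain)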

Incidentally, the information gain for the tree with three nodes is the highest compared to the other trees (see the table above). Hence, the C4.5 algorithm using entropy will create the following decision tree:

 

C4.5 Decision Tree using Entropy

Sign-off Note

How cool is entropy! Yes, I admit I love physics. However, this connection between thermodynamics and information still gives me goosebumps. The idea is that information removes uncertainty or randomness from a system. Hence, using information, one can take the path from disorder to order! Yes, the Universe is destined to move towards disorder or randomness, but in small pockets we can still use information to produce order.

If you are keen on exploring the deeper links between thermodynamics and information theory, then watch the following two-part documentary by the BBC:
– Order and Disorder (Link)

See you soon with the next article.



14 thoughts on “Decision Tree – Entropy – Retail Case Study Example (Part 6)”

  1. Maya Shenoi says:
    September 18, 2014 at 3:22 pm

    Quite interesting.

    Reply
  2. manoj says:
    October 23, 2014 at 3:05 pm

    This is really interesting (sage of physics in analytics)………..thank you for posting such a nice article

    Reply
    • Roopam Upadhyay says:
      October 23, 2014 at 3:19 pm

      🙂

      Reply
  3. Binita says:
    October 29, 2014 at 2:36 pm

    very well explained Roopam!

    Reply
  4. anup says:
    November 14, 2014 at 11:58 am

    the way u explain analytics is awesome …….thank you

    Reply
  5. Jasmeet says:
    November 22, 2014 at 10:35 pm

    IIT-B rocks! This is awesomely put together! Cheers!

    Reply
  6. aj says:
    August 27, 2015 at 6:10 pm

    This has to be the best explanation of decision trees…thanks a lot

    Reply
  7. Rishabh says:
    September 15, 2015 at 3:00 pm

    Brilliant explanation. Most of the statisticians, who i have talked to, only provide formulae and definitions but lack the basic understanding of analytics (they know the how but not the why). Your blog is by far the best analytics blog i have ever read. The analogies, which you provide, help me visualize the problem and techniques.

    A big kudos for your effort

    Reply
  8. Sourav says:
    October 6, 2015 at 5:09 pm

    Hi Roopam,

    Can you help me in understanding the result I am getting after applying randomforset:

    Call:
    randomForest(formula = myFormula, data = tree)
    Type of random forest: classification
    Number of trees: 500
    No. of variables tried at each split: 3

    OOB estimate of error rate: 46.93%
    Confusion matrix:
    F M class.error
    F 1040 992 0.4881890
    M 960 1167 0.4513399

    Thanks

    Reply
  9. raj says:
    March 23, 2016 at 1:45 pm

    Roopam,

    The entropy (low, medium, high etc) in the above table is different from what I tried to calculate.
    > x1=c(4.2,1.8,4.6,7,1.8,5.8,3,7,4,4.6)
    > y1=-( (x1/100*log(x1/100,2))+ ((100-x1)/100*log((100-x1)/100,2)) )
    > x1
    [1] 4.2 1.8 4.6 7.0 1.8 5.8 3.0 7.0 4.0 4.6
    > y1
    [1] 0.2513881 0.1300588 0.2691559 0.3659237 0.1300588 0.3194540 0.1943919
    [8] 0.3659237 0.2422922 0.2691559
    >

    Am I missing something ?
    Please help.

    Thanks

    Reply
    • Roopam Upadhyay says:
      October 12, 2016 at 11:01 pm

      Hey Raj,

      Sorry for the delayed response. By the way, your calculation is correct. I took a shortcut in my calculations of multiplying weights of each node in a single step i.e. 0.1300588 * 0.4. This caused all the confusion. I have made the corrections in the article now. Thanks for pointing this out.

      Cheers,

      Reply
  10. guy says:
    June 27, 2016 at 5:29 pm

    Hi roopam,

    Great articles!

    About the numbers, I thing that raj is right.

    Reply
  11. Noureddine says:
    October 12, 2016 at 8:43 pm

    Thank you very much Roopam for your effort to simplify concepts and make them easy to absorb.

    Reply
  12. Riya Ds says:
    June 17, 2018 at 10:42 am

    Sir, Really want to thank you…Your way of explaining things is really wonderful…

    Reply
