Welcome back to our retail case study on campaign and marketing analytics. So far in this case study, we have been working on a classification problem to identify customers with a higher likelihood of purchasing products from the campaign catalogues. In the last article on model selection, we noticed that artificial neural networks performed better than logistic regression and decision tree algorithms for our classification problem. In this article, we will try to gain an intuitive, simplified understanding of artificial neural networks, which are inspired by the human brain. In the next few segments, we will learn about the properties of the brain that artificial neural networks try to mimic.
Seeing with Your Tongue!
Erik Weihenmayer climbed Mount Everest in 2001. By doing so he became the first, and to date the only, blind person to achieve this feat. He pursues his passion for extreme rock climbing through a device called BrainPort, which helps him see using his tongue! The device has a camera at one end connected to several hundred tiny electrodes that Erik places on his tongue to experience obstacles in his path. This experience is made possible by the incredible learning adaptability of the human brain. Initially, when Erik started using the device, he felt just a tingly sensation on his tongue associated with each experience. Slowly his brain learned to correlate each experience with a distinct sensation, enabling him to experience seeing. This is a phenomenal story about our brain’s capability to adapt – the property that has inspired the machine learning algorithm called artificial neural networks.
Neural Networks’ Feed-Forward & Feedback Loops
The brain connects with other parts of the body through an intricate network of neurons called biological neural networks. The brain works with a powerful mechanism involving both feed-forward and feedback loops within these intricate neural networks. For example, the feed-forward mechanism converts inputs from the sensory organs, like the eyes and ears, into outputs, i.e., information and understanding. The feedback mechanism, on the other hand, lets the brain communicate with the sensory organs and modify their inputs.
To understand this better, let’s perform a couple of small experiments. For the first experiment, close your eyes and say the following 3 words, at an interval of 10 seconds each, with the intention of visualizing them.
- Dragon killer
Most probably, you visualized an elaborate scene with a dragon attacking a village and being killed by a dragon slayer. What you have just accomplished showcases the phenomenal capability of the brain to extract information about these words in a split second and visualize a whole sequence of events without using your eyes. This is also the source of the elaborate imagination that human brains possess. In this case, one form of input (words) has generated another form of input (visualization) through a complicated process in our brain.
The second experiment will help us understand the feed-forward and feedback loops in our brain. It’s quite likely that you have come across the following sentence on some social media site. In any case, read the sentence in the following text box.
|Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.|
Incredible, isn’t it? Your brain has gone through several cycles of feed-forward and feedback to read these jumbled letters in a matter of seconds. The brain, in this case, has transformed incomplete and jumbled input, i.e. information from the eyes, into meaningful output, i.e. understanding of the sentence. Artificial neural networks try to mimic our phenomenal brain for prediction purposes through both feed-forward and feedback loops between input and output variables.
Artificial Neural Networks – Retail Case Study Example
Artificial neural networks are nowhere close to the intricacy that biological neural networks possess, but we must not forget that the latter have gone through millions of years of evolution. Artificial neural networks (from here on referred to simply as neural networks), on the other hand, have a history of close to half a century. In the 1990s, neural networks lost favour to other machine learning algorithms such as support vector machines. However, in the last decade or so, there has been a renewed interest in neural networks because of the rise of deep learning. Let us try to understand the design of neural networks and their functionality using our retail case study.
As displayed in the adjacent figure, neural networks can be broadly divided into three layers – input, hidden, and output. The hidden layer is the additional feature that separates neural networks from other predictive models. If we remove the hidden layer from this architecture, it becomes a simple linear regression (for estimation) or logistic regression (for classification) architecture. The input layer in this architecture is simply the set of input variables. For our retail case study, as discussed in the earlier articles, some of the input variables are:
The output layer, for our classification problem to identify customers who will respond to campaigns, is the binary variable that represents historic responders (0/1).
Mathematical Construct of Neural Networks
This section describes the mathematical construct of neural networks. If it seems a bit complicated, for now I suggest you jump to the next section on the usage of neural networks.
Let’s come back to the hidden layer. Each hidden layer has several hidden nodes (orange circles in the above figure), and each hidden node takes a weighted sum of the inputs from every input variable:

(Hidden Node)j = W0 + Σi Wi(Input→Hidden) × Xi

These input variables can be compared with the input signals our sensory organs send to our brain. For example, in the case of a fire around you – you see the fire, you hear it burning, you smell the smoke, and your skin feels hot (a complete sensory experience through several input nodes).
To begin with, the above weights Wi(Input→Hidden) & W0 are chosen at random; they are then modified iteratively to match the desired outputs (in the output layer). Continuing with the fire example, if the sensory signals about a fire are too strong, the creature’s tendency for self-preservation will take over. However, sensory signals about the fire from a cooking stove need to be accounted for as well, so that humans can cook. Hence, the weights need to adjust for both fire usage and self-preservation.
In the hidden layer, the above linear weighted sum [(Hidden Node)j] is converted to a non-linear form through a non-linear function. This conversion is usually performed using the sigmoid activation function – yes, this is the same logit function used in logistic regression:

P(Hidden)j = 1 / (1 + e^−(Hidden Node)j)
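As a quick illustration, here is a minimal sketch of the sigmoid function in plain Python (the function name is ours, not from any particular library):

```python
import math

def sigmoid(x):
    """Squash any real-valued weighted sum into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

# A strongly positive weighted sum maps close to 1,
# a strongly negative one close to 0, and 0 maps to exactly 0.5.
print(sigmoid(4.0))   # close to 1
print(sigmoid(-4.0))  # close to 0
print(sigmoid(0.0))   # 0.5
```

Whatever the weighted sum at a hidden node, its activation always lands strictly between 0 and 1 – which is what lets us read it as a probability-like signal.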
Remember 0 ≤ P(Hidden)j ≤ 1. This output [P(Hidden)j] for the different hidden nodes (j) becomes the set of input variables for the final output node:

(Output Node) = U0 + Σj Uj(Hidden→Output) × P(Hidden)j
This linear weighted sum is again converted to a non-linear form through the sigmoid function. The result is the probability of conversion of a customer, P(Customer Response), based on his/her input variables:

P(Customer Response) = 1 / (1 + e^−(Output Node))
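Putting the steps together, a single forward pass through a tiny network can be sketched as follows. This is a toy example with made-up weights and a made-up customer, not our actual campaign model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward_pass(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    """Compute P(Customer Response) for one customer.

    hidden_weights[j][i] plays the role of Wi(Input->Hidden) for hidden node j;
    output_weights[j] plays the role of Uj(Hidden->Output).
    """
    # Hidden layer: weighted sum of inputs, then sigmoid activation
    hidden_activations = []
    for j, weights in enumerate(hidden_weights):
        weighted_sum = hidden_bias[j] + sum(w * x for w, x in zip(weights, inputs))
        hidden_activations.append(sigmoid(weighted_sum))

    # Output layer: weighted sum of hidden activations, then sigmoid
    output_sum = output_bias + sum(
        u * h for u, h in zip(output_weights, hidden_activations)
    )
    return sigmoid(output_sum)

# Toy example: 3 input variables, 2 hidden nodes, invented weights
p = forward_pass(
    inputs=[0.5, 1.2, -0.3],
    hidden_weights=[[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]],
    hidden_bias=[0.1, -0.1],
    output_weights=[0.8, -0.6],
    output_bias=0.05,
)
print(p)  # a probability between 0 and 1
```

Notice that the same two-step pattern (weighted sum, then sigmoid) is applied twice: once per hidden node, and once at the output node.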
Neural network algorithms (like backpropagation) iteratively modify the weights for both sets of links (i.e. Input→Hidden and Hidden→Output) to reduce the prediction error. Remember, the weights in our architecture are Wi(Input→Hidden) & W0, and Uj(Hidden→Output) & U0.
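For the curious, here is a minimal sketch of how backpropagation nudges those weights on a toy problem: a 2-input, 2-hidden-node network trained with plain gradient descent on an AND-like pattern. The data, learning rate, and epoch count are invented for illustration:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)

# Toy data: 2 input variables, binary response (respond only when both are 1)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

n_inputs, n_hidden = 2, 2
W = [[random.uniform(-1, 1) for _ in range(n_inputs)]
     for _ in range(n_hidden)]                          # Input -> Hidden weights
W0 = [0.0] * n_hidden                                   # hidden biases
U = [random.uniform(-1, 1) for _ in range(n_hidden)]    # Hidden -> Output weights
U0 = 0.0                                                # output bias
lr = 0.5                                                # learning rate

for epoch in range(5000):
    for x, y in data:
        # Forward pass
        h = [sigmoid(W0[j] + sum(W[j][i] * x[i] for i in range(n_inputs)))
             for j in range(n_hidden)]
        p = sigmoid(U0 + sum(U[j] * h[j] for j in range(n_hidden)))

        # Backward pass: gradients of squared error via the chain rule
        delta_out = (p - y) * p * (1 - p)
        delta_hid = [delta_out * U[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]

        # Weight updates (gradient descent step)
        for j in range(n_hidden):
            U[j] -= lr * delta_out * h[j]
            for i in range(n_inputs):
                W[j][i] -= lr * delta_hid[j] * x[i]
            W0[j] -= lr * delta_hid[j]
        U0 -= lr * delta_out

# After training, predictions should sit close to the targets
for x, y in data:
    h = [sigmoid(W0[j] + sum(W[j][i] * x[i] for i in range(n_inputs)))
         for j in range(n_hidden)]
    p = sigmoid(U0 + sum(U[j] * h[j] for j in range(n_hidden)))
    print(x, round(p, 2), "target:", y)
```

The two delta terms carry the prediction error backwards from the output node to the hidden nodes – that is the "back propagation" of the name – and every weight moves a small step in the direction that shrinks the error.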
Pros and Cons of Using Neural Networks
Let’s quickly sum up some of the important pros and cons of using neural networks for model development.
- Neural networks offer highly versatile methods to solve four of the six broad categories of data science tasks, i.e. classification, estimation, forecasting, and clustering (self-organizing maps). These six broad categories were discussed in a previous article, as displayed in the adjacent diagram.
- Neural networks are fairly insensitive to noise in the input data (similar to our brain) because of the hidden layer that absorbs noisy information.
- They are better equipped to handle fuzzy / non-linear relationships between the input variables and the output variable.
- Neural networks are often considered black boxes (again, similar to our brain) because they don’t explicitly highlight the relationship between input and output variables. This is quite unlike decision trees, which offer highly intuitive solutions.
- There is no set rule for choosing the number of hidden layers and hidden nodes while designing a neural network architecture. This requires a proficient data scientist to develop neural network models.
- Neural networks are often susceptible to overfitting, hence the analysts need to test the results carefully.
These are early days for artificial neural networks, but they sure have a lot of promise. Nature has patiently and meticulously designed and modified our brain to develop phenomenal biological neural networks. I doubt humans have the same amount of patience as Mother Nature – a virtue we can all learn from her.
See you soon with the next part of this case study.