How can we make machines learn on their own, the way humans do? This is the pivotal question for the development of artificial intelligence. To develop intelligent machines and systems, we need to understand how human intelligence and learning work. To that end, we will explore the ideas behind reinforcement learning. In the process, we will also explore answers to these seemingly unrelated questions:
- How are human babies different from baby dolphins?
- Why is Hollywood, the largest movie industry, dying?
Moreover, we will learn how reinforcement learning can improve data science deliveries for businesses, and explore answers to some very relevant questions:
- Is there a better way to improve digital marketing than the widely used A/B testing?
- How can we create self-learning digital marketing engines and maximize marketing returns at the same time?
The digital world is a hotbed for data, experiments, and learning. Let’s jump directly into our case study example.
Digital Marketing – Case Study Example
In this digital marketing case study example, you are a data science consultant to Helping Hands, a charitable organization. Your client helps children in need in Africa. You are doing this work pro bono. Incidentally, a few months ago, while working on price optimization for an e-commerce company called Amazin’, you used the same method that you will use in this case study. You made enough money through Amazin’ to afford to help Helping Hands without fees. Isn’t it cool that as a data scientist you get to play so many different roles? There is a reason why they call it the sexiest job of the 21st century.
Ok, so back to our case study example, where you will help Helping Hands identify the right email message for soliciting donations. Helping Hands wants to understand which of these 3 email messages (A, B, and C) will generate the maximum amount in donations from the email recipients.
You initiated your email marketing campaign this morning, with these 3 messages sent out to thousands of recipients. While you wait for the data to measure the performance of this campaign, you have decided to learn more about topics that will help you create self-learning, or intelligent, systems and machines. You are curious about these topics because you want to improve the performance of this donation campaign through a self-learning system. The next few sections cover what you learned in your leisure time while awaiting data.
Reinforcement Learning for Artificial Intelligence
The wildebeest is a wild animal. It belongs to the same family as goats, cattle, and sheep. When a wildebeest baby is born, it stands up moments after its birth. Moreover, the wildebeest baby starts running within a few hours of its birth. Similarly, dolphin babies start swimming immediately after they are born. Human babies, on the other hand, are extremely fragile and completely dependent on adults for many years after their birth.
So, why are human babies different from the babies of other animals? And how does this difference shape human intelligence? The brains of many other animals are more or less fully developed at the time of birth. Humans, on the other hand, are born with a highly immature brain in comparison to dolphins and wildebeests. The neural circuitry of the human brain matures over roughly the first 25 years of life.
Reinforcement learning is one of the key processes that educate the human brain, which is still immature at birth. The idea behind reinforcement learning is simple: the learner is rewarded for the right actions and punished for bad ones. You don’t touch a hot stove twice because the burning sensation (a negative reward, or punishment) was too unpleasant the first time around. Let’s explore the need for learning over the course of the human life cycle in the next segment.
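The reward-and-punishment idea can be written down in a few lines. The following is only a minimal sketch, not a full reinforcement learning algorithm: the action names, the reward values (-10 for touching the stove, 0 for avoiding it), and the learning rate are all made-up assumptions for illustration. The learner keeps a running estimate of how good each action is and nudges that estimate toward whatever reward it just received.

```python
def update_value(current_value, reward, learning_rate=0.5):
    """Move the estimated value of an action one step toward the observed reward."""
    return current_value + learning_rate * (reward - current_value)


# Hypothetical starting point: the learner has no opinion about either action.
values = {"touch_stove": 0.0, "avoid_stove": 0.0}

# Hypothetical rewards: touching the hot stove hurts (-10), avoiding it is neutral (0).
values["touch_stove"] = update_value(values["touch_stove"], reward=-10.0)
values["avoid_stove"] = update_value(values["avoid_stove"], reward=0.0)

# After a single painful experience, the learner already prefers to avoid the stove.
best_action = max(values, key=values.get)
print(values)
print(best_action)
```

A single punishment is enough here to push the estimated value of touching the stove below that of avoiding it, which is the "you don’t touch a hot stove twice" effect in miniature.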
Reinforcement Learning – Schooling and Retirement
While growing up I had a retired uncle who had done well in his business and was super-rich. He had a plush marble bungalow, a fancy car, and most importantly a rocking chair. He used to take his naps in his rocking chair while all of us kids watched him jealously. We all thought it was unfair since we slogged in school learning while he slept on his rocking chair.
To everything – turn, turn, turn.
There is a season – turn, turn, turn.
And a time to every purpose under heaven… A time to be born, a time to die.
A time to plant, a time to reap.
– Lyrics of a Byrds song
A time to plant, a time to reap – this is essentially at the core of the human life cycle. This is also at the root of learning and retirement. Humans learn the most as kids while being completely dependent on their parents. Then as adults, they reap the benefits of their learning to earn a living and also learn more to grow in their careers. After retirement, they completely exploit the benefits of their hard work. So I guess my uncle was not being unfair while relaxing on his rocking chair.
Explore and Exploit
This phenomenon is known as ‘explore’ and ‘exploit’. As kids, humans are in 100% exploration or learning mode. After retirement, humans are in complete exploitation or reaping benefits mode. It makes sense because further exploration towards the end of one’s life is not going to bring many benefits or exploits.
Reinforcement learning, in machine learning and artificial intelligence, works on the principle of balancing exploration and exploitation at the same time. To sustain itself and stay healthy, a system needs to keep exploring while it exploits. You will develop an intelligent digital marketing system using the principles of reinforcement learning, but for now, you are still waiting for the data from the campaign you launched a little while ago.
So, I guess, you have some more leisure time to catch up on your reading and research on reinforcement learning. This is when you draw some links between learning systems and some of the latest Hollywood movies you watched last weekend – all sequels, by the way.
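One common way to explore and exploit at the same time is an epsilon-greedy strategy, sketched below for the three email messages. This is an illustrative assumption, not necessarily the method used later in this case study. The donation rates in `true_rates` are invented for the simulation (in a real campaign they are unknown), and the function name, `epsilon`, and `n_emails` are hypothetical choices.

```python
import random


def epsilon_greedy_campaign(true_rates, n_emails=10000, epsilon=0.1, seed=0):
    """Simulate sending emails one at a time with an epsilon-greedy rule."""
    rng = random.Random(seed)
    messages = list(true_rates)
    sends = {m: 0 for m in messages}
    donations = {m: 0 for m in messages}

    def observed_rate(message):
        # Donation rate seen so far; 0.0 for a message that was never sent.
        return donations[message] / sends[message] if sends[message] else 0.0

    for _ in range(n_emails):
        if rng.random() < epsilon:
            choice = rng.choice(messages)  # explore: try a random message
        else:
            choice = max(messages, key=observed_rate)  # exploit: best so far
        sends[choice] += 1
        # Simulated recipient: donates with the (made-up) true rate.
        if rng.random() < true_rates[choice]:
            donations[choice] += 1
    return sends, donations


# Made-up donation rates, used only to drive the simulation.
true_rates = {"A": 0.02, "B": 0.05, "C": 0.03}
sends, donations = epsilon_greedy_campaign(true_rates)
for message in true_rates:
    print(message, sends[message], donations[message])
```

With `epsilon=0.1`, about one email in ten is spent exploring a randomly chosen message, while the rest exploit whichever message has the best observed donation rate so far. A larger `epsilon` learns faster but gives up more donations along the way.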
Hollywood Dying – Explore and Exploit
The Amazing Spider-Man 2 was released in 2014. This is just a decade after we had seen the previous Spider-Man trilogy with Tobey Maguire in the lead role. There have been close to 10 installments of X-Men movies since the beginning of the millennium. The last five X-Men movies have appeared in the last 5 years. We have seen characters from the Avengers in more than 13 movies in the last 10 years. I am not singling out comic book characters created by Stan Lee, but pointing to a phenomenon of the last few years in which Hollywood is predominantly producing sequels and reboots of popular franchises.
Apparently, the number of sequels has gone up sharply in the last few years. The following graph shows just the number of sequels produced by Hollywood since 2008 (source: contently.com). The trend is here to stay: in 2017 we will see 43 sequels, reboots, and remakes (source: uproxx.com).
If we compare this to ‘explore’ and ‘exploit’ optimization, Hollywood is certainly in exploit mode. It is exploiting all of its previously successful movies through sequels and remakes. In the process, it is creating very little new learning and very few new franchises to exploit later. This is very similar to my rich and retired uncle in the rocking chair, exploiting all the benefits created through earlier hard work. Is it possible that Hollywood has decided to retire? If not, then it has to get into explore mode soon to avoid old age and eventual death.
Sign-off Note
After running digital marketing experiments, waiting for data is one of the most trying periods for an impatient data scientist. Luckily, you can use this period to learn and explore, as you did in this part of the case study example. This exploration will serve you well when you get your data in the next part and start exploiting the true business potential hiding behind the numbers.
You also understand really well that constant exploration and learning are the only way to make the best of your career as a data scientist. This is precisely the reason you have this quote by Ben Franklin as your wallpaper.
Some people die at 25 and aren’t buried until 75.
– Benjamin Franklin