This is a continuation of the banking case study for the creation of application risk scorecards that we have discussed in some previous articles. You can find the other parts of the series at the following links (Part 1), (Part 2), (Part 3), (Part 4), and (Part 6).
Reject inference is a topic that separates credit scoring from other classification problems, such as marketing propensity models for cross-selling and upselling. As you will discover later, reject inference is about patching information gaps that exist during the development of application scorecards. Let us try to gain a more holistic perspective on patching information gaps by looking at the way human beings have evolved.
Connecting the Dots
Recently I watched a Hindi movie called ‘Ankhon Dekhi’; the title translates to ‘seen with your eyes’. At the beginning of the movie, the central character, after a dramatic event in his life, decides to believe only what he sees with his own eyes. What follows are his adventures and misadventures while doing so. Although the theme of this movie has high potential, I think it became a bit pretentious in its presentation, especially towards the end. The idea of believing your eyes seems appropriate, but it has its own shortcomings. Evolution has trained our brain to supersede our vision in order to make split-second decisions. Numerous optical illusions are proof of this phenomenon. In this article, we will explore some optical illusions and illustrations that highlight how our brain and eyes work. But before that, let us consider an example of split-second decision making as a necessity for survival.
Imagine a human ancestor out in the dark. Our ancestor is hungry; he hasn’t eaten in days. He sees the silhouette of a creature lurking in front of him. This creature could be his next meal. On the other hand, it could be a predator, and our ancestor could become its delicious meal. Humans are still around on this planet because our ancestors’ eyes and brains developed some simple rules to deal with such situations. One of the instruments evolution has equipped humans with is...
Power of Context
As promised earlier, let me present a couple of illustrations to emphasize the power of context. In the first illustration (shown adjacent), try to compare the lengths of the two yellow lines and decide which one is longer. You will most probably identify the top yellow line as longer than the bottom one. In this illusion, your brain overrides the information received through your eyes based on the context, that is, the surrounding pattern around the yellow lines. As you might appreciate, our three-dimensional world will rarely, if ever, offer a pattern similar to the optical illusion in illustration 1. Hence, for most practical purposes our brain makes the right decision, even though it seems ridiculous in this case.
Now, let us look at the second illustration, shown adjacent. Notice the B and the 13 in the middle of the top and bottom sequences: they are identical. Yet you read the top sequence as ABC and the bottom sequence as 12, 13, 14. This is phenomenal: what your brain has just done in a split second is something most text mining and artificial intelligence algorithms painstakingly try to do. I must point out that CAPTCHA is proof that most of these algorithms fail to capture what nature has equipped us with: the ability to join missing links.
Our brain tries to fill the gap in our information using the available information. This is precisely what we try to do while using reject inference for credit scoring.
Let us try to understand the dynamics of the loan application process before establishing the necessity for reject inference. The ‘through-the-door’ loan applications are assessed by underwriters to establish the creditworthiness of the applicants. The underwriters either accept or reject the applications based on the applicants’ credentials. Moreover, customers whose applications are accepted may or may not avail themselves of the loans. This is shown in the schematic below:
As you can see in the above schematic, only the disbursed loans can be tagged as good or bad based on their performance. However, to create holistic scorecards for the entire through-the-door population, we need to infer the behavior of the rejected applications. This process of supplementing information is called reject inference, and it is essential for developing holistic scorecards. The following segments cover a few commonly used ways to perform reject inference. I must also point out that these methods are not perfect, despite being extensively used in the industry.
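To make the information gap concrete, here is a minimal Python sketch of the funnel described above. The `Application` record, its fields and the toy population are hypothetical illustrations, not data from the case study; the point is simply that a good/bad tag exists only for applications that were both accepted and taken up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Application:
    """Hypothetical through-the-door application record."""
    app_id: int
    accepted: bool                     # underwriter decision
    taken_up: bool = False             # meaningful only if accepted
    performance: Optional[str] = None  # "good" / "bad", observed only once disbursed


def is_disbursed(app: Application) -> bool:
    return app.accepted and app.taken_up


def funnel_summary(apps: List[Application]) -> Counter:
    """Count applications at each stage of the through-the-door funnel."""
    counts = Counter(through_the_door=len(apps))
    for app in apps:
        if not app.accepted:
            counts["rejected"] += 1
        elif app.taken_up:
            counts["disbursed"] += 1
        else:
            counts["accepted_not_taken_up"] += 1
    return counts


def labelled_share(apps: List[Application]) -> float:
    """Fraction of through-the-door applications with a usable good/bad tag."""
    labelled = [a for a in apps if is_disbursed(a) and a.performance in ("good", "bad")]
    return len(labelled) / len(apps)


# Hypothetical toy population of six applications.
apps = [
    Application(1, accepted=True, taken_up=True, performance="good"),
    Application(2, accepted=True, taken_up=True, performance="bad"),
    Application(3, accepted=True, taken_up=False),
    Application(4, accepted=False),
    Application(5, accepted=False),
    Application(6, accepted=True, taken_up=True, performance="good"),
]
print(funnel_summary(apps))
print(labelled_share(apps))  # 0.5
```

Only half of this toy population carries a tag; the rejected and not-taken-up applications are exactly the gap that reject inference tries to patch.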
Use Credit Bureaus
This method uses information from credit bureaus to fill the gaps. If other lenders have disbursed loans to your rejected applicants, then it makes sense to tag the rejected customers as good or bad based on their performance with those lenders. Although this is possibly the best way to infer rejects, since it relies on concrete information, it has the following challenges:
- It is unlikely that all the rejected applicants obtained a loan from another lender around the development period of the scorecard
- Differences in collection processes and reporting among lenders could lead to dubious tagging of customers’ performance
In most cases, credit bureau information alone won’t be sufficient to tag the entire through-the-door population. That is why we need analytical methods for reject inference, as discussed in the next segment.
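The bureau-tagging step, and why it usually leaves gaps, can be sketched as follows. This assumes a hypothetical `bureau_outcomes` dictionary that already maps applicant ids to a harmonised "good" / "bad" outcome with another lender; in practice, building that mapping is where the reporting differences mentioned above cause trouble. Whatever remains in `untagged` has to be handled by analytical methods.

```python
from typing import Dict, List, Tuple


def tag_rejects_from_bureau(
    reject_ids: List[str],
    bureau_outcomes: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Tag rejected applicants using performance reported by other lenders.

    bureau_outcomes is a hypothetical feed mapping applicant id -> "good"/"bad"
    for loans taken elsewhere around the development window.
    Returns (tagged, untagged).
    """
    tagged: Dict[str, str] = {}
    untagged: List[str] = []
    for rid in reject_ids:
        outcome = bureau_outcomes.get(rid)
        if outcome in ("good", "bad"):
            tagged[rid] = outcome
        else:
            untagged.append(rid)  # no loan elsewhere, or no usable outcome
    return tagged, untagged


rejects = ["R1", "R2", "R3", "R4", "R5"]
bureau = {"R1": "good", "R3": "bad", "X9": "good"}  # X9 is not one of our rejects
tagged, untagged = tag_rejects_from_bureau(rejects, bureau)
coverage = len(tagged) / len(rejects)
print(tagged)    # {'R1': 'good', 'R3': 'bad'}
print(untagged)  # ['R2', 'R4', 'R5']
print(coverage)  # 0.4
```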
Augmentation through Parceling
Augmentation, in its different forms, is the most commonly used methodology for reject inference. As shown in the above schematic, we have fairly concrete good/bad tagging for all the disbursed loans. We can easily run a classification algorithm such as logistic regression (see Part 3), a neural network or a decision tree on these loans to create a Known-Good-Bad (KGB) model. The same KGB model is then used to score the rejected applications. Once the scoring is complete, the analyst can create a table similar to the one shown below:
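Below is a minimal sketch of building a KGB model and scoring rejects with it. To keep it self-contained, it fits a plain logistic regression by batch gradient descent in pure Python on a single hypothetical predictor (think of a debt-to-income-style ratio); a real scorecard would typically use a statistics library and binned or transformed variables. The target is coded 1 for bad and 0 for good, so the model outputs a probability of bad.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_bad(w: List[float], x: Sequence[float]) -> float:
    """Probability of bad for one applicant under weights [intercept, w1, ..., wk]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    k = len(X[0])
    n = len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_bad(w, xi) - yi  # d(log-loss)/d(linear term)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical known-good-bad data: one predictor per disbursed loan.
X_disbursed = [[0.1], [0.2], [0.3], [0.6], [0.7], [0.9]]
y_disbursed = [0, 0, 1, 0, 1, 1]  # 1 = bad, 0 = good

kgb_weights = fit_logistic(X_disbursed, y_disbursed)

# Score hypothetical rejected applications with the same KGB model.
X_rejected = [[0.15], [0.5], [0.8]]
reject_p_bad = [predict_bad(kgb_weights, x) for x in X_rejected]
print([round(p, 3) for p in reject_p_bad])
```

These reject probabilities, or scorecard points derived from them, are what get bucketed into score ranges for augmentation.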
| Score Range | Disbursed % Bad | Disbursed % Good | Total Rejected Applications | Rejects Inferred Bad | Rejects Inferred Good |
| 232-241 | 22% | 78% | 2295 | 505 | 1790 |
As you may notice in the above table, we have divided the rejected applications in each score range into the same good / bad proportions as the disbursed loans in that range. For instance, the score range of 232-241 has 22% bad loans. We have therefore divided the 2295 rejected applicants in this bucket into 505 (22% of 2295) bad loans and 1790 good loans. We will randomly choose 505 rejected applications in the score range of 232-241 and assign them as bad loans; the remaining applications in this bucket will be assigned as good. Finally, we create a holistic scorecard by re-running the classification algorithm (logistic regression) on the entire through-the-door population.
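The parceling arithmetic can be sketched in Python as below. It assumes reject scores from the KGB model are already available as integer points and that score ranges are inclusive `(low, high)` tuples; the function names, the toy disbursed loans and the second score range are hypothetical. Using the article's bucket, 22% of 2295 rounds to 505 bad and 1790 good, and a fixed random seed keeps the random assignment reproducible.

```python
import random
from typing import Dict, List, Tuple

ScoreRange = Tuple[int, int]  # inclusive (low, high) scorecard points


def find_range(score: int, ranges: List[ScoreRange]) -> ScoreRange:
    """Return the score range containing score."""
    for low, high in ranges:
        if low <= score <= high:
            return (low, high)
    raise ValueError(f"score {score} falls outside all score ranges")


def disbursed_bad_rates(disbursed: List[Tuple[int, str]],
                        ranges: List[ScoreRange]) -> Dict[ScoreRange, float]:
    """Bad rate of the disbursed (known good/bad) loans in each score range."""
    totals = {r: 0 for r in ranges}
    bads = {r: 0 for r in ranges}
    for score, outcome in disbursed:
        r = find_range(score, ranges)
        totals[r] += 1
        if outcome == "bad":
            bads[r] += 1
    return {r: bads[r] / totals[r] for r in ranges if totals[r] > 0}


def parcel_rejects(reject_scores: Dict[str, int],
                   bad_rates: Dict[ScoreRange, float],
                   seed: int = 42) -> Dict[str, str]:
    """Randomly tag rejects in each range as bad at that range's disbursed bad rate."""
    rng = random.Random(seed)
    ranges = sorted(bad_rates)
    buckets: Dict[ScoreRange, List[str]] = {r: [] for r in ranges}
    for app_id, score in reject_scores.items():
        buckets[find_range(score, ranges)].append(app_id)

    inferred: Dict[str, str] = {}
    for r, ids in buckets.items():
        n_bad = round(bad_rates[r] * len(ids))  # e.g. 0.22 * 2295 = 504.9 -> 505
        bad_ids = set(rng.sample(ids, n_bad))
        for app_id in ids:
            inferred[app_id] = "bad" if app_id in bad_ids else "good"
    return inferred


ranges = [(232, 241), (242, 251)]
# Hypothetical disbursed loans: 22 bad and 78 good in the 232-241 range.
disbursed = [(235, "bad")] * 22 + [(238, "good")] * 78
rates = disbursed_bad_rates(disbursed, ranges)  # {(232, 241): 0.22}

# Hypothetical KGB scores for 2295 rejects, all falling in 232-241.
reject_scores = {f"R{i}": 232 + i % 10 for i in range(2295)}
inferred = parcel_rejects(reject_scores, rates)
n_bad = sum(1 for tag in inferred.values() if tag == "bad")
print(n_bad, len(inferred) - n_bad)  # 505 1790
```

The inferred rejects would then be appended to the disbursed loans before refitting the model on the full through-the-door population.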
I hope you have noticed that we have applied the power-of-context principle discussed above by using score ranges as the criterion for augmentation.
Fuzzy augmentation is an extended form of parceling. Here, rather than randomly assigning rejected loans as good or bad, we create multiple copies of each rejected loan in the good / bad proportions of its score range. For instance, for a single rejected loan in the score range of 232-241, 22 copies will be tagged as bad and 78 copies as good. The process is repeated for all the rejected loans. This is similar to the workings of fuzzy logic, where an item can partially belong to more than one class. Fuzzy augmentation is believed to be a superior method of reject inference for producing holistic scorecards.
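A minimal sketch of fuzzy augmentation as described above, assuming each reject has already been placed in a score range and that `n_copies=100` reproduces the 22 / 78 example. The second function is an alternative assumed here rather than taken from the text: instead of physically copying records, it keeps two records per reject weighted by the bad and good proportions, which gives a learner that accepts sample weights the same 22:78 split, with each reject contributing a total weight of 1 instead of `n_copies`.

```python
from typing import Dict, List, Tuple

ScoreRange = Tuple[int, int]  # inclusive (low, high)


def fuzzy_augment(
    reject_ranges: Dict[str, ScoreRange],
    bad_rate_by_range: Dict[ScoreRange, float],
    n_copies: int = 100,
) -> List[Tuple[str, str]]:
    """Replicate each reject n_copies times, tagging copies bad/good by its range's bad rate."""
    augmented: List[Tuple[str, str]] = []
    for app_id, score_range in reject_ranges.items():
        n_bad = round(bad_rate_by_range[score_range] * n_copies)
        augmented.extend([(app_id, "bad")] * n_bad)
        augmented.extend([(app_id, "good")] * (n_copies - n_bad))
    return augmented


def fuzzy_augment_weighted(
    reject_ranges: Dict[str, ScoreRange],
    bad_rate_by_range: Dict[ScoreRange, float],
) -> List[Tuple[str, str, float]]:
    """Assumed weighted variant: two records per reject, weights p_bad and 1 - p_bad."""
    records: List[Tuple[str, str, float]] = []
    for app_id, score_range in reject_ranges.items():
        p_bad = bad_rate_by_range[score_range]
        records.append((app_id, "bad", p_bad))
        records.append((app_id, "good", 1.0 - p_bad))
    return records


rates = {(232, 241): 0.22}
copies = fuzzy_augment({"R1": (232, 241)}, rates)
print(sum(tag == "bad" for _, tag in copies), sum(tag == "good" for _, tag in copies))  # 22 78
print(fuzzy_augment_weighted({"R1": (232, 241)}, rates))
```

With physical copies, each reject contributes `n_copies` records, so the relative weight of rejects versus disbursed loans in the refitted model depends on that choice.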
I know all the above methods for reject inference have their shortcomings. I have seen several experts and academics cringe at the mention of these methods. However, thus far, these are the best methods we have for reject inference with our current knowledge of mathematics and logic. I must say, nature still has a few brilliant tricks up her sleeve, such as our own ability to decipher CAPTCHAs. Someday, when we learn more about the inner workings of our own brain, we might crack the bigger code for reject inference and millions of similar problems. Nature does reveal herself piecemeal, so there is still tremendous hope!