This is a continuation of the banking case study on building application risk scorecards that we have discussed in previous articles. You can find the other parts of the series at the following links: (Part 1), (Part 2), (Part 3), (Part 4), and (Part 6).
Reject inference is a topic that separates credit scoring from other classification problems such as marketing propensity models for cross-selling / upselling. As you will discover later, reject inference is about patching information gaps that exist during the development of application scorecards. Let us try to gain a more holistic perspective on patching information gaps by looking at the way human beings have evolved.
Connecting the Dots
Recently I watched a Hindi movie called ‘Ankhon Dekhi’; the title translates to ‘seen with your own eyes’. In the beginning, the central character of this movie, after a dramatic event in his life, decides to believe only what he sees with his own eyes. What follows are his adventures and misadventures while doing so. Although the theme of this movie has great potential, I think it became a bit pretentious in its presentation, especially towards the end. The idea of believing your eyes seems appropriate, but it has its own shortcomings. Evolution has trained our brain to supersede our vision in order to make split-second decisions. Numerous optical illusions are proof of this phenomenon. In this article, we will explore some optical illusions and illustrations that highlight how our brain and eyes work. But before that, let us consider an example of split-second decision making as a necessity for survival.
Imagine a human ancestor alone in the dark. Our ancestor is hungry; he hasn’t eaten in days. He sees the silhouette of a creature lurking in front of him. This creature could be his next meal. On the other hand, it could be a predator, and our ancestor could become a delicious meal for it. The reason humans are still around on this planet is that our ancestors’ eyes and brains evolved some simple rules to deal with this situation. One of the instruments evolution has equipped humans with is the …
Power of Context
As promised earlier, let me present a couple of illustrations to emphasize the power of context. In the first of these illustrations (shown adjacent), try to compare the lengths of the two yellow lines and decide which one is longer. You will most probably identify the top yellow line as longer than the bottom one. In this illusion, your brain supersedes the information received through your eyes based on the context, or the surrounding patterns around the yellow lines. As you might appreciate, our three-dimensional world will rarely, most probably never, offer a pattern similar to the optical illusion in illustration 1. Hence, for most practical purposes our brain has made the right decision, though it may seem ridiculous in this case.
Now, let us have a look at the second illustration, shown adjacent. Notice the B and the 13 in the middle of the top and bottom sequences: they are identical. Yet you read the top sequence as A, B, C and the bottom sequence as 12, 13, 14. This is phenomenal; what your brain has just done in a split second is something most text mining and artificial intelligence algorithms try to do painstakingly. I must point out that CAPTCHA is proof that most of these algorithms fail to capture what nature has equipped us with: the ability to join missing links.
Our brain tries to fill the gaps in our information using the information that is available. This is precisely what we try to do while using reject inference for credit scoring.
Reject Inference
Let us try to understand the dynamics of the loan application process before establishing the necessity for reject inference. The ‘through-the-door’ loan applications are assessed by underwriters to establish the creditworthiness of the applicants. The underwriters either accept or reject the applications based on the credentials of the applicants. Further, the customers with accepted applications will either avail the loans or not. This is shown in the schematic below:
As you can see in the above schematic, we have performance information for just the disbursed loans, so only they can be tagged as good or bad. However, to create holistic scorecards for the entire through-the-door population, we need to infer the behavior of the rejected loans. This process of supplementing information is called reject inference and is essential for developing holistic scorecards. The following segments cover a few commonly used ways to perform reject inference. I must also point out that these methods are not perfect despite being extensively used in the industry.
Use Credit Bureaus
This method involves using information from credit bureaus to fill the gaps. If other lenders have disbursed loans to your rejected applicants, then it makes sense to tag those rejected customers as good or bad based on their performance with the other lenders. Although this is possibly the best way to infer rejects with concrete information, it has the following challenges:
- It is unlikely that all the rejected applicants have obtained loans from other lenders around the development period of the scorecard
- Differences in collection processes and reporting standards among lenders could lead to dubious tagging of customers’ performance
In most cases, using credit bureau information alone won’t be sufficient to tag the entire through-the-door population. That is why we need analytical methods for reject inference, as discussed in the next segment.
Augmentation through Parceling
Augmentation, in its different forms, is the most commonly used methodology for reject inference. As shown in the above schematic, we have fairly concrete tagging of good and bad loans for all the disbursed loans. We can easily run a classification algorithm such as logistic regression (follow this link: Part 3), neural networks, or decision trees to create a Known-Good-Bad (KGB) model. The same KGB model is then used to score the rejected loans. Once the scoring is completed, the analyst can create a table similar to the one shown below:
| Score Range | Disbursed % Bad | Disbursed % Good | Total Rejected Applications | Rejects Inferred Bad | Rejects Inferred Good |
|---|---|---|---|---|---|
| 0–231 | 27.0% | 73.0% | 1,838 | 496 | 1,342 |
| 232–241 | 22.0% | 78.0% | 2,295 | 505 | 1,790 |
| 242–251 | 17.0% | 83.0% | 3,162 | 538 | 2,624 |
| 252–261 | 15.0% | 85.0% | 3,659 | 549 | 3,110 |
| 262–271 | 9.0% | 91.0% | 3,298 | 297 | 3,001 |
| 272–281 | 7.0% | 93.0% | 3,893 | 273 | 3,620 |
| 282–291 | 4.0% | 96.0% | 2,627 | 105 | 2,522 |
| 292–301 | 1.5% | 98.5% | 2,612 | 39 | 2,573 |
| 302–311 | 0.7% | 99.3% | 2,441 | 17 | 2,424 |
| 312+ | 0.3% | 99.7% | 1,580 | 5 | 1,575 |
As you may notice in the above table, we have divided the rejected applications into the same proportion of good / bad as in the disbursed loans for each score range. For instance, the score range 232–241 has 22% bad loans. We have therefore divided the 2,295 rejected applicants in this bucket into 505 (22% of 2,295) bad loans and 1,790 good loans. We will randomly choose 505 rejected applications in the score range 232–241 and assign them as bad loans (the remaining loans in this bucket will be assigned as good). Now we will create a holistic scorecard by re-running the classification algorithm, i.e. logistic regression, on the entire through-the-door population.
I hope you have noticed that we have used the principle of power-of-context discussed above by using score ranges as the criterion for augmentation.
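To make the mechanics concrete, here is a minimal sketch of parceling in Python with pandas and scikit-learn. The data, column names (`income`, `dti`), and the choice of ten score bands are all hypothetical stand-ins, not part of the original case study; the steps mirror the KGB-model-then-parcel process described above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic stand-ins for real application data: `disbursed` carries a
# known good/bad flag (1 = bad); `rejected` has the same predictors only.
n_disb, n_rej = 5000, 2000
disbursed = pd.DataFrame({"income": rng.normal(50, 15, n_disb),
                          "dti": rng.uniform(0, 1, n_disb)})
disbursed["bad"] = (rng.uniform(0, 1, n_disb) < 0.3 * disbursed["dti"]).astype(int)
rejected = pd.DataFrame({"income": rng.normal(40, 15, n_rej),
                         "dti": rng.uniform(0.2, 1, n_rej)})
features = ["income", "dti"]

# Step 1: fit the Known-Good-Bad (KGB) model on disbursed loans only.
kgb = LogisticRegression().fit(disbursed[features], disbursed["bad"])

# Step 2: score everyone and cut score bands on the disbursed population.
disbursed["p_bad"] = kgb.predict_proba(disbursed[features])[:, 1]
rejected["p_bad"] = kgb.predict_proba(rejected[features])[:, 1]
_, edges = pd.qcut(disbursed["p_bad"], q=10, retbins=True)
edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range reject scores
disbursed["band"] = pd.cut(disbursed["p_bad"], edges)
rejected["band"] = pd.cut(rejected["p_bad"], edges)

# Observed bad rate of disbursed loans per band ("Disbursed % Bad" above).
bad_rate = disbursed.groupby("band", observed=True)["bad"].mean()

# Step 3: parceling - within each band, randomly tag the same proportion
# of rejects as bad as observed among the disbursed loans in that band.
parceled = []
for band, group in rejected.groupby("band", observed=True):
    group = group.copy()
    n_bad = int(round(bad_rate[band] * len(group)))
    tags = np.zeros(len(group), dtype=int)
    tags[rng.choice(len(group), size=n_bad, replace=False)] = 1
    group["bad"] = tags
    parceled.append(group)
rejected = pd.concat(parceled)

# Step 4: re-fit on the full through-the-door population.
ttd = pd.concat([disbursed[features + ["bad"]], rejected[features + ["bad"]]])
final_model = LogisticRegression().fit(ttd[features], ttd["bad"])
```

In practice the bands would be the scorecard’s own score ranges rather than probability deciles, but the parceling logic is the same.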
Fuzzy Augmentation
Fuzzy augmentation is an extended form of parceling. Here, rather than randomly assigning loans as good or bad, we create multiple copies of each rejected loan in the proportion of the good / bad % for its score range. For instance, 22 copies of a single rejected loan in the score range 232–241 will be tagged as bad and 78 copies as good. The process is repeated for all the rejected loans. This is similar to the workings of fuzzy logic. Fuzzy augmentation is believed to be a superior method for reject inference to produce holistic scorecards.
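A sketch of how this might look in code, continuing the hypothetical parceling example above (it reuses `disbursed`, `rejected`, `features`, and the imports from that sketch): instead of literal copies, each reject appears as two weighted rows, which is equivalent but keeps the dataset small. Here each reject’s individual predicted probability is used as the weight; the band-level proportion from the table would work the same way.

```python
# Each reject contributes two weighted rows: one tagged bad with weight
# p(bad) from the KGB model, one tagged good with weight 1 - p(bad).
# Weighted rows are equivalent to making proportional copies (e.g. 22 bad
# copies and 78 good copies) without bloating the dataset.
rej_bad = rejected[features].copy()
rej_bad["bad"] = 1
rej_bad["weight"] = rejected["p_bad"].to_numpy()

rej_good = rejected[features].copy()
rej_good["bad"] = 0
rej_good["weight"] = 1.0 - rejected["p_bad"].to_numpy()

kgb_rows = disbursed[features + ["bad"]].copy()
kgb_rows["weight"] = 1.0   # known good/bad outcomes carry full weight

ttd_fuzzy = pd.concat([kgb_rows, rej_bad, rej_good], ignore_index=True)
fuzzy_model = LogisticRegression().fit(
    ttd_fuzzy[features], ttd_fuzzy["bad"],
    sample_weight=ttd_fuzzy["weight"],
)
```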
Sign-off Note
I know all the above methods for reject inference have their shortcomings. I have seen several experts and academicians cringe at the mention of these methods. However, thus far these are the best methods we have for reject inference given our current knowledge of mathematics and logic. I must say, nature is still hiding a few brilliant tricks up her sleeve, such as our own ability to decipher CAPTCHAs. Someday, when we learn more about the inner workings of our own brain, we might crack the bigger code for reject inference and millions of similar problems. Nature does reveal herself piecemeal, so there is still tremendous hope!
References
1. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring – Naeem Siddiqi
2. Credit Scoring for Risk Managers: The Handbook for Lenders – Elizabeth Mays and Niall Lynas
Thank you for the explanation. On parceling & fuzzy augmentation, what % of rejected applications and disbursed loans would you recommend in creating this “through-the-door applications” pool for the updated logistic regression model?
Also, why is fuzzy augmentation a superior method? It seems risky: if we tag that initial copy of the bad/good rejected applications “wrong” (in the sense that the actual result would have been the opposite had the application gone through the process), then we are replicating the wrong results multiple times in our population, no?
To answer your first question, the minimum number of good / bad / rejected loans that you want to use for model development is 1,000 each. Also, adjust the model based on the original population’s prior probabilities for good / bad loans.
For your second question, I hope I am getting your question right: if the initial KGB model is wrong, then all the augmentation methods for reject inference (parceling or fuzzy) will generate wrong results. But a fair assumption is that you have detected some logical and significant patterns in your KGB model to apply on top of the rejected sample. Fuzzy augmentation is better than simple parceling because instead of randomly tagging rejected loans good/bad you are using a weighted sample. I hope this helped.
Thank you, Roopam.
Hi Roopam,
Thanks for the crisp articles.
I wanted to ask: say the good and bad % are 75 and 25 respectively for a score range, and the total number of booked cases where we have the information is, say, 1,000. Would adding 25/75 copies vs. 1/3 copies make a difference to the model, given that the number of total cases with actual data is fixed? Comparing 1 and 25 copies, that’s 0.1% vs. 2.5% of 1,000. What is the good practice to follow?
In fuzzy augmentation, the idea is to make the same number of copies of all the loans, say 1,000 copies each. So, if you have 2,000 KGB loans and 1,000 rejected loans, then you will have 3,000 × 1,000 observations. The KGBs will be marked 0 or 1 with 100% certainty. The rejected loans will be marked 0 or 1 based on fuzzy logic, as explained in this article. Hope this helped.
Hi Roopam,
When we use reject inference, why don’t we need to extract information for the customers who didn’t take the loan too? Why only the rejected customers?
Thanks
That’s a good question. Actually, the approved base (loan sanctioned but not disbursed) is assumed to have similar properties to the disbursed base, since it has filtered through the same loan approval policy. The rejected loans, however, are completely different since they could not get through the policy filters. However, if there are reasons to believe that the not-disbursed loans are different, then they need to be treated like the rejected loans.
Thanks for the answer, Roopam.
Do you have any recommended methods for building good credit scoring models other than logistic regression?
You will find this interview useful: http://ucanalytics.com/blogs/conversation-naeem-siddiqi-author-credit-risk-scorecards-credit-scoring-guru/
Hi Roopam,
Big fan!
One question: when you classify the rejected apps on credit scores, you obtain these scores from your logistic model, where you decide on a threshold score above which it’s a 1, or else a 0. Is that right? I’m new to this world, hence the level of the question.
Thanks!
A Known-Good-Bad (KGB) logistic regression model scores the rejected loans and provides the probabilities for the rejected loans (had they been accepted) to go bad. Now, if the output for 100 rejected loans is exactly the same, say a 10% probability, then 10 loans out of these 100 would have gone bad according to the model. But you don’t know exactly which ten loans would have gone bad. So you randomly assign 1 (bad loan) to any ten loans, and the remaining 90 loans are assigned 0 (good loan). Augmentation is a slightly more sophisticated way of doing this same thing. So there is no threshold score as such, but you use the probabilities to assign the good/bad tagging. These new data for the rejected loans, along with the KGB loans, are then used to create the final model for credit scoring.
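A tiny sketch of that random assignment, using plain NumPy and the hypothetical numbers from the reply above (100 rejects, all scored at a 10% probability of going bad):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 rejected loans, all scored at p(bad) = 10% by the KGB model:
# randomly tag exactly 10 of them as bad (1), the remaining 90 as good (0).
n, p_bad = 100, 0.10
tags = np.zeros(n, dtype=int)
bad_idx = rng.choice(n, size=int(round(p_bad * n)), replace=False)
tags[bad_idx] = 1
```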
Hi there, great article. Thanks for this.
Isn’t it 3000 × 100 (rather than 3000 × 1000)?
E.g.
For a given KGB score band (let’s say 0-200):
– We have 2000 KGB and 1000 Rejects, hence we have 1000/3000 reject rate (33.3%)
– We have 25% Bad Rate
So the fuzzy logic suggests that:
– each rejected customer is multiplied proportionally: 1000 × 25 (inferred bads) and 1000 × 75 (inferred goods)
– each customer of the KGB population is multiplied simply by 100 (to preserve the accept/reject ratio)
Then our sample for this score band will simply be: 1000 × 25 (inferred bads) + 1000 × 75 (inferred goods) + 2000 × 100 (realized goods & bads) = 300,000 (which is 3000 × 100)?
Also, I want to point out that by rejects we should consider only non-policy rejects (or score rejects). Policy rejects (e.g., age < 18) are out of scope.
Really helpful.
Interesting. I am curious what the statistics are on your first point there.