Classification Problem in Statistics & Data Mining
I must say I was shocked when Amishi, a girl a little over three years old, announced that, going forward, she is only friends with my wife and not me. Her reason for the breakup was that I am a boy, and girls can only be friends with girls. She has learned this social norm from her friends at preschool. I still remember the way she modeled for me in her swimsuit and umbrella just a few months ago. She was aware of the boy-girl difference even then; it is just that she has now learned this odd social norm. The point here is that toddlers can distinguish genders without much effort. Nature has given us a built-in equation to classify gender through a mere glance with a high degree of precision. Imagine a similar mechanism to distinguish between good and bad borrowers. You are talking about every banker’s dream. However, evolution has trained us to mate, not to lend.
As I mentioned in the previous article, scorecards have their roots in the classification problem in statistics and data mining. The idea with most classification problems is to create a mathematical equation to distinguish between the two levels of a dichotomous variable. These variables can only take two values, such as:
• Male / Female
• Good / Bad
• Yes / No
• God / Devil
• Happy / Sad
• Sales / No Sales
The list can go on until eternity. The reason why most business problems try to model dichotomies is that they are easy for us humans to comprehend. We must appreciate that dichotomies are never absolute and have degrees attached to them. For example, I am 80% good and 20% bad – at least I would like to believe this. I shall keep Pareto’s 80-20 principle out of this, i.e. that my 20% bad is responsible for 80% of my behavior.
Credit Scorecards Development – Problem Statement & Sampling
In the case of credit scorecards, the problem statement is to distinguish analytically between good and bad borrowers. Hence, the first task is to define a good and a bad borrower. For most loan products, good and bad credit are defined in the following way:
1. Good loan: never missed an EMI payment, or missed at most one
2. Bad loan: ever missed 3 consecutive EMIs (i.e. 90 days past due)
Additionally, for tagging someone good or bad, you need to observe his or her behavior for a significant length of time. This length of time varies from product to product based on the tenor of the loan. For home loans, with a tenor of 20 years, 2-3 years is a reasonable observation period.
However, there is nothing sacrosanct about the above definition, and it can be modified at the discretion of the analyst. Roll-rate analysis and vintage analysis are the two analytical tools you may want to consider while constructing the above definition.
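The tagging rule above can be sketched in code. This is a minimal illustration, not the author’s implementation: each loan’s monthly repayment history is assumed to be a list of flags (True = EMI missed), and loans falling between the good and bad definitions are labeled “indeterminate” – a common convention, but an assumption here.

```python
def tag_loan(missed_flags):
    """Tag a loan 'good', 'bad', or 'indeterminate' from its payment history.

    bad  : ever 3 consecutive missed EMIs (i.e. 90 days past due)
    good : never missed, or missed at most one EMI overall
    else : indeterminate (2 misses, but never 3 in a row)
    """
    run = longest = 0
    for missed in missed_flags:
        run = run + 1 if missed else 0   # length of the current missed streak
        longest = max(longest, run)
    if longest >= 3:
        return "bad"
    if sum(missed_flags) <= 1:
        return "good"
    return "indeterminate"

print(tag_loan([False] * 24))                       # good
print(tag_loan([False, True, True, True, False]))   # bad
print(tag_loan([True, False, True, False, False]))  # indeterminate
```

Indeterminate loans are often set aside during development rather than forced into either class.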
Sampling Strategy for Credit Scorecards
A few years ago, I did a daylong workshop on Statistical Inference for a large German shipping & cargo company in Mumbai. During the Q&A session, the Vice President of Operations asked a tricky question: what is a good sample size to achieve good precision? He was looking for a one-size-fits-all answer, and I wish it were that simple. The sample size depends on the degree of similarity, or homogeneity, of the population in question. For example, what do you think is a good sample size to answer the following two questions?
1. What is the salinity of the Pacific Ocean?
2. Is there another planet with intelligent life in the Universe?
In terms of population size, the number of drops in the ocean and the number of planets in the Universe are comparably vast. A couple of drops of water are enough to answer the first question, since the salinity of oceans is fairly constant. On the other hand, the second question is a black swan problem. You may need to visit every single planet to rule out the possibility of an intelligent form of life.
For credit scorecard development, the accepted rule of thumb for sample size is at least 1000 records of both good and bad loans. There is no reason why you cannot build a scorecard with a smaller sample (say 500 records). However, the analyst needs to be cautious in doing so, because a higher degree of randomness creeps into a small data sample. Additionally, it is advisable to keep the sample window as short as possible, i.e. a financial quarter or two, during scorecard development. Further, the sample is divided into two pieces – usually 70% for development and the remaining 30% for validation. We will discuss the development and validation samples in detail in subsequent sections of this series.
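The 70/30 development/validation split can be sketched with the standard library alone; the record layout, the fixed seed, and the 10% bad rate below are illustrative assumptions.

```python
import random

def split_sample(records, dev_fraction=0.7, seed=42):
    """Randomly split records into development and validation samples."""
    rng = random.Random(seed)      # fixed seed for reproducibility
    shuffled = records[:]          # copy so the input order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]

# 2000 illustrative loan records, 10% tagged bad
loans = [{"id": i, "bad": i % 10 == 0} for i in range(2000)]
dev, val = split_sample(loans)
print(len(dev), len(val))  # 1400 600
```

In practice one would usually stratify the split so the good/bad mix is preserved in both samples (e.g. `sklearn.model_selection.train_test_split` with its `stratify` argument).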
Sign-off Note
In the next article, we will discuss the important topic of variable classing and coarse classing for credit scorecards. See you soon!
Well-written, concise explanation.
Could you explain a bit more about ‘it is also advisable to keep the sample window as short as possible’? What is the rationale?
The idea is to keep the sample window as homogeneous as possible, i.e. making an apples-to-apples comparison between loan profiles. For instance, if you take a sample window of loans originated between 2005-09, then they are not homogeneous, since the economy in 2005 was completely different from the economy in 2008. The depression of 2007-08 was a global phenomenon, but one has to keep similar local economic factors in mind as well. If you take a reasonably short sample window, say a quarter or two, then it is OK to assume a homogeneous environment (economy, employment rate, etc.).
Thank you for your post, it gave me a few insights. My comment is related to your reply to a user query on this in your part-2 post on CS: ‘it is also advisable to keep the sample window as short as possible’.
In your reply, you inter-alia wrote: “For instance if you take a sample window of loans originated between 2005-09 then they are not homogenous. Since the economy in 2005 was completely different from economy in 2008. The depression of 2007-08 was a global phenomenon but one has to keep similar local economic factors in mind as well. ”
My view: If we wish to not have more than 2 quarters of data, the reason being “economy being different”, then by this logic such a scorecard will have a shelf life of only ~2 quarters, since the starting logic of excluding data beyond 2 quarters would also apply when the scorecard gets used… Generally, scorecards have a shelf life of a few years.
In my view, the build-data length should be long enough to: (1) hopefully capture variations in conditions during the expected shelf life of the scorecard, and (2) exclude seasonality effects (e.g., catastrophes such as loans given during floods or bushfires, and quarter-of-the-year effects).
One can do tests on build-data, to find if there are any structural breaks.
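As a rough illustration of such a check, one could compare the bad rates of two origination cohorts with a 2x2 chi-square test; a significant difference suggests the cohorts are not homogeneous. The cohort counts below are made up, and this stdlib-only implementation is a sketch, not a full structural-break test.

```python
import math

def chi_square_2x2(bad_a, good_a, bad_b, good_b):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    n = bad_a + good_a + bad_b + good_b
    row_a, row_b = bad_a + good_a, bad_b + good_b
    col_bad, col_good = bad_a + bad_b, good_a + good_b
    stat = 0.0
    for obs, r, c in [(bad_a, row_a, col_bad), (good_a, row_a, col_good),
                      (bad_b, row_b, col_bad), (good_b, row_b, col_good)]:
        exp = r * c / n                     # expected count under homogeneity
        stat += (obs - exp) ** 2 / exp
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1 df) tail probability
    return stat, p_value

# hypothetical cohorts: 2005 book (40 bad / 960 good) vs 2008 book (90 bad / 910 good)
stat, p = chi_square_2x2(40, 960, 90, 910)
print(f"chi2={stat:.2f}, p={p:.2g}")  # small p => bad rates differ => not homogeneous
```

For real work, `scipy.stats.chi2_contingency` (with continuity correction) or a formal Chow test would be the more standard tools.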
Hi,
Can you throw some light on how to practically deal with once in 5 years kind of event or one-off events like Demonetization? One can certainly not exclude or ignore this data from analysis – so what do you do? Additionally, how can one give higher / lower weightage to some portion of data in training set?
Thanks for providing this wonderful platform to learn. Can you please briefly explain the diagram?
The diagram illustrates the development sample and performance window for scorecard development. The development sample includes all the loans underwritten within a specific period, in this case Jan-Jun 2013. The performance window is the period over which the loans were tracked to tag them as good/bad, i.e. a 3-year rolling window in this case.
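As a small illustration of that rolling window, each loan originated in Jan-Jun 2013 would be tracked for 3 years from its own origination date. The dates and function name below are illustrative.

```python
from datetime import date

def performance_window_end(origination, years=3):
    """End date of the rolling performance window for one loan.

    Ignores the Feb-29 edge case for simplicity.
    """
    return date(origination.year + years, origination.month, origination.day)

loan_origination = date(2013, 4, 15)               # inside the Jan-Jun 2013 sample
print(performance_window_end(loan_origination))    # 2016-04-15
```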
I was trying to understand the probabilities output by a logistic regression in a credit scorecard. Let us say I have performed vintage analysis and identified the performance period as 6 months, and the bad rate is defined as 90 DPD. When I score incoming new customers with the logistic regression, would the output probabilities represent the probability that a customer will go bad in the next 6 months (which comes from the performance period)?
Yes, that’s correct.
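To make that interpretation concrete: a fitted logistic regression maps an applicant’s attributes to the probability of going bad within the performance window (6 months here). The coefficients and features below are made-up illustrations, not from any real scorecard.

```python
import math

def prob_bad(intercept, coefs, features):
    """P(bad within performance window) = 1 / (1 + exp(-(b0 + b.x)))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical model: intercept plus weights for [utilization, prior 30+ DPD flag]
p = prob_bad(-3.0, [2.0, 1.5], [0.8, 1.0])
print(round(p, 3))  # → 0.525, i.e. P(applicant hits 90 DPD within 6 months)
```

This is the same quantity `predict_proba` would return from, say, scikit-learn’s `LogisticRegression` fitted on the tagged sample.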
Abhishek: If your query is how to convert a 6-month horizon into a 12-month one, then there is software to achieve it. If the scorecard is only for decisioning purposes, then the cut-off is what matters. Only if you wish to use that scorecard for capital (PD) models would you need to think about time-horizon conversion.
Hi,
Can someone please elaborate on the conversion of time horizons and how to achieve it? For TTC to PIT there is the Vasicek approach that I am aware of, but how does one go the other way round?
Hi
Can you post the Vasicek paper link for PIT to TTC?
How do you deal with new customers in an application scorecard, where we do not have their past performance behavior, unlike in the case of existing customers?
As an add-on, there are a few other factors that influence decisioning, e.g., studying the project charter to find the main intent of building the scorecard. For example, if the intent is to reduce the bad rate, then taking a longer period, especially one with a high bad rate, is advisable. The reason is that the scorecard should get more conservative this way, and more importantly, objectively so.
What is a rolling performance window?
Thanks
Narasimhan
A sliding or rolling window gives an equal performance-tracking time for all loans. The loans used for scorecard development originate in different time periods, hence they need to be tracked for the same length of time, say 2 years.
Hello Roopam,
Thank you for your response. How do I know the sample that I have taken is unbiased?
Narasimhan
What kind of biases do you expect in your sample? And, why? You need to define that first.
Hi Roopam,
Regarding the example of performance period you gave:”For home loans, with a tenor of 20 years, 2-3 years is a reasonable observation period”,
But suppose the observation point is Mar 2015, the observation period is 2 years, and the account defaulted in Feb 2017. I doubt the performance information at Mar 2015 would be highly correlated with the default, so wouldn’t it be better if the observation period were shorter, say 12 months?
Thanks
Hi Roopam,
Thanks, this is very informative. I have a question: I have data for 3 years, say 2017-2020.
1. I have done the vintage analysis for the PERFORMANCE window, and I have selected 24 months, since up to that point the cumulative bad rate rises at a fast pace and then stabilizes.
Now which period do I need to take as the observation window? Can I consider the 6 months after the 24 months?
Thanks,
Sid