
What is the best way to convert a probability of default into a risk score ranging from 0 to 1000?

I have used RAD scores. Any other thoughts?



Replies to This Discussion

Have you thought of using naive Bayes? I am assuming that you have several metrics (independent variables) in your risk model, and that the response is "probability of default". You could also use my technique called Hidden Decision Trees; it is rather easy to implement, even in Base SAS. I hope to make it available for free online (SaaS) when I find the time.
Hi Tom, it is an operational decision based on considerations such as:
• Implementability of the scorecard into application processing software. Certain software can only implement scorecards in the format
• Ease of understanding by staff (e.g., discrete numbers are easier to work with). For example, if the outcome of a school exam were reported as a probability, it would be difficult to understand, so we assign a score instead.
• Continuity with existing scorecards or other scorecards in the company. This avoids retraining on scorecard usage and interpretation of scores.
Hi

I am not sure why you want to do this. In some cases we do it for system reasons, for common understanding, or to align with various other scores, both internal and external.

I guess the best way to do it is to convert the probability estimate to a specific odds scheme, for example good-to-bad odds of 20 to 1 at a score of 500, doubling every 20 or 40 points. That way you can map the probability onto a scale of your choice (0-1000 in your case) while still being able to convert any score back to a probability of default. Most commercial scores like FICO follow a similar approach. In case this is something you are interested in, there are a few different methods of doing this and we could discuss them in detail.

Regards
Hindol
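To make the odds scheme above concrete, here is a minimal Python sketch, assuming good-to-bad odds of 20:1 at a score of 500 with the odds doubling every 20 points. The function name `pd_to_score`, its parameters, and the clamping to the 0-1000 range are illustrative assumptions, not anything taken from a commercial score.

```python
import math

def pd_to_score(pd, base_score=500.0, base_odds=20.0, pdo=20.0,
                lo=0.0, hi=1000.0):
    """Map a probability of default to a score on a good:bad odds scale.

    Assumed scheme: `base_odds` (good:bad) at `base_score`, with the odds
    doubling every `pdo` points. The result is clamped to [lo, hi].
    """
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must be strictly between 0 and 1")
    factor = pdo / math.log(2)                      # points per unit of ln(odds)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - pd) / pd                          # good:bad odds implied by pd
    score = offset + factor * math.log(odds)
    return min(hi, max(lo, score))                  # keep within the 0-1000 range

def score_to_pd(score, base_score=500.0, base_odds=20.0, pdo=20.0):
    """Invert the mapping (valid for scores that were not clamped)."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = math.exp((score - offset) / factor)
    return 1.0 / (1.0 + odds)
```

With these assumed settings, a probability of default of 1/21 (odds of 20:1) lands at 500, and 1/41 (odds of 40:1) lands 20 points higher. Very small or very large probabilities hit the 0 or 1000 boundary, so the score-to-probability conversion only holds inside the range.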
The technique that I have followed is explained below.
Where the scorecard is being developed using specified odds at a score and specified "points to double the odds" (pdo), the factor and offset can easily be calculated by using the following simultaneous equations:
Score = Offset + Factor ∗ ln (odds)
Score + pdo = Offset + Factor ∗ ln (2 ∗ odds)
Solving the equations above for pdo, we get
pdo = Factor ∗ ln (2), therefore
Factor = pdo / ln (2)
Offset = Score – {Factor ∗ ln (odds)}

For example, if a scorecard were being scaled where the user wanted odds of 50:1 at 600 points and wanted the odds to double every 20 points (i.e., pdo = 20), the factor and offset would be:
Factor = 20 / ln (2) = 28.8539
Offset = 600 – {28.8539 ∗ ln (50)} = 487.123

And each score corresponding to each set of odds (or each attribute) can be calculated as:

Score = 487.123 + 28.8539 ∗ ln (odds)
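The worked example can be checked with a few lines of Python. This is only a sketch; the helper names `scaling` and `score_from_odds` are made up for illustration.

```python
import math

def scaling(score, odds, pdo):
    """Return (factor, offset) for a scorecard with `odds` at `score`
    and the odds doubling every `pdo` points."""
    factor = pdo / math.log(2)
    offset = score - factor * math.log(odds)
    return factor, offset

# Odds of 50:1 at 600 points, doubling every 20 points.
factor, offset = scaling(600, 50, 20)

def score_from_odds(odds):
    """Score = Offset + Factor * ln(odds), using the values computed above."""
    return offset + factor * math.log(odds)

print(round(factor, 4), round(offset, 3))
print(round(score_from_odds(50), 3), round(score_from_odds(100), 3))
```

Odds of 50:1 map back to 600 and odds of 100:1 map to 620, which is what "doubling every 20 points" means: adding pdo to a score corresponds to doubling the odds.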
Log(odds) is what logistic regression puts out. Why use a score as complex as the one above when all you're doing is transforming log(odds) with a linear equation?

And what exactly is the use of the PDO? Wouldn't using the log(odds) directly be enough to tell whether we've doubled the odds or not?

Finally, I didn't quite follow the part where you said "wanted odds of 50:1 at 600 points". Also, when you say the odds double every 20 points, does it mean that a person A with score = score of B + PDO has double B's odds of being a Y=1? Can you elaborate?
If the log(odds) score is inaccurate, then so is the probability score, and so is the RAD score as outlined above!

As I understand it, if your sample represents the population well in terms of Y's R:NR ratio, and, if required, in terms of some of the X's as well, then the log(odds) should not be a biased estimate.

Dear Roy, when I am creating a scorecard, how do I select the odds? 50:1 or some other value? And what about the 600? Thanks!

Subhadip, can you elaborate on the RAD scores?

Tom, as I understand it, probability is a very tricky concept, and we might want to think twice about how we interpret it. If I'm modeling a guest response rate of 5% in a database of 1MM people, would you expect probability scores of 70%+ for responders and 30% or less for non-responders? We might expect that, but I've hardly ever seen that kind of scenario.

In most situations the probability doesn't even seem to make sense, especially when we say that someone is a responder with a probability score of 0.3 (or sometimes lower)! Isn't that worse than picking at random? Then why model? That's another problem altogether.

A score, on the other hand, masks this concept and simply tells us who is a more suitable target customer. That is precisely what we want, rather than a questionable probability score.

Another way around this is to decile-rank your scores or probability scores. The top deciles are bound to contain the customers you'd want to target, I'd think.
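Decile ranking can be sketched in plain Python as follows. The function name `decile_ranks` and the convention that decile 1 holds the highest scores are assumptions made for illustration; ties are broken by sort order.

```python
def decile_ranks(scores):
    """Assign each score a decile from 1 (highest scores) to 10 (lowest).

    Ranks the scores in descending order and cuts the ranking into ten
    groups of (nearly) equal size.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

# Hypothetical probability scores for 100 customers.
scores = [i / 100 for i in range(100)]
deciles = decile_ranks(scores)
top_decile = [s for s, d in zip(scores, deciles) if d == 1]
```

Only the ordering of the scores matters here, so the same deciles come out whether you rank probabilities, log(odds), or a scaled score.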

Just to add one final thing:
Even if the model told you that a customer is 90% likely to respond upon being contacted, the customer may still end up not responding. That's captured in the remaining 10% of the probability, also termed error! Thus, we may not be able to predict whether Tom or Vincent would respond; rather, we can predict that if we contacted N such people, roughly X% of them would respond, X being the average probability score of the group/decile/segment.
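The group-level point can be illustrated with a small simulation. This is a sketch under made-up assumptions: 1000 customers who each have a response probability of 0.9, and a fixed random seed so the run is repeatable.

```python
import random

def expected_responders(probs):
    """Expected number of responders: the sum of individual probabilities."""
    return sum(probs)

def simulate_responders(probs, seed=0):
    """Draw one yes/no outcome per customer and count the responders."""
    rng = random.Random(seed)
    return sum(1 for p in probs if rng.random() < p)

# Hypothetical segment: 1000 customers, each with probability 0.9.
probs = [0.9] * 1000
expected = expected_responders(probs)
simulated = simulate_responders(probs)
print(expected, simulated)
```

Any single customer may or may not respond, but the simulated count for the whole segment stays close to the expected count, which is N times the average probability.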
Under some very general assumptions, the probability may be converted into a risk score ranging from 0 to 1000. The main assumption is that the risk score (RS) is normally distributed (!). Then the mean will be in the middle, say 500, and the standard deviation (sd) can be judged from the relation between sd and range, i.e. range = 6 sd, which gives about 167 here. Then, by equating p with the cumulative probability of RS under the normal distribution, RS can be estimated. Any other distribution could be used in place of the normal.
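One possible reading of this normal-distribution approach, sketched in Python with the standard library. The direction of the scale (a lower probability of default gives a higher score), the clamping to 0-1000, and the function name `pd_to_normal_score` are assumptions, since the post does not spell them out.

```python
from statistics import NormalDist

def pd_to_normal_score(pd, mean=500.0, sd=1000.0 / 6.0, lo=0.0, hi=1000.0):
    """Map a probability of default to a score via the normal quantile.

    Assumes the score is N(mean, sd) with range = 6 sd, and that the score
    at quantile (1 - pd) is the one assigned, so safer accounts score higher.
    """
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must be strictly between 0 and 1")
    score = NormalDist(mu=mean, sigma=sd).inv_cdf(1.0 - pd)
    return min(hi, max(lo, score))  # the tails beyond 3 sd are clamped

print(round(pd_to_normal_score(0.5), 1))
print(round(pd_to_normal_score(0.2), 1), round(pd_to_normal_score(0.4), 1))
```

A probability of 0.5 sits at the mean of 500. Probabilities beyond roughly three standard deviations in either tail are cut off at 0 or 1000.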
Isn't a Logistic/Bernoulli/Binomial distribution a better approximation? That's why we used a logistic regression with the logit transformation!

If it were a probit regression, then a normal distribution would probably make sense. Not sure though.


© 2019   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC   Powered by
