What is the best way to convert probability of default into a risk score ranging from 0-1000? - AnalyticBridge
2019-08-22T12:04:23Z
https://www.analyticbridge.datasciencecentral.com/forum/topics/what-is-the-best-way-to?commentId=2004291%3AComment%3A353355&x=1&feed=yes&xn_auth=no

FRANK (2016-09-02T02:41:46.247Z):
Dear Roy, when I am creating a scorecard, how do I select the odds? 50:1 or some other value? And what about the 600? Thanks!

Arun (2010-09-26T10:34:21.056Z):
Isn't a Logistic/Bernoulli/Binomial distribution a better approximation? That's why we used logistic regression with the logit transformation!

In the case of a probit regression, a Normal distribution probably makes sense. Not sure, though.

K.Kalyanaraman (2010-09-23T11:28:39.194Z):
Under some very general assumptions, the probability may be converted into a risk score ranging from 0 to 1000. The main assumption is that the risk score (RS) is Normally distributed. (!)
Then the mean will be in the middle, say 500, and the standard deviation (sd) can be judged from the relation between sd and range, i.e. range = 6 sd. In this case it would be about 167. Then, by equating p and RS under the Normal probability distribution, RS can be estimated. In place of the Normal distribution, any other distribution could be used.

Arun (2010-09-23T03:21:07.261Z):
If the log(odds) score is inaccurate, then so is the probability score, and so is the RAD score outlined above!

As I understand it, if your sample represents the population well in terms of Y's R:NR ratio (and, if required, some X's as well), then log(odds) should not be a biased estimate.

Arun (2010-09-22T15:14:04.325Z):
Log(odds) is what the Logistic puts out.
Why use such a complex score as above, when all you're doing is manipulating log(odds) with a linear equation?

And what exactly is the use of pdo? Wouldn't using the log(odds) directly tell us whether we've doubled the odds or not?

Finally, I didn't quite follow the part where you said "wanted odds of 50:1 at 600 points"! Also, when you say the odds double every 20 points, does it mean that a person A with score = score of B + pdo would have double the odds of being a Y=1? Can you elaborate?
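The doubling behaviour Arun asks about can be checked numerically. The following is a minimal Python sketch, assuming the linear scaling Score = Offset + Factor * ln(odds) and the example figures (odds of 50:1 at 600 points, pdo = 20) given in Subhadip Roy's reply in this thread. The function names `scaling_params` and `score_from_odds` are hypothetical, chosen only for illustration.

```python
import math


def scaling_params(base_score, base_odds, pdo):
    """Return (factor, offset) so that score = offset + factor * ln(odds).

    base_score: score assigned to base_odds (600 in the thread's example)
    base_odds:  odds at base_score (50 for 50:1)
    pdo:        points needed to double the odds (20 in the example)
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_odds(odds, factor, offset):
    """Apply the linear log-odds scaling to a single odds value."""
    return offset + factor * math.log(odds)


factor, offset = scaling_params(base_score=600, base_odds=50, pdo=20)
score_50 = score_from_odds(50, factor, offset)    # odds of 50:1
score_100 = score_from_odds(100, factor, offset)  # doubled odds, 100:1

print(round(factor, 4), round(offset, 3))       # 28.8539 487.123
print(round(score_50, 1), round(score_100, 1))  # 600.0 620.0
```

Under these assumptions, doubling the odds always adds exactly pdo points, whatever the starting odds, because the score is linear in ln(odds).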
Subhadip Roy (2010-09-22T05:11:16.149Z):
The technique that I have followed is explained below.

Where the scorecard is being developed using specified odds at a score and specified "points to double the odds" (pdo), the factor and offset can easily be calculated from the following simultaneous equations:

Score = Offset + Factor * ln(odds)
Score + pdo = Offset + Factor * ln(2 * odds)

Solving the equations above for pdo, we get:

pdo = Factor * ln(2), therefore Factor = pdo / ln(2)
Offset = Score - Factor * ln(odds)

For example, if a scorecard were being scaled where the user wanted odds of 50:1 at 600 points and wanted the odds to double every 20 points (i.e., pdo = 20), the factor and offset would be:

Factor = 20 / ln(2) = 28.8539
Offset = 600 - 28.8539 * ln(50) = 487.123

And the score corresponding to each set of odds (or each attribute) can be calculated as:

Score = 487.123 + 28.8539 * ln(odds)

Arun (2010-09-21T18:28:56.750Z):
Subhadip, can you elaborate on the RAD scores?

Tom, as I understand it,
probability is a very tricky concept, and interpreting it is something we should think twice about. If I'm modeling a guest response rate of 5% in a database of 1MM people, would you expect probability scores of 70%+ for responders and 30% or less for non-responders? You would expect so, but I've hardly ever seen that kind of scenario.

In most situations the probability doesn't even seem to make sense, especially when we say that someone is a responder with a probability score of 0.3 (or sometimes lower)! That's worse than picking at random?! Then why model? That's another problem altogether.

A score, on the other hand, masks this concept and simply tells us who is a more suitable target customer. That is precisely what we want, rather than a questionable probability score.

Another way around this: just decile-rank your scores or probability scores. The top deciles are bound to contain most of the people you'd want to target, I'd think.

Just to add one final thing:
Even if the model did tell you that a customer has a probability of 0.9 of responding when contacted, the customer may still end up not responding. That's captured in the remaining 0.1 of probability, also termed error!
Thus, we may not be able to predict whether Tom or Vincent would respond; rather, we can predict that if we contacted N such people, at least X% of them would respond, X being the average probability score of the group/decile/segment.

Hindol Basu (2010-09-21T14:24:22.481Z):
Hi,

I am not sure why you want to do this. In some cases we do it because of system issues, for common understanding, to align with various other internal and external scores, etc.

I guess the best way to do it is to convert the probability estimate to a specific odds scheme, for example good-to-bad odds of 20 to 1 at 500, doubling every 20/40 points. That way you get to convert the probability to a scale of your choice (0-1000 in your case) while retaining the ability to convert the number back to a desired probability of default. Most commercial scores like FICO follow a similar approach.
In case this is something you are interested in, there are a few different methods of doing this and we could discuss them in detail.

Regards,
Hindol

Subhadip Roy (2010-09-21T10:59:06.689Z):
Hi Tom, it is an operational decision based on considerations such as:
• Implementability of the scorecard in application processing software. Certain software can only implement scorecards in a particular format.
• Ease of understanding by staff (e.g., discrete numbers are easier to work with). For example, if I report the outcome of a school exam as a probability, it will be difficult to understand, so we assign a score instead.
• Continuity with existing scorecards or other scorecards in the company. This avoids retraining on scorecard usage and interpretation of scores.

Vincent Granville (2010-09-17T16:29:00.028Z):
Have you thought of using naive Bayes? I am assuming that you have several metrics (independent variables) in your risk model, and that the response is "probability of default".
You could also use my technique called Hidden Decision Trees; it is rather easy to implement, even in Base SAS. I hope to make it available for free online (SaaS) when I find the time.
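To tie the thread back to the original question, here is a rough Python sketch of mapping a probability of default (PD) onto a 0-1000 scale with an odds scheme like the one Hindol Basu describes (good-to-bad odds of 20 to 1 at 500). Several things here are assumptions rather than anything stated in the thread: the pdo of 40 (picked from the "20/40" he mentions), the convention good:bad odds = (1 - PD) / PD, the clipping of out-of-range scores to [0, 1000], and the hypothetical function name `pd_to_score`.

```python
import math


def pd_to_score(pd, base_score=500.0, base_odds=20.0, pdo=40.0,
                lo=0.0, hi=1000.0):
    """Map a probability of default to a clipped score (higher = safer).

    Assumes good:bad odds = (1 - pd) / pd and a linear log-odds scaling
    in which the odds double every `pdo` points.
    """
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    good_bad_odds = (1.0 - pd) / pd
    raw = offset + factor * math.log(good_bad_odds)
    # Clipping is an assumption: the raw score is unbounded, so extreme
    # PDs are pinned to the ends of the chosen range.
    return min(hi, max(lo, raw))


# A PD of 1/21 corresponds to good:bad odds of 20:1, i.e. the base score.
print(round(pd_to_score(1 / 21), 1))  # 500.0
# Halving the bad rate relative to goods (odds 40:1) adds pdo points.
print(round(pd_to_score(1 / 41), 1))  # 540.0
```

With these parameters, scores only reach the 0 and 1000 limits for very extreme PDs; a different base score, base odds or pdo would spread the same probabilities differently across the range.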