A Data Science Central Community
Has anyone used, or does anyone have thoughts on using, a two-step hurdle model to address the imbalance of "GOODS" vs. "BADS" often present in a sample of borrowers?
That is, first run a logistic regression on Good vs. Bad. Then take only the Bads, use the % paid on the loan as the dependent variable, and run a separate linear regression. In the linear model, those who defaulted after 3 months would fare worse than those who nearly paid off completely.
Finally, combine the outcomes of the two models to create a scorecard.
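
To make the idea concrete, here is a minimal sketch of the two steps in plain Python. Everything in it is an assumption for illustration: the single feature `x` (a hypothetical standardized risk attribute), the synthetic data, the helper names, and especially the way the two models are combined (expected fraction of the loan repaid), since the post above does not say how the outcomes would be merged into a scorecard.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=1000):
    """Step 1: one-feature logistic regression for P(bad | x), fit by gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def fit_ols(xs, ys):
    """Step 2: one-feature least-squares regression (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def expected_fraction_repaid(x, logit_coef, ols_coef):
    """Hypothetical combination: Goods repay 100%, Bads repay the predicted %."""
    p_bad = sigmoid(logit_coef[0] + logit_coef[1] * x)
    pct_if_bad = min(1.0, max(0.0, ols_coef[0] + ols_coef[1] * x))
    return (1.0 - p_bad) * 1.0 + p_bad * pct_if_bad

# Synthetic borrowers (made up for this sketch): higher x means riskier.
random.seed(0)
xs, is_bad, pct_paid = [], [], []
for _ in range(500):
    x = random.uniform(-2.0, 2.0)
    bad = 1 if random.random() < sigmoid(-1.5 + 1.2 * x) else 0
    # Only meaningful for Bads: fraction of the loan paid before default.
    paid = min(1.0, max(0.0, 0.6 - 0.15 * x + random.gauss(0.0, 0.1)))
    xs.append(x)
    is_bad.append(bad)
    pct_paid.append(paid if bad else 1.0)

# Step 1 uses every borrower; step 2 uses the Bads only.
logit_coef = fit_logistic(xs, is_bad)
bad_xs = [x for x, b in zip(xs, is_bad) if b]
bad_pct = [p for p, b in zip(pct_paid, is_bad) if b]
ols_coef = fit_ols(bad_xs, bad_pct)

score_low_risk = expected_fraction_repaid(-1.5, logit_coef, ols_coef)
score_high_risk = expected_fraction_repaid(1.5, logit_coef, ols_coef)
print(score_low_risk, score_high_risk)
```

One design point worth questioning: % paid is bounded between 0 and 1, so a plain linear regression can predict outside that range. The sketch simply clips the prediction; a fractional or beta regression might be a better fit for the second step.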