Coarse classing and fine classing / Observation and performance windows

Hi All,

 

Can someone explain what the difference between fine classing and coarse classing is in the context of logistic regression?

 

Also, how do we fix the observation and performance windows to tag a binary target variable? Is this based on the volume of data available (in terms of number of months), or are there pre-determined industry standards for different types of models built for different purposes?

 

Thanks.

 

Regards,

Sharath

Replies to This Discussion

Hi,

It's well described in the book The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk ..., pp. 361-366.

Check this link: http://books.google.sk/books?id=7LlGfPvOJLoC&pg=PA366&lpg=P...

 

Simply: create initial classes (discretization, i.e. fine classing), compute the Weight of Evidence (WoE) for each class, join neighbours with similar WoE (coarse classing), create dummy variables from the final classes, and use stepwise selection to pick the best dummies.
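A minimal sketch of that flow in Python, assuming pandas/numpy; the synthetic data, the 20-quantile fine binning, and the 0.1 WoE merge threshold are illustrative assumptions, not anything from the thread:

```python
import numpy as np
import pandas as pd

# Synthetic data: one predictor and a binary target (1 = bad).
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, 10_000)
p_bad = 1 / (1 + np.exp((income - 45_000) / 10_000))  # bad rate falls with income
df = pd.DataFrame({"income": income, "bad": rng.binomial(1, p_bad)})

# Fine classing: split the variable into many small bins (here 20 quantiles).
df["fine_bin"] = pd.qcut(df["income"], q=20, duplicates="drop")

# Weight of Evidence per bin: WoE = ln(% of all goods / % of all bads).
# (In practice, add a small smoothing constant to avoid log(0) in empty bins.)
grp = df.groupby("fine_bin", observed=True)["bad"].agg(["count", "sum"])
grp["good"] = grp["count"] - grp["sum"]
grp["woe"] = np.log((grp["good"] / grp["good"].sum()) /
                    (grp["sum"] / grp["sum"].sum()))

# Coarse classing: merge neighbouring bins whose WoE is similar
# (0.1 threshold chosen arbitrarily; analysts usually review this by eye).
edges = [iv.left for iv in grp.index] + [grp.index[-1].right]
woe = grp["woe"].to_numpy()
keep = [edges[0]] + [edges[i] for i in range(1, len(woe))
                     if abs(woe[i] - woe[i - 1]) >= 0.1] + [edges[-1]]
df["coarse_bin"] = pd.cut(df["income"], bins=keep, include_lowest=True)

# Dummy variables from the final classes, ready for stepwise selection.
dummies = pd.get_dummies(df["coarse_bin"], prefix="income", drop_first=True)
```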

 

Observation and performance windows depend on the task.

E.g., if you predict PTB for credit cards, the performance window may be 1-3 months and the observation window 1-12 months.

If you predict credit risk, the performance window is usually 12 months (Basel II), observation 1-12 months.

If you don't have a long enough history, use smaller windows.
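To make the tagging concrete, here is a small sketch of how a binary target gets flagged from such windows; the snapshot date, the 12-month performance window, and the column names are illustrative assumptions:

```python
import pandas as pd

# Observation point (snapshot): features come from behaviour in the
# observation window BEFORE this date; the target is tagged from events
# in the performance window AFTER it.
obs_point = pd.Timestamp("2018-12-31")
perf_end = obs_point + pd.DateOffset(months=12)  # Basel II style 12 months

accounts = pd.DataFrame({"account_id": [1, 2, 3]})
defaults = pd.DataFrame({
    "account_id": [2, 3],
    "default_date": pd.to_datetime(["2019-05-10", "2020-02-01"]),
})

# bad = 1 if the account defaults inside the performance window.
in_window = defaults.loc[defaults["default_date"].between(
    obs_point, perf_end, inclusive="right"), "account_id"]
accounts["bad"] = accounts["account_id"].isin(in_window).astype(int)
print(accounts)  # account 2 -> 1 (defaults inside the window), 1 and 3 -> 0
```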

Hi Jozo,

Thanks for helping me understand the difference. It is nice to know that I work for the same company as you do, though in a different country :).

 

I would also appreciate it if you could let me know how one goes about deciding on an optimum sample size before embarking on the analysis. For example, let's say I am planning to build a credit risk scorecard using logistic regression on a database of 1 million customers with a bad (default) rate of 8%. I decide to build the model not on the entire population (i.e. 1 million in this case) but on a sample, which I then split into 70% development and 30% validation. How do I fix the optimum sample size? I mean, how will I know what sample size is good enough to come up with a good or "champion" model? Is this an iterative process where we take different sample sizes and compare the models? Could you advise me on this?

Thanks in advance. :)

 

Look - if you can, use all the data. That's the best way to avoid problems and build a reliable model.
If there are performance issues, you can use a stratified sample for feature selection.
E.g., take all the bads (8% of the population) and add a random sample of goods twice that size (16% of the population). So you'll have a dataset with roughly 33% bads and 67% goods.
Find your predictors (LR with stepwise selection). Do the fine-tuning, coarse classing, dummy variables, etc.
Then take all the data (8% bad, 92% good), feed it into the LR without attribute selection, and let the algorithm estimate your parameters. It's not time-consuming; it will be done in a couple of minutes.
I haven't mentioned TRAIN/TEST sets - don't forget to use them every time (50% test / 50% train). A sketch of this workflow is below.
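A minimal sketch of that two-step workflow, assuming Python with scikit-learn; the synthetic data, the use of forward selection as a stand-in for stepwise, and all column names are my assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the full portfolio (~8% bad rate).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100_000, 10)),
                 columns=[f"x{i}" for i in range(10)])
p = 1 / (1 + np.exp(-(1.5 * X["x0"] - X["x1"] - 3.0)))
df = X.assign(bad=rng.binomial(1, p))

# Step 1: stratified subsample for feature selection only -
# all bads plus a random set of goods twice that size (~33%/67% mix).
bads = df[df["bad"] == 1]
goods = df[df["bad"] == 0].sample(n=2 * len(bads), random_state=0)
sel = pd.concat([bads, goods])

features = [c for c in df.columns if c != "bad"]
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3)  # forward selection
sfs.fit(sel[features], sel["bad"])
chosen = list(np.array(features)[sfs.get_support()])

# Step 2: refit on ALL data (true 8%/92% mix) with the chosen attributes
# only, so the intercept reflects the portfolio's real bad rate.
train, test = train_test_split(df, test_size=0.5, stratify=df["bad"],
                               random_state=0)
model = LogisticRegression(max_iter=1000).fit(train[chosen], train["bad"])
print(chosen, model.score(test[chosen], test["bad"]))
```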
Good luck.
Jozo
PS: Find me on Sametime, will help you anytime.
Hi Jozo,
As always, thanks for your response. I shall add you on Sametime :).
Could you please elaborate on the bolded part of your explanation below? I did not completely understand the following part.
Find your predictors (LR with stepwise selection). Do the fine-tuning, coarse classing, dummy variables, etc.
Then take all the data (8% bad, 92% good), feed it into the LR without attribute selection, and let the algorithm estimate your parameters. It's not time-consuming; it will be done in a couple of minutes.
Reply when you find time.
Regards,
Sharath
Sametime will be a much better way.
