
Computation of Weight of Evidence when either the number of bads or goods in a class of a variable is 0

Hi All!

I want to understand the ways in which Weight of Evidence (WoE) is computed or adjusted in the following scenarios:

 

1. When number of goods in a class of a variable is 0

2. When number of bads in a class of a variable is 0

 

WoE = ln(distribution of goods / distribution of bads)

 

Scenario 1: when the number of goods in a class is 0, WoE = ln(0 / distribution of bads) = ln(0), which is undefined?

Scenario 2: when the number of bads in a class is 0, WoE = ln(distribution of goods / 0) = ln(infinity), which is also undefined?
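
To make the two scenarios concrete, here is a minimal Python sketch of the formula above. The class labels and counts are made up for illustration, and returning None for an undefined WoE is just one possible convention, not a standard one.

```python
import math

def woe(goods, bads, total_goods, total_bads):
    """WoE = ln(distribution of goods / distribution of bads) for one class.

    Returns None when the class has no goods or no bads, because the
    expression is then ln(0) or a division by zero.
    """
    if goods == 0 or bads == 0:
        return None
    dist_goods = goods / total_goods
    dist_bads = bads / total_bads
    return math.log(dist_goods / dist_bads)

# Hypothetical (goods, bads) counts: class "C" has no bads, class "D" has no goods.
counts = {"A": (40, 10), "B": (30, 30), "C": (20, 0), "D": (0, 5)}
total_goods = sum(g for g, _ in counts.values())
total_bads = sum(b for _, b in counts.values())

woe_by_class = {
    label: woe(g, b, total_goods, total_bads)
    for label, (g, b) in counts.items()
}
```

Classes "A" and "B" get an ordinary finite WoE, while "C" (Scenario 2) and "D" (Scenario 1) come back as None.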

 

 

Regards,

Sharath

 


Replies to This Discussion

Exactly as you've written: WoE is undefined for such categories.

Such categories can't be used by logistic regression either.

You have several options:

- discard attributes that have such categories

- merge categories so that none of them has 0 goods or 0 bads

- if it's a really significant rule, exclude the records in this category from the training sample. You already know that P(good) = 0% or 100% there, so why would you want to train a model for it? Build the model only on the rest.
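
The second option (merging) could be sketched roughly as below. This is only one simple way to do it, assuming the classes are ordered (such as age bands) so that merging neighbours makes sense; the function name and the example counts are hypothetical.

```python
def merge_zero_classes(classes):
    """Greedy merge of ordered classes given as (label, goods, bads).

    A class with 0 goods or 0 bads absorbs the classes to its right until
    both counts are non-zero; a zero class left at the end is folded into
    its left neighbour instead.
    """
    merged = []
    for label, goods, bads in classes:
        if merged and (merged[-1][1] == 0 or merged[-1][2] == 0):
            # The previous class still has a zero count: fold this one into it.
            prev_label, prev_goods, prev_bads = merged[-1]
            merged[-1] = (prev_label + "+" + label,
                          prev_goods + goods, prev_bads + bads)
        else:
            merged.append((label, goods, bads))
    if len(merged) > 1 and (merged[-1][1] == 0 or merged[-1][2] == 0):
        # The last class has nothing to its right, so merge it leftwards.
        label, goods, bads = merged.pop()
        prev_label, prev_goods, prev_bads = merged[-1]
        merged[-1] = (prev_label + "+" + label,
                      prev_goods + goods, prev_bads + bads)
    return merged

# Hypothetical counts: "B" has no bads, "D" has no goods.
example = [("A", 40, 10), ("B", 20, 0), ("C", 30, 30), ("D", 0, 5)]
result = merge_zero_classes(example)
```

If every class has 0 goods (or every class has 0 bads), no amount of merging helps and the single remaining class still has a zero count, so the result should be checked before computing WoE on it.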

I have tried using WOE = ln(bad_distribution/good_distribution) on the age variable:

age band   bads   goods
19-25      2388   2019      8
26-30      1920   1716     24
31-35      1399   1377     53
36-40      1097   1157     73
41-45       934   1126    113
46-50       628    948    180
>50         527    876    209


The score comes out well, with an increasing trend, but when I use WOE = ln(good_distribution/bad_distribution) the trend is decreasing.

My doubt: is it correct to use ln(bad_distribution/good_distribution)?


I can see that Mr. Branko Mlikota's posting uses ln(non-events/events)...
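
As a quick check of the two conventions, here is a minimal Python sketch that computes both on the bads and goods counts from the age table. It only covers the WoE values themselves; the scoring and scaling step is not shown in this thread, so nothing about the final score is assumed here.

```python
import math

# (age band, bads, goods) copied from the age table in this post.
rows = [
    ("19-25", 2388, 2019),
    ("26-30", 1920, 1716),
    ("31-35", 1399, 1377),
    ("36-40", 1097, 1157),
    ("41-45", 934, 1126),
    ("46-50", 628, 948),
    (">50", 527, 876),
]
total_bads = sum(bads for _, bads, _ in rows)
total_goods = sum(goods for _, _, goods in rows)

woe_good_over_bad = []
woe_bad_over_good = []
for band, bads, goods in rows:
    dist_bads = bads / total_bads
    dist_goods = goods / total_goods
    woe_good_over_bad.append(math.log(dist_goods / dist_bads))
    woe_bad_over_good.append(math.log(dist_bads / dist_goods))
```

Band by band the two lists are exact negatives of each other, since ln(a/b) = -ln(b/a). On these counts ln(good/bad) rises with age and ln(bad/good) falls by the same amounts, so the two conventions carry the same information with the sign flipped.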

Could anyone please respond?


© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC