A Data Science Central Community
I was going through the book "Predictive Modeling with SAS Enterprise Miner" by Kattamuri Sarma. In the chapter on variable selection, the author suggests the following approach for initial variable screening when the target variable is binary and the inputs are numeric and interval-scaled:
The target variable is treated like a continuous variable and the R square with the target is computed for each original input...
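The screening step in the quote can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not SAS Enterprise Miner's implementation: the 0/1 target is treated as a continuous variable, each input's R-square comes from a simple linear regression of the target on that input alone, and the function names (`r_square`, `screen_inputs`), the toy data and the 0.01 cutoff are all hypothetical.

```python
# Hypothetical sketch of univariate R-square screening for a binary (0/1) target.
# Assumptions: numeric interval inputs stored in a dict of lists, made-up data,
# and an arbitrary 0.01 cutoff.


def r_square(x, y):
    """R-square of a simple linear regression of y on x (0/1 target treated as continuous)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant input (or constant target) explains nothing
    return sxy * sxy / (sxx * syy)


def screen_inputs(inputs, target, cutoff=0.01):
    """Return (name, R-square) pairs at or above the cutoff, best first."""
    scores = {name: r_square(values, target) for name, values in inputs.items()}
    kept = [(name, r2) for name, r2 in scores.items() if r2 >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


target = [0, 0, 0, 1, 1, 1]
inputs = {
    "income": [20, 25, 30, 60, 65, 70],  # hypothetical input that separates the classes
    "age": [30, 50, 40, 35, 45, 40],     # hypothetical input with no linear association
}
print(screen_inputs(inputs, target))  # prints [('income', 0.96)]
```

Inputs that survive the cutoff would then go on to the modelling step; the cutoff value itself is a tuning choice, not something fixed by the method.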
I want to know whether calculating the correlation between a discrete variable and a continuous variable is technically correct.
There are many different kinds of correlation. The one you describe is called the point-biserial correlation, which measures the association between a binary variable and a continuous variable. SAS Enterprise Miner also offers an alternative variable selection method that constructs a CHAID-type decision tree and uses a chi-square test statistic.
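Pearson's r computed between a continuous input and a 0/1 coded target is, strictly speaking, the point-biserial correlation; the biserial coefficient proper additionally assumes that the binary variable dichotomizes an underlying normally distributed variable. The sketch below, with made-up data and hypothetical function names, checks that the point-biserial formula (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the input means in the two target groups, s is the population standard deviation of the input and p is the share of 1s, gives the same value as Pearson's r. In other words, correlating an input with a binary target is a standard, well-defined calculation.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(x, y):
    """Point-biserial correlation between a continuous x and a 0/1 coded y."""
    n = len(x)
    ones = [a for a, b in zip(x, y) if b == 1]
    zeros = [a for a, b in zip(x, y) if b == 0]
    p = len(ones) / n  # share of 1s
    q = 1 - p          # share of 0s
    mean_x = sum(x) / n
    s = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)  # population SD (divide by n)
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    return (m1 - m0) / s * math.sqrt(p * q)


target = [0, 1, 0, 1, 1, 0, 1, 0]
x = [2.1, 3.9, 1.7, 4.4, 3.0, 2.8, 5.2, 2.0]  # hypothetical continuous input
print(point_biserial(x, target), pearson_r(x, target))  # same value up to rounding
```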
Thanks, Ralph, for the prompt responses. I have learnt quite a few things from you in the last few days. Thanks for your valuable inputs; I expect to learn a lot more from you and the others on this forum.
Thanks once again.
Just wondering: SAS Enterprise Miner also has another variable selection node, called "Variable Selection". It gives you the option to screen variables univariately, using the R-square association between each continuous or nominal independent variable and the binary target.
How does this R-square approach compare to Karl Pearson's correlation coefficient r? Which one is better?
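For a single continuous input, the R-square of a simple linear regression equals the square of Pearson's r, so ranking inputs by R-square or by |r| gives the same order; r only adds the sign of the association. For a nominal input, Pearson's r is not directly defined, but a one-way ANOVA R-square (between-group sum of squares divided by total sum of squares) is. The sketch below illustrates both with made-up data. The function names are hypothetical, and whether the Variable Selection node computes the nominal R-square exactly this way (for example, without first grouping levels) is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_square_regression(x, y):
    """Fit y = b0 + b1 * x by least squares and return 1 - SSE / SST."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst


def r_square_nominal(levels, y):
    """One-way ANOVA R-square (eta squared) of y across the levels of a nominal input."""
    n = len(y)
    my = sum(y) / n
    groups = {}
    for level, value in zip(levels, y):
        groups.setdefault(level, []).append(value)
    ss_between = sum(len(g) * (sum(g) / len(g) - my) ** 2 for g in groups.values())
    sst = sum((value - my) ** 2 for value in y)
    return ss_between / sst


target = [0, 0, 0, 1, 1, 1]
income = [2.0, 3.5, 1.0, 4.0, 5.5, 6.0]  # hypothetical continuous input
region = ["a", "b", "a", "c", "c", "b"]  # hypothetical nominal input

r = pearson_r(income, target)
print(r, r ** 2, r_square_regression(income, target))  # last two agree up to rounding
print(r_square_nominal(region, target))  # 2/3 for this toy data
```

Under these assumptions, neither measure is "better" for a continuous input, since one is a function of the other; R-square has the practical advantage of also covering nominal inputs.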