A Data Science Central Community
Hi all,
I was going through the book "Predictive Modeling with SAS Enterprise Miner" by Kattamuri Sarma. In the chapter on variable selection, the author suggests the following approach for initial variable screening when the target is binary and the inputs are numeric and interval-scaled:
The target variable is treated like a continuous variable and the R-square with the target is computed for each original input...
I want to know if calculating correlation between a discrete and continuous variable is technically correct.
Thanks.
Regards,
Sharath
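There are many different kinds of correlation. The one you describe, the association between a binary variable and a continuous variable, is usually called the point-biserial correlation. It is numerically identical to the Pearson correlation computed with the binary variable coded 0/1, so the calculation is technically legitimate. (The biserial correlation proper is a related measure that assumes the binary variable is a dichotomized continuous one.) There is also an alternative variable selection method in SAS Enterprise Miner that constructs a CHAID-type decision tree and uses a chi-square test statistic.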
-Ralph Winters
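To illustrate the point about correlating a binary and a continuous variable, here is a minimal Python sketch (not SAS, and not taken from the book). The data are synthetic, and `pearson_r` and `point_biserial` are hypothetical helper names written for this example. It only checks that the textbook point-biserial formula and the ordinary Pearson formula give the same number when the target is coded 0/1.

```python
import math
import random

def pearson_r(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(x, y):
    """Point-biserial correlation: x is continuous, y is coded 0/1.

    Uses r_pb = (M1 - M0) / s_n * sqrt(p * q), where s_n is the
    population standard deviation of x and p is the share of 1s.
    """
    n = len(x)
    x1 = [a for a, b in zip(x, y) if b == 1]
    x0 = [a for a, b in zip(x, y) if b == 0]
    m1, m0 = sum(x1) / len(x1), sum(x0) / len(x0)
    mx = sum(x) / n
    s_n = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    p = len(x1) / n
    return (m1 - m0) / s_n * math.sqrt(p * (1 - p))

# Synthetic data (an assumption for illustration only): one interval
# input and a binary target that is loosely related to it.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [1 if xi + rng.gauss(0, 1) > 0 else 0 for xi in x]

r = pearson_r(x, y)
r_pb = point_biserial(x, y)
print("Pearson r:      ", r)
print("Point-biserial: ", r_pb)
```

The two printed values should agree up to floating-point rounding, which is why treating the 0/1 target as if it were continuous is acceptable for a first screening pass.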
Thanks, Ralph, for the prompt responses. I have learnt quite a few things from you in the last few days. Thanks for your valuable inputs; I expect to learn a lot more from you and the others on this forum.
Thanks once again.
Regards,
Sharath
Just wondering: in SAS Enterprise Miner there is also another variable selection node called "Variable Selection". It gives you the option to screen variables univariately using the R-square association between continuous or nominal independent variables and the binary target.
How does this R-square approach compare to Karl Pearson's correlation r? Which one is better?
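One way to look at this question, as a rough sketch rather than a description of what the Variable Selection node does internally: for a single interval input, the R-square of a simple least-squares regression (with intercept) of the target on that input equals the square of Pearson's r. The Python below checks that identity on a small made-up data set; the helper names `pearson_r` and `r_squared_simple_ols` and the numbers are assumptions for illustration only.

```python
import math

def pearson_r(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_simple_ols(x, y):
    """R-square of a one-input least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

# Made-up illustrative data: one interval input and a 0/1 target.
x = [1.2, 2.3, 0.7, 3.1, 2.8, 0.4, 1.9, 3.6]
y = [0, 1, 0, 1, 1, 0, 0, 1]

r = pearson_r(x, y)
r2 = r_squared_simple_ols(x, y)
print("Pearson r squared:", r ** 2)
print("OLS R-square:     ", r2)
```

Under that identity, ranking interval inputs by R-square gives the same order as ranking them by the absolute value of r. The main practical difference is that R-square discards the sign of the relationship, and that it extends to nominal inputs, where a single Pearson r is not defined.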