<p>What type of data do you have?</p>
<p>- Continuous Y and categorical X? t-test</p>
<p>- Categorical Y and categorical X? chi-square</p>
<p>Then include in the model the variables (Xs) that are significant (p-value &lt; 0.05 or 0.1).</p>
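A minimal Python sketch of this univariate screening, assuming numpy and scipy are available. The helper names (`screen`, `ttest_pvalue`, `chi2_pvalue`) are hypothetical. It uses Welch's t-test, which only covers a binary X; for an X with more than two levels a one-way ANOVA F-test would be the usual substitute.

```python
from collections import Counter

import numpy as np
from scipy.stats import chi2_contingency, ttest_ind


def ttest_pvalue(y, x):
    """Welch t-test of continuous y between the two levels of a binary x."""
    levels = sorted(set(x))
    if len(levels) != 2:
        raise ValueError("t-test needs exactly two groups; use ANOVA for more")
    y, x = np.asarray(y, dtype=float), np.asarray(x)
    return ttest_ind(y[x == levels[0]], y[x == levels[1]], equal_var=False).pvalue


def chi2_pvalue(y, x):
    """Chi-square test of independence between categorical y and categorical x."""
    counts = Counter(zip(x, y))
    table = [[counts[(xi, yi)] for yi in sorted(set(y))] for xi in sorted(set(x))]
    _, p, _, _ = chi2_contingency(table)  # scipy applies Yates' correction to 2x2 tables
    return p


def screen(y, predictors, y_is_continuous, alpha=0.05):
    """Names of predictors whose univariate p-value is below alpha."""
    test = ttest_pvalue if y_is_continuous else chi2_pvalue
    return [name for name, x in predictors.items() if test(y, x) < alpha]


# Example: continuous y, one informative and one uninformative binary X.
y = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 5.1, 5.0]
xs = {"strong": ["a"] * 5 + ["b"] * 5, "noise": ["a", "b"] * 5}
print(screen(y, xs, y_is_continuous=True))  # ['strong']
```

Passing `alpha=0.1` gives the looser cut-off mentioned above.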
Dear Sunil,<br />
Good evening. I saw this post a little late. In my view, if you haven't yet finalised which predictors you're going to use, the best approach is to check the correlations among the variables and avoid variables that are multicollinear with each other. I hope this sorts out your issue.<br />
Regards,<br />
Sagar.
<p>But it's very time-consuming to do this for all 65 variables.</p>
<p>Sunil,</p>
<p>Don't use time as an excuse for shortcuts; you don't know what learning you will miss. Data experience comes from touching and exploring all the data at as low a level as possible, to learn and understand how the variables relate to and interact with each other, and then with the dependent variable. That is how data intuition is built, and how business experience with data is built.</p>
<p>Does Varclus work for binary data as well? I mean, all my predictors are binary and I want to group the variables.</p>
<p>On what criterion should a variable be selected from each cluster?</p>
Thanks for all the suggestions.
If we're talking about linear regression here, a correlation analysis between the DV and all candidate IVs is standard practice. It will not reveal much about non-linear relationships, though; for those, a scatterplot analysis is usually helpful. So it can be done this way:<br />
1. Run a correlation analysis and pick the IVs showing a significant, high correlation with the DV.<br />
2. Run a scatterplot for those that weren't significant in step 1, to check whether there is any visible…
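Step 1 and the hand-off to step 2 can be sketched in Python, assuming scipy is available; `correlation_screen` is a hypothetical helper. The example deliberately includes a predictor with a purely quadratic link to the DV, which Pearson's r misses and which therefore lands in the "to plot" list.

```python
from scipy.stats import pearsonr


def correlation_screen(y, predictors, alpha=0.05):
    """Split candidate IVs by their Pearson correlation with the DV y.

    Returns (significant, to_plot): significant holds (name, r, p) tuples
    sorted by |r|, largest first; to_plot holds the names that were not
    significant and are candidates for a scatterplot check (step 2).
    """
    significant, to_plot = [], []
    for name, x in predictors.items():
        r, p = pearsonr(x, y)
        if p < alpha:
            significant.append((name, float(r), float(p)))
        else:
            to_plot.append(name)
    significant.sort(key=lambda t: abs(t[1]), reverse=True)
    return significant, to_plot


# y depends on x_sym quadratically, so Pearson r is ~0 despite a strong link.
x_sym = [-4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5]
y = [v * v for v in x_sym]
x_lin = [v + (0.1 if i % 2 else -0.1) for i, v in enumerate(y)]
sig, to_plot = correlation_screen(y, {"x_lin": x_lin, "x_sym": x_sym})
print([name for name, _, _ in sig], to_plot)  # ['x_lin'] ['x_sym']
```

The names in `to_plot` would then be passed to whatever plotting tool you use for the visual check.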
I like to categorize the variables and first run chi-square tests on each of them independently. If they pass, I look further by adding a moderating variable.<br />
<br />
-Ralph Winters
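One way to sketch this approach in Python, assuming numpy and scipy. The quartile binning and the function names are hypothetical choices, and "adding a moderating variable" is read here as re-running the chi-square test within each level of the moderator (a stratified check), which is only one possible interpretation.

```python
from collections import Counter

import numpy as np
from scipy.stats import chi2_contingency


def quantile_bins(x, n_bins=4):
    """Categorize a continuous x into quantile bins labelled 0..n_bins-1."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))[1:-1]  # inner cut points
    return np.digitize(x, edges).tolist()  # heavy ties can leave some bins empty


def chi2_p(x_cat, y_cat):
    """p-value of a chi-square independence test between two categorical lists."""
    counts = Counter(zip(x_cat, y_cat))
    table = [[counts[(a, b)] for b in sorted(set(y_cat))] for a in sorted(set(x_cat))]
    _, p, _, _ = chi2_contingency(table)
    return p


def stratified_p(x_cat, y_cat, moderator):
    """Re-run the chi-square test separately within each level of the moderator."""
    result = {}
    for level in sorted(set(moderator)):
        idx = [i for i, m in enumerate(moderator) if m == level]
        result[level] = chi2_p([x_cat[i] for i in idx], [y_cat[i] for i in idx])
    return result


x = list(range(1, 41))
y = ["no"] * 20 + ["yes"] * 20
bins = quantile_bins(x)
print("overall p =", chi2_p(bins, y))
print("p by moderator level:", stratified_p(bins, y, ["m1", "m2"] * 20))
```

If the association holds up within every level of the moderator, the variable is a stronger candidate; if it disappears in some levels, the moderator is worth modelling explicitly.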
Hi Sunil,<br />
STATISTICA Data Miner has a very good option which may solve your problem.<br />
The option is called Feature Selection; it uses a chi-square test when the dependent variable is categorical and an F-test when it is continuous. After the analysis, it lists the important variables in descending order of significance.<br />
This is especially effective when you have a large number of predictors.
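STATISTICA is a GUI tool, so the following is not its implementation. It is a rough Python sketch of the same idea for the continuous-DV case, assuming scipy and categorical predictors: each predictor gets a one-way ANOVA F-test (`scipy.stats.f_oneway`), and the predictors are ranked from most to least significant. `rank_by_f_test` is a hypothetical helper.

```python
from scipy.stats import f_oneway


def rank_by_f_test(y, predictors):
    """Rank categorical predictors of a continuous y by one-way ANOVA F-test.

    Each predictor needs at least two levels. Returns (name, F, p) tuples,
    most significant (smallest p) first.
    """
    results = []
    for name, x in predictors.items():
        groups = {}
        for xi, yi in zip(x, y):
            groups.setdefault(xi, []).append(yi)  # y values for each level of x
        f, p = f_oneway(*groups.values())
        results.append((name, float(f), float(p)))
    results.sort(key=lambda t: t[2])
    return results


y = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 5.1, 5.0]
preds = {"noise": ["a", "b"] * 5, "strong": ["a"] * 5 + ["b"] * 5}
ranking = rank_by_f_test(y, preds)
print([name for name, _, _ in ranking])  # ['strong', 'noise']
```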
Hi All,<br />
<br />
In my limited experience, Proc Varclus is a good method for finding the key variable clusters, after which one can pick a few variables from each cluster. However, sometimes most of the variables get bunched into a single cluster, and then this method becomes less effective. In that case, the variables in the biggest cluster can be ordered by information value, R-square, or a similar metric to select the most predictive variables from the…
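Information value for a binary target is usually computed as IV = Σ (%events − %non-events) × ln(%events / %non-events) over the bins of a predictor. A minimal standard-library sketch, assuming the predictors are already binned and the target is coded 1/0. The smoothing constant of 0.5 per cell is an assumption added to avoid log(0) in empty bins, and both function names are hypothetical.

```python
import math
from collections import Counter


def information_value(x_bins, y, smoothing=0.5):
    """Information value of a binned predictor for a binary target y (1/0)."""
    events = Counter(b for b, t in zip(x_bins, y) if t == 1)
    nonevents = Counter(b for b, t in zip(x_bins, y) if t != 1)
    bins = sorted(set(x_bins))
    total_e = sum(events[b] + smoothing for b in bins)
    total_n = sum(nonevents[b] + smoothing for b in bins)
    iv = 0.0
    for b in bins:
        pe = (events[b] + smoothing) / total_e     # share of events in bin b
        pn = (nonevents[b] + smoothing) / total_n  # share of non-events in bin b
        iv += (pe - pn) * math.log(pe / pn)
    return iv


def rank_cluster_by_iv(cluster_vars, binned, y):
    """Order the variables of one (e.g. the biggest) cluster by IV, highest first."""
    scores = {name: information_value(binned[name], y) for name in cluster_vars}
    return sorted(scores, key=scores.get, reverse=True)


y = [1] * 10 + [0] * 10
binned = {"noise": ["hi", "lo"] * 10, "good": ["hi"] * 10 + ["lo"] * 10}
print(rank_cluster_by_iv(["noise", "good"], binned, y))  # ['good', 'noise']
```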
One tool I use for variable reduction is variable clustering (e.g. Proc Varclus), selecting one variable from each cluster and then testing for significance.
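Proc Varclus is SAS-only, so the following is a Python substitute, not Varclus itself. It assumes numpy and scipy and uses average-linkage hierarchical clustering on 1 − |r| (Varclus instead splits clusters divisively using principal components). From each cluster it then picks the variable with the lowest 1 − R² ratio, the statistic Varclus reports: (1 − R² with own cluster) / (1 − R² with nearest other cluster), where each cluster is represented here by its first principal component. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(X, names, max_clusters):
    """Group the columns of X (n_obs x n_vars) by average linkage on 1 - |r|."""
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=max_clusters, criterion="maxclust")
    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(int(label), []).append(name)
    return clusters


def first_pc(Z):
    """Scores on the first principal component of standardized columns Z."""
    if Z.shape[1] == 1:
        return Z[:, 0]
    _, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    return Z @ vecs[:, -1]  # eigh returns eigenvalues in ascending order


def pick_representatives(X, names, clusters):
    """From each cluster, pick the variable with the lowest 1 - R^2 ratio."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # constant columns would divide by 0
    col = {name: i for i, name in enumerate(names)}
    pcs = {k: first_pc(Z[:, [col[n] for n in members]]) for k, members in clusters.items()}

    def r2(a, b):
        return np.corrcoef(a, b)[0, 1] ** 2

    chosen = {}
    for k, members in clusters.items():
        ratios = {}
        for n in members:
            own = r2(Z[:, col[n]], pcs[k])
            nearest = max((r2(Z[:, col[n]], pcs[j]) for j in pcs if j != k), default=0.0)
            ratios[n] = (1 - own) / (1 - nearest)
        chosen[k] = min(ratios, key=ratios.get)
    return chosen


rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([
    f1 + 0.1 * rng.normal(size=200),
    f1 + 0.1 * rng.normal(size=200),
    f1 + 0.5 * rng.normal(size=200),  # noisier member of the first group
    f2 + 0.1 * rng.normal(size=200),
    f2 + 0.1 * rng.normal(size=200),
])
names = ["a1", "a2", "a3", "b1", "b2"]
clusters = cluster_variables(X, names, max_clusters=2)
print(clusters)
print(pick_representatives(X, names, clusters))
```

The selected representatives would then go on to the significance tests.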
