Significance of variables - AnalyticBridge
2020-06-01T05:49:04Z
https://www.analyticbridge.datasciencecentral.com/forum/topics/significance-of-variables?feed=yes&xn_auth=no

Konstantinos Chlouverakis (2015-02-23T21:20:49.845Z):
<p>What type of data do you have?</p>
<p>- Continuous Y and categorical X? Use a t-test.</p>
<p>- Categorical Y and categorical X? Use a chi-square test.</p>
<p>Then include in the model the variables (Xs) that are significant (p-value below 0.05 or 0.1).</p>
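<p>The screening rule above can be sketched in a few lines for the categorical Y, categorical X case. This is only a minimal illustration, not the poster's own code: it handles 2x2 tables only, assumes no row or column total is zero, and the variable names and counts are made up. In practice a statistics package would normally be used, and a continuous Y with an X of more than two levels would call for ANOVA rather than a t-test.</p>

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With 1 degree of freedom the p-value is
    P(|Z| > sqrt(statistic)) for a standard normal Z, i.e. erfc(sqrt(stat / 2)).
    Assumes every row and column total is positive.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical counts: rows are the levels of X, columns the levels of Y.
candidates = {
    "x_related": [[30, 10], [10, 30]],
    "x_unrelated": [[20, 20], [20, 20]],
}
alpha = 0.05  # the post suggests 0.05 or 0.1
selected = [name for name, t in candidates.items()
            if chi_square_2x2(t)[1] < alpha]
print(selected)
```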

Sagar Diwakar Uparkar (2015-02-20T13:07:44.560Z):
Dear Sunil,<br />
Good evening. I saw this post a little late. In my view, if you haven't yet finalised which predictive model you're going to use, the best approach is to check the correlations among the variables and avoid those that are multicollinear with one another. I hope this resolves your issue.<br />
Regards,<br />
Sagar.
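
A minimal sketch of the correlation check suggested above, assuming numeric predictors held as plain lists. The column names, data, and the 0.9 cutoff are illustrative assumptions. Pairwise correlation only catches collinearity between two variables at a time; a variance inflation factor would be needed to detect a variable that is a combination of several others.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.9):
    """Return (name1, name2, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(columns.items(), 2):
        r = pearson(col1, col2)
        if abs(r) >= threshold:
            flagged.append((name1, name2, round(r, 3)))
    return flagged

# Hypothetical predictors; "income" and "spend" move together.
columns = {
    "income": [10, 20, 30, 40, 50],
    "spend": [12, 19, 33, 41, 48],
    "age": [50, 20, 40, 10, 30],
}
flagged = collinear_pairs(columns)
print(flagged)
```

One variable from each flagged pair would then be a candidate for dropping before modelling.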

Thomas Lincoln (2015-02-19T12:49:45.785Z):
<p>but it's very time consuming to do it for all 65 variables.</p>
<p></p>
<p>Sunil,</p>
<p>Don't use time as an excuse for shortcuts, as you don't know what learning you will miss. Data experience comes from touching and exploring all the data at as low a level as possible, to learn and understand how the variables relate to and interact with each other, and then with the dependent variable. That is how data intuition is built and how data business experience is built.</p>

Sarbani (2014-07-20T11:10:43.429Z):
<p>Does varclus work for binary data as well? I mean, if all the predictors are binary and I want to group the variables.</p>
<p>On what criterion is a variable selected from each cluster?</p>

Sunil A N (2010-03-10T07:35:39.644Z):
Thanks for all the suggestions.

Tejamoy Ghosh (2010-03-09T07:29:28.884Z):
If we're talking about a linear regression here, a correlation analysis (between the DV and all candidate IVs) is usually done as a general practice. This will not reveal much about any non-linear relationships, though; for that, a scatterplot analysis is usually helpful. So it can be done this way:<br />
1. Run a correlation analysis and pick the IVs showing a significant, high correlation with the DV.<br />
2. Run a scatterplot for the ones that weren't significant in step 1, to check whether there is any visible non-linear relationship between the DV and the IV in question.<br />
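
The two steps above can be sketched as follows. This is a hedged illustration rather than a prescribed procedure: the data are made up, a plain |r| cutoff of 0.5 stands in for a proper significance test, and the plotting itself is left to whatever charting tool is at hand. The IV named `u` is deliberately built so that the DV equals `u` squared, which gives a Pearson correlation of zero despite a perfect non-linear relationship.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_by_correlation(dv, ivs, threshold=0.5):
    """Step 1: keep IVs with |r| >= threshold against the DV.
    Step 2: return the rest as candidates for a scatterplot check."""
    keep, inspect = [], []
    for name, values in ivs.items():
        if abs(pearson(values, dv)) >= threshold:
            keep.append(name)
        else:
            inspect.append(name)
    return keep, inspect

# Hypothetical data: dv is exactly u squared, and roughly linear in v.
dv = [4, 1, 0, 1, 4]
ivs = {"v": [8, 2, 1, 3, 9], "u": [-2, -1, 0, 1, 2]}
keep, inspect = split_by_correlation(dv, ivs)
print("keep:", keep)
print("check scatterplot:", inspect)
```

Here `u` would be missed by step 1 alone, which is the case the scatterplot check in step 2 is meant to catch.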
<br />
Does this make sense?

Ralph Winters (2010-02-27T21:14:59.820Z):
I like to categorize the variables and run chi-square tests on them first independently. If they pass, then I will look further by adding a moderating variable.<br />
<br />
-Ralph Winters

Krishnendu Kundu (2010-02-10T09:41:45.688Z):
Hi Sunil<br />
STATISTICA Data Miner has a very good option which may solve your problem.<br />
The option is called Feature selection condition. It uses chi-square and F-tests for categorical and continuous dependent variables respectively, and after the analysis it shows you the important variables in descending order of significance.<br />
This is especially effective when you have a large number of predictors.

Hindol Basu (2010-02-05T07:55:49.092Z):
Hi All,<br />
<br />
In my limited experience, I have found that Proc Varclus is a good method for finding the key variable clusters, after which one can pick a few variables from each cluster. However, sometimes most of the variables get bunched up in a single cluster, and then this method becomes less effective. In that case, the variables in the biggest cluster can be ordered by information value, R-squared, or a similar metric to select the most predictive ones from that cluster.<br />
<br />
Another limitation of varclus is that it does not take the predictive power of the variables into account, so when one picks the top few contributing variables from each cluster, one cannot be sure of picking the most predictive variable. Hence an information value (IV) or R-squared type measure might be looked at in conjunction with the varclus results to pick the most predictive variables.<br />
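
As a rough sketch of the information value idea mentioned above, the snippet below ranks the variables of one cluster by IV against a binary target. It uses the usual weight-of-evidence form, IV = sum over levels of (share of events - share of non-events) * ln(share of events / share of non-events). The cluster membership, variable names, and data are all hypothetical, and the function assumes every level contains at least one event and one non-event.

```python
import math
from collections import Counter

def information_value(feature, target):
    """Information value of a categorical feature against a 0/1 target.

    Assumes each level of the feature has at least one event (target == 1)
    and one non-event (target == 0); otherwise a ValueError is raised.
    """
    events = Counter(f for f, t in zip(feature, target) if t == 1)
    non_events = Counter(f for f, t in zip(feature, target) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    iv = 0.0
    for level in set(feature):
        if events[level] == 0 or non_events[level] == 0:
            raise ValueError(f"level {level!r} lacks events or non-events")
        p_event = events[level] / total_events
        p_non_event = non_events[level] / total_non_events
        iv += (p_event - p_non_event) * math.log(p_event / p_non_event)
    return iv

# Hypothetical cluster of two binary predictors and a binary target.
target = [1] * 10 + [0] * 10
cluster = {
    "weak_x": ["a"] * 5 + ["b"] * 5 + ["a"] * 5 + ["b"] * 5,
    "strong_x": ["a"] * 8 + ["b"] * 2 + ["a"] * 2 + ["b"] * 8,
}
ranked = sorted(cluster,
                key=lambda name: information_value(cluster[name], target),
                reverse=True)
print(ranked)
```

The top-ranked variable in each cluster would then be the one carried forward, rather than the one that merely contributes most to the cluster.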
<br />
It would be great if we could get some more opinions on this.

James D. Walker (2010-02-04T20:32:01.286Z):
One tool I use for variable reduction is variable clustering (e.g. Proc Varclus), selecting one variable from each cluster and then testing for significance.