Comments - What do you do with a multiple variable model that has a low correlation coefficient (R2)?

Ralph Winters (2010-09-23):
Looking at the residuals after regression will also tell you a lot about where you stand. If the residuals are normal, I would suggest looking to supplement your model with other, as-yet-unknown variables, as you yourself have suggested. Otherwise, tricks like logs, Box-Cox power transforms, etc., can work to smooth the data out, as suggested by Tom. But they are somewhat artificial, so I would proceed with caution.

I have also often found that if you have TOO much data, you end up with a reversion-to-the-mean type of problem, and again a low R^2. In this case you need to segment the data, as suggested by Jon.

-Ralph Winters
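Ralph's two diagnostics (inspect the residuals, then try a power transform) can be sketched with scipy. This is only an illustrative sketch on synthetic data: the exponential response and noise levels are made-up assumptions, not anything from the thread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: y depends on x, but with multiplicative (right-skewed) noise.
x = rng.uniform(1, 10, 200)
y = np.exp(0.3 * x + rng.normal(0, 0.5, 200))

# Fit a simple linear regression and inspect the residuals.
slope, intercept, r, p, se = stats.linregress(x, y)
resid = y - (intercept + slope * x)
print("R^2 before transform:", r ** 2)
print("Shapiro-Wilk p-value on residuals:", stats.shapiro(resid).pvalue)

# Box-Cox transform of y (requires y > 0); lambda is chosen by maximum likelihood.
y_bc, lam = stats.boxcox(y)
slope2, intercept2, r2, p2, se2 = stats.linregress(x, y_bc)
print("Estimated Box-Cox lambda:", lam)
print("R^2 after transform:", r2 ** 2)
```

A small Shapiro-Wilk p-value before the transform flags the non-normal residuals Ralph mentions; as he warns, the transformed model predicts the transformed scale, so interpret it with caution.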
Chris Carozza (2010-09-23):
Scott, that is a good point: a model with a high R^2 is not necessarily a "good" model. However, I have caught myself trying to maximize R^2 to the detriment of model integrity. This is especially true when trying to develop predictive process models.
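Chris's trap (maximizing R^2 at the cost of model integrity) is easy to demonstrate: on a fixed sample, adding predictors can only raise plain R^2, even when the added columns are pure noise. A sketch on synthetic data, with a hand-rolled OLS R^2 and its adjusted version:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# One genuine predictor plus noise in the response.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def r_squared(X, y):
    """Plain OLS R^2 via least squares (intercept included)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def adj_r_squared(X, y):
    """Adjusted R^2 penalizes each extra predictor."""
    n, k = len(y), X.shape[1]
    return 1 - (1 - r_squared(X, y)) * (n - 1) / (n - k - 1)

X = x.reshape(-1, 1)
base = r_squared(X, y)

# Append 20 pure-noise columns: plain R^2 can only go up.
X_big = np.column_stack([X, rng.normal(size=(n, 20))])
print("R^2:", base, "->", r_squared(X_big, y))
print("adjusted R^2:", adj_r_squared(X, y), "->", adj_r_squared(X_big, y))
```

Adjusted R^2 (or out-of-sample validation) is one simple guard against this kind of R^2 chasing.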
Scott Nicholson (2010-09-23):
A high R^2 is NOT a measure of a "good" model. Generally, time-series data give you a high R^2, whereas cross-sectional data will yield a low R^2. Regressing a variable on the lag of itself will generally give you a high R^2, but does that fit the definition of a "good" model? It depends on the goal of your model-building exercise.

If you have a low R^2 and are confident about your choice of predictors, then it is simply true that there is a large amount of unobservable variation in your data. And expensive software won't solve that problem either, obviously.
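Scott's lagged-regression example is easy to reproduce: a persistent series regressed on its own lag yields a high R^2 while telling you almost nothing. A quick simulation (the AR(1) coefficient of 0.95 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# A highly persistent AR(1) series: x_t = 0.95 * x_{t-1} + noise.
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.95 * x[t - 1] + rng.normal()

# Regress the series on its own lag; R^2 is the squared correlation.
y, ylag = x[1:], x[:-1]
r = np.corrcoef(ylag, y)[0, 1]
print("R^2 from regressing x_t on x_{t-1}:", r ** 2)
```

The R^2 comes out near 0.9, yet the "model" is just persistence: it would be useless for explaining what drives the series.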
Jonathan Davis (2010-09-22):
Another thing you can try is partitioning your data -- for instance, if one of the independent variables has a large influence on the prediction, the value of that one variable may affect how the other variables influence the predicted value. I've run across this effect before when examining transportation systems: the relationship between wait times and speed/capacity/number of transports was completely different under different demand circumstances. A single model trying to include the level of demand yielded poor results -- a bad residual distribution and non-constant variance. Multiple models of the overall system worked very well.
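Jonathan's partitioning idea can be sketched with a toy regime effect. This is synthetic and hypothetical (the "demand" variable and flipped slopes are invented to mimic his transportation example): a pooled fit washes out, while per-segment fits recover strong relationships.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 400

# Hypothetical regime variable (e.g. demand): the slope of y on x flips by regime.
demand = rng.uniform(0, 1, n)
x = rng.normal(size=n)
high = demand > 0.5
y = np.where(high, 3.0 * x, -3.0 * x) + rng.normal(0, 0.5, n)

# Pooled fit: the opposing slopes cancel, so R^2 collapses.
pooled_r2 = stats.linregress(x, y).rvalue ** 2
print("Pooled R^2:", pooled_r2)

# Per-segment fits recover a strong relationship in each regime.
for name, mask in [("high demand", high), ("low demand", ~high)]:
    seg_r2 = stats.linregress(x[mask], y[mask]).rvalue ** 2
    print(name, "R^2:", seg_r2)
```

This is the extreme case; in practice an interaction term can do the same job, but separate models are often easier to diagnose, which matches Jonathan's residual observations.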
Chris Carozza (2010-09-09):
Tom, thank you for your feedback. The independent variables were not highly correlated, and the outliers were removed from the independent and dependent variables. I like the point you made about using a function (e.g., ln) to normalize the data. In the end, it would appear that one of the most important independent variables was not in the model.
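Chris's conclusion, an omitted important variable, is the one cause of low R^2 that no transform of the included predictors can fix. A sketch on invented data (x1 is the included predictor, x2 the missing driver; the coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

x1 = rng.normal(size=n)   # predictor in the model
x2 = rng.normal(size=n)   # the important predictor left out
y = 0.5 * x1 + 3.0 * x2 + rng.normal(0, 0.5, n)

def r_squared(X, y):
    """Plain OLS R^2 via least squares (intercept included)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

without = r_squared(x1.reshape(-1, 1), y)
with_x2 = r_squared(np.column_stack([x1, x2]), y)
print("R^2 without x2:", without)
print("R^2 with x2:", with_x2)
```

The residuals in the underspecified model look normal here, which is consistent with Ralph's earlier point: normal residuals plus low R^2 suggests looking for missing variables rather than transforming the ones you have.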