A Data Science Central Community
Hello! I am in the process of running many multiple regression models in order to explain the variance in the volume sold of specific SKUs. The main IVs I am using are price and price gap. My question is: how do I know when I have a high enough R square, i.e., one that will be accurate when I use the model?
I am getting average R squares around .50, but is that enough? At the moment I don't have many other variables that I can add to improve the model fit, so your help/advice is appreciated!
P.S. I am using SPSS to run the regressions, so if there is a way I can improve my models, please let me know.
I would say that an R square of .50 would indicate that the covariates are related to the dependent variable, but this does not mean that the model is accurate enough for your situation.
There is no general rule for when a model is accurate enough; that depends on how the model will be used. In some cases a low R square is enough and in others you might want a higher one. You just need to ask yourself whether the inaccuracy is something you can live with or whether it causes too many problems.
If you are doing predictions based on the model you should also try to figure out the prediction accuracy of the model, as this is a more meaningful measure of performance when doing predictions.
To measure the prediction accuracy of a model, hold out some of the data when training the model and then test how well the model predicts the held-out data points.
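A minimal sketch of that holdout idea, in Python with numpy and synthetic price/price-gap data (the variable names and numbers here are made up for illustration; in SPSS you can do the same thing with a selection filter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: volume explained by price and price gap
n = 200
price = rng.uniform(1, 10, n)
gap = rng.uniform(-2, 2, n)
volume = 100 - 5 * price + 3 * gap + rng.normal(0, 2, n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), price, gap])

# Hold out roughly 25% of the rows for testing; fit on the rest
test = rng.random(n) < 0.25
beta, *_ = np.linalg.lstsq(X[~test], volume[~test], rcond=None)

# Out-of-sample R^2: how well the model predicts the held-out points
pred = X[test] @ beta
ss_res = np.sum((volume[test] - pred) ** 2)
ss_tot = np.sum((volume[test] - volume[test].mean()) ** 2)
r2_holdout = 1 - ss_res / ss_tot
print(round(r2_holdout, 3))
```

The in-sample R square will usually look a bit better than `r2_holdout`; the held-out number is the one that tells you how the model will do on data it has not seen.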
Thanks Jarkko! I am using this for prediction purposes, so I will test for prediction accuracy. On another note, I have come across multicollinearity, and I think it may be making my model inaccurate. What would you suggest as far as dealing with independent variables that are correlated? Could I just make separate models splitting up the correlated variables? Thanks!!
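One common way to quantify multicollinearity is the variance inflation factor (VIF): regress each IV on the others and compute 1 / (1 - R^2). A numpy sketch on synthetic correlated data (the 5-10 threshold is a common rule of thumb, not something from this thread; SPSS reports VIF under collinearity diagnostics):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two deliberately correlated predictors (price and price gap often move together)
n = 300
price = rng.uniform(1, 10, n)
gap = 0.8 * price + rng.normal(0, 0.5, n)  # strongly tied to price
X = np.column_stack([price, gap])

def vif(X):
    """VIF for each column: 1 / (1 - R^2_j), where R^2_j comes from
    regressing column j on all the other columns (plus an intercept)."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1 / (1 - r2))
    return np.array(out)

print(vif(X))  # values well above ~5-10 flag problematic collinearity
```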
Sometimes it is not possible to avoid multicollinearity, especially if the correlated variables are categorical. One thing you could try is to transform the correlated variables using, for example, PCA to form new variables that are not correlated. You could also try different models that use different variables and select the one that has the best prediction power.
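The PCA transformation suggested above can be sketched with numpy's SVD: the principal components are uncorrelated by construction, and you would then regress volume on the components instead of on the raw, correlated IVs (synthetic data again, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated predictors, as before
n = 300
price = rng.uniform(1, 10, n)
gap = 0.8 * price + rng.normal(0, 0.5, n)
X = np.column_stack([price, gap])

# Standardize, then PCA via SVD: principal components are uncorrelated
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
components = Z @ Vt.T  # new, uncorrelated predictors

# The off-diagonal correlation of the components is ~0 (up to float error)
corr = np.corrcoef(components, rowvar=False)
print(abs(corr[0, 1]) < 1e-6)
```

The trade-off is interpretability: a coefficient on "component 1" is harder to explain to stakeholders than a coefficient on price, which is worth weighing against the gain in stability.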