A Data Science Central Community
OK, so I've spent hours trying to maximize the R2 of a multiple-variable model and the R2 is still low (e.g. below 50%). Should I become discouraged and put the model aside? After all, I used expensive professional statistical analysis software.
When creating a comprehensive multiple-variable model with dependent and independent variables, a user who understands the underlying fundamentals (e.g. physics, chemistry, finance) usually has the upper hand. However, the following must also be considered: the quality of the data, the number of data points, the number of independent variables, the variation of the variables, data filtering, and so on.
For example, is there a delay (lag) between the independent variables and the dependent variable that the model should account for? Is the model linear when the dependent variable is really a non-linear function of the independent variables?
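To make those two questions concrete, here is a rough sketch on made-up data (the lag of 3 samples, the quadratic relationship, the noise levels and the helper `r_squared` are all my own assumptions for illustration, not anything from a real data set). It fits a plain least-squares line with NumPy and compares the R2 before and after the lag or the non-linear term is accounted for.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R2."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
n = 500

# Case 1 (assumed): the response follows the input with a delay of 3 samples.
lag = 3
x = rng.normal(size=n)
y = 2.0 * np.roll(x, lag) + rng.normal(scale=0.3, size=n)
# np.roll wraps around, so the first `lag` samples are discarded.
r2_no_lag = r_squared(x[lag:], y[lag:])     # input and response at the same time step
r2_with_lag = r_squared(x[:-lag], y[lag:])  # input shifted to match the delay

# Case 2 (assumed): the response is a quadratic function of the input.
u = rng.uniform(-2.0, 2.0, size=n)
z = u ** 2 + rng.normal(scale=0.1, size=n)
r2_linear = r_squared(u, z)                               # straight line only
r2_quadratic = r_squared(np.column_stack([u, u ** 2]), z) # add the squared term

print(f"delay ignored:      R2 = {r2_no_lag:.3f}")
print(f"delay modeled:      R2 = {r2_with_lag:.3f}")
print(f"linear term only:   R2 = {r2_linear:.3f}")
print(f"with squared term:  R2 = {r2_quadratic:.3f}")
```

In both cases the data are almost perfectly predictable, yet the first fit reports an R2 near zero simply because the model has the wrong form.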
Suppose the independent and dependent variables are being measured with adequate accuracy and precision. If the R2 is still low after optimizing the model, an important independent variable may not be measured or may not be used in the model. This is not bad; in fact, it may lead to a breakthrough or increase our understanding of what is being modeled.
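A small simulation shows how a missing variable holds the R2 down. This is only a sketch under assumptions I chose myself: two independent inputs `x1` and `x2`, a response built as `x1 + 2*x2` plus a little noise, and a hypothetical helper `r_squared` that does an ordinary least-squares fit with NumPy.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R2."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 500

x1 = rng.normal(size=n)  # the variable we measure
x2 = rng.normal(size=n)  # the important variable we forgot to measure
y = 1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.3, size=n)  # assumed true process

r2_missing = r_squared(x1, y)                             # model without x2
r2_complete = r_squared(np.column_stack([x1, x2]), y)     # model with x2

print(f"x2 left out: R2 = {r2_missing:.3f}")
print(f"x2 included: R2 = {r2_complete:.3f}")
```

No amount of tuning the first model will lift its R2 much, because most of the variation in `y` comes from the input that is not in it. The low score is a hint to go looking for that input.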
Even a model that has a low R2 tells us something important!