# AnalyticBridge

A Data Science Central Community

# How to deal with high collinearity in predictors in time series data

Hi, all:

Just as the title says, I need your input on ways to deal with high collinearity among the potential predictors in my time series regression model. Any ideas are greatly appreciated! Thanks.


### Replies to This Discussion

Hi, it's me again. I forgot to mention that one of my friends used ridge regression to deal with this problem in time series data. I have never heard of people using ridge regression in this context. Any idea why or why not to use it?

First thought: do you need them all? If they're truly collinear and naturally vary together, then almost by definition one factor from the collinear group might (might) well stand in for all of them. You're reducing dimensionality and making the model simpler.

The issue here is which one to use. You can 1) use the one that makes the most walking-around sense; 2) use the one that gives you the best adjusted R-squared; or 3) use the one that is least collinear with your other factors. Also, if the collinearities aren't too bad (VIFs <10 -- ugh -- certainly less than 4 is fine), you can keep the collinear factors in the model.
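For anyone who wants to check the VIFs mentioned above by hand, here is a minimal sketch. It relies on the fact that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix, which is the same as 1 / (1 - R²) from regressing that predictor on the others. The function names (`correlation_matrix`, `invert`, `vifs`) and the simulated columns `x1`, `x2`, `x3` are made up for illustration and are not from the original post.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vifs(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Simulated (hypothetical) data: x2 is nearly a copy of x1, x3 is unrelated.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [v + rng.gauss(0, 0.1) for v in x1]
x3 = [rng.gauss(0, 1) for _ in range(200)]

example_vifs = vifs([x1, x2, x3])
for name, vif in zip(["x1", "x2", "x3"], example_vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

In this toy setup the two near-duplicate predictors should come out far above the 10 threshold, while the unrelated one should sit close to 1. The inversion will fail or become unstable if two predictors are exactly collinear, so treat it as a diagnostic sketch only.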

If the factors are only collinear because of the design of the experiment, or because the data are happenstance, and not because the factors are truly collinear, then you have to do some futzing and make some assumptions. You can still cull them down to one or two factors, but you obviously need to be clear with your customers that the data won't let you separate the factors efficiently, and that any or all of the excluded factors, rather than the one you chose, may really be what is driving the responses.

Another method to handle happenstance data and create orthogonal factors for use in regression is principal components analysis. It's reasonably easy to apply, but figuring out what principal components mean in real terms can be a serious issue. Seriously collinear factors can also give this method fits for a number of reasons.
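As a rough illustration of the principal components idea, the sketch below standardizes two collinear predictors, extracts only the first principal component of their correlation matrix by power iteration, and regresses the response on that single component score. Everything here is an assumption for illustration: the helper names (`standardize`, `first_component`), the simulated data, and the choice to keep one component. A real analysis would normally use a library routine and look at all the components.

```python
import math
import random

def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def first_component(columns, iterations=500):
    """Return (loadings, scores, share of variance) for the first principal component."""
    z = [standardize(c) for c in columns]
    n, k = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]
    # Asymmetric start vector so it is unlikely to be orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(k)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue (v has unit length).
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    scores = [sum(v[j] * z[j][t] for j in range(k)) for t in range(n)]
    return v, scores, eigenvalue / k

# Simulated (hypothetical) data: two collinear predictors driving one response.
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [v + rng.gauss(0, 0.1) for v in x1]
y = [a + b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]

loadings, scores, share = first_component([x1, x2])

# Simple regression of y on the component score (scores already have mean zero).
y_mean = sum(y) / len(y)
slope = sum(s * (v - y_mean) for s, v in zip(scores, y)) / sum(s * s for s in scores)

print("loadings:", [round(l, 3) for l in loadings])
print("share of variance explained:", round(share, 3))
print("slope on first component:", round(slope, 3))
```

The interpretation problem mentioned above shows up even here: the slope belongs to a blend of `x1` and `x2`, not to either predictor on its own.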

It's important to know why your factors are correlated, obviously. Talk to your data and to other subject matter experts about this. You talked about "potential predictors" - if that means you can still design the experiment, then you really want to make your factors (predictors) orthogonal using DOE. But I'm sure you've thought about that.

Not familiar with using ridge regression in time series data, so another poster will have to comment on that. I try to stay away from ridge regression, but it works fine on plain old regression data.
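For reference, the mechanics of ridge on plain regression data can be sketched as follows: standardize the predictors, center the response, and solve (Z'Z + penalty * I) b = Z'y. This sketch says nothing about whether ridge is appropriate for time series, and it ignores serial correlation entirely. The helper names (`standardize`, `solve`, `ridge`), the simulated data, and the penalty values are all assumptions for illustration.

```python
import math
import random

def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (aug[i][k] - sum(aug[i][j] * x[j] for j in range(i + 1, k))) / aug[i][i]
    return x

def ridge(columns, y, penalty):
    """Ridge coefficients on standardized predictors; penalty=0 gives ordinary least squares."""
    z = [standardize(c) for c in columns]
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    k = len(z)
    # Penalty is added on the sum-of-squares scale (the Gram matrix is not divided by n).
    gram = [[sum(a * b for a, b in zip(z[i], z[j])) + (penalty if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(zi, yc)) for zi in z]
    return solve(gram, rhs)

# Simulated (hypothetical) data: two collinear predictors driving one response.
rng = random.Random(2)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [v + rng.gauss(0, 0.1) for v in x1]
y = [a + b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]

for penalty in (0.0, 10.0, 100.0):
    coefs = ridge([x1, x2], y, penalty)
    print(f"penalty={penalty:>5}: coefficients = {[round(c, 3) for c in coefs]}")
```

Increasing the penalty shrinks the overall size of the coefficient vector, which is what stabilizes the estimates when predictors are nearly collinear. Picking the penalty (for example by cross-validation) is a separate question not covered here.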

Delve well --