# AnalyticBridge

A Data Science Central Community

# How to check linearity when the number of IVs is large (200+)

I have recently received an assignment from an FMCG company for market segmentation (clustering) using more than 200 variables. As a first step, I plan to carry out data reduction using Factor Analysis. So far I have been checking the linearity of relationships between variables in SPSS with scatter plots. However, with this many variables the number of pairwise combinations is simply too large to inspect by scatter plot. Can anyone suggest how I can test linearity in SPSS?
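For scale: 200 variables give 200 × 199 / 2 = 19,900 pairs. One way to avoid eyeballing all of them is to screen pairs automatically and only plot the suspicious ones. The sketch below is in Python with NumPy rather than SPSS, and the heuristic (flag a pair when adding a squared term raises R² by at least a chosen amount) plus the function names and the `min_gain` threshold are illustrative assumptions, not a standard SPSS procedure.

```python
import itertools

import numpy as np


def r_squared(y, y_hat):
    """Share of the variance of y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def screen_nonlinear_pairs(data, names, min_gain=0.05):
    """Flag variable pairs where a quadratic fit clearly beats a straight line.

    data: array of shape (n_obs, n_vars); names: one label per column.
    min_gain: hypothetical cut-off on the R^2 improvement; tune it to your data.
    The higher-index column is regressed on the lower-index one only, so the
    check is one-directional. Returns (x, y, linear_r2, quadratic_r2) tuples,
    largest improvement first.
    """
    flagged = []
    for i, j in itertools.combinations(range(data.shape[1]), 2):
        x, y = data[:, i], data[:, j]
        linear_fit = np.polyval(np.polyfit(x, y, 1), x)
        quadratic_fit = np.polyval(np.polyfit(x, y, 2), x)
        r2_lin = r_squared(y, linear_fit)
        r2_quad = r_squared(y, quadratic_fit)
        if r2_quad - r2_lin >= min_gain:
            flagged.append((names[i], names[j], r2_lin, r2_quad))
    return sorted(flagged, key=lambda t: t[3] - t[2], reverse=True)
```

Only the flagged pairs would then need a scatter plot. A quadratic term catches curvature but not every kind of non-linearity, so treat this as a filter rather than a test.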


### Replies to This Discussion

How about reducing the dimensionality using principal components? I suspect most of the variance in the data can be explained by a small number of components.
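A quick way to check that suspicion is to look at the eigenvalues of the correlation matrix, since each eigenvalue's share of the total is the variance share of one principal component. This is a Python/NumPy sketch, not SPSS output; the function name and the 80% target are illustrative assumptions.

```python
import numpy as np


def components_needed(data, target=0.80):
    """How many principal components reach a cumulative variance share of `target`.

    Works on the correlation matrix, i.e. on standardized variables, which is
    the usual choice when variables are on different scales.
    Returns (k, explained) where explained[m] is the share of component m.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending; flip to descending
    explained = eigvals / eigvals.sum()
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, target)) + 1  # first position reaching target
    return min(k, len(explained)), explained
```

If `k` comes out far below 200, the factor analysis (and any linearity checking) can focus on the variables that load heavily on those few components.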

Have you tried computing Pearson correlation coefficients for all pairs of variables? That should be part of your bivariate analysis anyway.
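With 200 variables the full matrix has 19,900 distinct off-diagonal coefficients, so it helps to rank them rather than read them. A minimal Python/NumPy sketch follows (SPSS users would export the correlation matrix instead); `top_correlated_pairs` and its `top` parameter are hypothetical names chosen for illustration.

```python
import numpy as np


def top_correlated_pairs(data, names, top=10):
    """Return the `top` variable pairs with the largest absolute Pearson r.

    data: array of shape (n_obs, n_vars); names: one label per column.
    """
    corr = np.corrcoef(data, rowvar=False)        # full n_vars x n_vars matrix
    rows, cols = np.triu_indices_from(corr, k=1)  # each pair once, diagonal skipped
    r = corr[rows, cols]
    order = np.argsort(-np.abs(r))[:top]          # strongest |r| first
    return [(names[rows[m]], names[cols[m]], float(r[m])) for m in order]
```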

Thanks, Ralph, but I am not entirely clear about your reply. If variables are correlated, can that be taken as an indicator of linearity? Personally, I do not think so!

Ravi, correlation does not necessarily mean linear correlation. But yes, a Pearson coefficient measures only the linear part of the association. Did you have anything specific in mind?
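To illustrate the difference, the Python/NumPy sketch below compares Pearson's r with Spearman's rho (Pearson's r computed on ranks, which measures monotonic rather than linear association) on two deterministic curves. The helper names are illustrative, and the simple ranking used here assumes there are no tied values.

```python
import numpy as np


def pearson(x, y):
    """Pearson's r: strength of the straight-line relationship."""
    return float(np.corrcoef(x, y)[0, 1])


def ranks(v):
    """0-based ranks; this simple version assumes no tied values."""
    return np.argsort(np.argsort(v))


def spearman(x, y):
    """Spearman's rho as Pearson's r on ranks (monotonic association)."""
    return pearson(ranks(x), ranks(y))


x = np.linspace(-1, 1, 201)

# Monotonic but curved: ranks agree perfectly, the straight-line fit does not.
print(pearson(x, x ** 3), spearman(x, x ** 3))

# Perfect but non-monotonic dependence: Pearson's r is essentially zero.
print(pearson(x, x ** 2))
```

The first line prints a Pearson r below 1 alongside a Spearman rho of 1; the second prints a value near 0 even though y is an exact function of x.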

Ralph, sorry for coming back to you so late. Could you please tell me more about correlation that is not linear? Ravi

In a *pinch*, one quick trick is to use OLS. Generally, the first listed assumption of the Gauss-Markov theorem is that Y and the Xs are linearly related. Using this to your advantage, choose the continuous variables of interest, select one to be Y (the rest being Xs), and model it using OLS. If a variable is not significant, you can *assume* that it is not linearly related to Y. Then randomly select *at least* 15-25 of the remaining, significant variables and run probability plots to check linearity.
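The OLS screening step can be sketched in Python with NumPy as below. The function names are hypothetical, and the cut-off of 1.96 is the large-sample normal approximation to the two-sided 5% t critical value, an assumption that is reasonable only when the number of observations is well above the number of predictors. As the reply says, a non-significant t is only grounds to *assume* no linear relationship, not proof of it.

```python
import numpy as np


def ols_t_stats(y, X):
    """Fit y = b0 + X b by ordinary least squares; return slopes and their t statistics."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - (p + 1)
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance of the coefficients
    t = beta / np.sqrt(np.diag(cov))
    return beta[1:], t[1:]                        # drop the intercept


def split_by_significance(y, X, names, t_crit=1.96):
    """Partition predictor names by whether |t| reaches t_crit (assumed large-sample cut-off)."""
    _, t = ols_t_stats(y, X)
    significant = [name for name, tv in zip(names, t) if abs(tv) >= t_crit]
    not_significant = [name for name, tv in zip(names, t) if abs(tv) < t_crit]
    return significant, not_significant
```

The significant list is the pool from which the 15-25 variables for probability plots would be drawn.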

For a few years I have been helping clients with predictive modeling using SPSS, and I have also often used Factor Analysis and Clustering. I am keen to know whether some of these tools can be used on data from an e-commerce site. I am talking to a company that is successfully selling watches, jewellery, goggles, stationery, etc. through an e-commerce site. What can I propose besides web analytics, which they are already doing? Suggestions would help. Ravi