Hi everyone! Greetings from South America! At the moment I'm working with 70 weather variables, and there is multicollinearity among many of them. I'd like to know if there is a way of creating 1 variable (from every 12) that summarizes the information contained in the rest. I've tried PCA, cluster analysis, GWR (in GIS), GLM (SAS) and stepwise selection (forward and backward), and I still cannot find any variables that are relevant to the dependent variable.
Thanks a lot !

Comment


Comment by Mauricio G. on November 9, 2012 at 8:51am
Thank you very much, Lynne. The dependent variable is "site-related clonal behaviour" (in terms of performance, i.e. clonal deviation). I will try your suggestion.
Comment by Lynne Mysliwiec on November 1, 2012 at 9:23am

Are you doing time series modeling or cross-sectional modeling?  What's the dependent variable? 

If you are doing time series analysis (example: using weather over time to predict today's crop yield), then you want to create lagged variables and aggregate variables first (# of sunny days in last x days, # of days with rain in the last x days). 
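
As a rough illustration of the lagged and aggregate variables described above, here is a minimal pandas sketch. The data are randomly generated, and the column names (`rain_mm`, `sun_hours`), the 7-day window and the "6 or more sun hours counts as a sunny day" rule are all assumptions made for the example, not anything from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical daily weather data; values and column names are illustrative only.
rng = np.random.default_rng(0)
days = pd.date_range("2012-01-01", periods=30, freq="D")
weather = pd.DataFrame(
    {
        "rain_mm": rng.gamma(1.0, 2.0, size=30).round(1),
        "sun_hours": rng.uniform(0, 12, size=30).round(1),
    },
    index=days,
)


def add_lags_and_aggregates(df, window=7):
    """Add a 1-day lag and 'count of days in the last `window` days' features."""
    out = df.copy()
    # Lagged variable: yesterday's rainfall.
    out["rain_mm_lag1"] = out["rain_mm"].shift(1)
    # Aggregates: shift(1) first so each window covers only the days
    # before the current one, then count the days meeting the condition.
    rainy = (out["rain_mm"] > 0).astype(int)
    sunny = (out["sun_hours"] >= 6).astype(int)  # assumed threshold
    out["rainy_days_last_n"] = rainy.shift(1).rolling(window).sum()
    out["sunny_days_last_n"] = sunny.shift(1).rolling(window).sum()
    return out


features = add_lags_and_aggregates(weather, window=7)
print(features.tail())
```

The first `window` rows of the aggregate columns are missing because there is not yet a full window of history; they would normally be dropped before modeling.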

After constructing aggregates that make good business sense, you can then use factor analysis to collapse variables that are related into factors.  Review your factors to see which attributes "load" on each and revise/trim the dataset as needed & rerun until you are happy with your factors.
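
To show what reviewing the loadings can look like, here is a small sketch using scikit-learn's `FactorAnalysis` with a varimax rotation. The six variables are simulated from two made-up underlying drivers, so the variable names and the choice of two factors are assumptions for the example only; with real weather data the number of factors would have to be chosen and revised as described above.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Simulated data: two hypothetical latent drivers, each behind three variables.
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(size=n)
moisture = rng.normal(size=n)
names = ["t_min", "t_max", "t_mean", "rain", "humidity", "soil_water"]
X = np.column_stack(
    [
        temperature + 0.3 * rng.normal(size=n),
        temperature + 0.3 * rng.normal(size=n),
        temperature + 0.3 * rng.normal(size=n),
        moisture + 0.3 * rng.normal(size=n),
        moisture + 0.3 * rng.normal(size=n),
        moisture + 0.3 * rng.normal(size=n),
    ]
)

# Standardize so that no variable dominates because of its units.
Z = StandardScaler().fit_transform(X)

# Fit two factors; the varimax rotation makes the loadings easier to read.
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
scores = fa.fit_transform(Z)   # one row per observation, one column per factor
loadings = fa.components_.T    # one row per variable, one column per factor

# Review which variables "load" on which factor.
for name, row in zip(names, loadings):
    print(f"{name:>10}  " + "  ".join(f"{value:+.2f}" for value in row))
```

Each column of `scores` is a single new variable that stands in for the group of correlated variables loading on that factor, which is the kind of "1 variable from many" asked about in the question.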
