A Data Science Central Community
I'm working on a project with a very large database, and I have to predict sales (a time series) for each customer (about 200,000) and each product (about 500), based on historical sales data as well as some auxiliary variables/series, let's call them events, which can be categorical/flag as well as continuous.
Because it's a huge amount of work and would be very time-consuming, I thought I would start by creating clusters of customers for each product, based on the structure/trend of the sales series as well as of the other explanatory series, and then estimate a different model for each cluster/product.
If I found 200 clusters, I could estimate 200 models for each product instead of the original 200,000.
The obvious strategy is to estimate a different model for each customer/product pair, but that is very time-consuming.
I need to know whether it's possible to find the model that, on average, best fits the sales series for each cluster/product, without estimating each individual model (which is very time-consuming).
Is it possible? With good (or at least acceptable) results? Any suggestions?
Thanks in advance.
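The cluster-then-model workflow described above can be sketched in a few lines. This is a minimal illustration, not the poster's actual setup: the data is synthetic (1,000 customers, 24 periods, three made-up latent behaviours), the clustering is k-means on z-scored series, and the per-cluster "model" is just a linear trend fitted to the cluster mean.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: 1,000 customers, 24 periods, three latent behaviours
# (upward trend, downward trend, yearly seasonality) -- all invented here.
n_customers, n_periods, n_clusters = 1000, 24, 3
t = np.arange(n_periods)
shapes = np.stack([t.astype(float), -t.astype(float),
                   np.sin(2 * np.pi * t / 12)])
group = rng.integers(0, 3, n_customers)
sales = 20 + 2.0 * shapes[group] + rng.normal(0, 1.0, (n_customers, n_periods))

# 1) Cluster customers on the *shape* of their series: z-scoring each row
#    makes the grouping reflect structure/trend rather than sales volume.
z = (sales - sales.mean(axis=1, keepdims=True)) / sales.std(axis=1, keepdims=True)
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(z)

# 2) Fit ONE model per cluster instead of one per customer.
#    Here: a linear trend on each cluster's mean series (a stand-in for
#    whatever forecasting model would really be used per cluster).
models = {}
for k in range(n_clusters):
    mean_series = sales[labels == k].mean(axis=0)
    slope, intercept = np.polyfit(t, mean_series, deg=1)
    models[k] = (slope, intercept)

# Forecast the next period for every customer from its cluster's model.
next_t = n_periods
forecast = np.array([models[k][0] * next_t + models[k][1] for k in labels])
print(len(models), "models cover", n_customers, "customers")
```

The point of the sketch is the bookkeeping: after clustering, only `n_clusters` models are estimated, and each customer inherits its cluster's forecast. Whether the accuracy is acceptable depends on how homogeneous the clusters really are.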
Yes, if you are using SAS you can model this using a routine such as PROC STATESPACE. I believe R also has Kalman filter implementations.
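To make the Kalman-filter suggestion concrete, here is a minimal local-level (random walk plus noise) filter in plain Python/numpy. The function name and parameters are my own for illustration; they are not part of SAS or any R package, and a real application would estimate the two variances rather than fix them.

```python
import numpy as np

def local_level_filter(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter for the local-level model:
       y_t = mu_t + eps_t,  mu_t = mu_{t-1} + eta_t."""
    a, p = a0, p0          # state estimate and its variance (diffuse start)
    filtered = []
    for obs in y:
        p = p + sigma_eta2  # predict: the level follows a random walk
        v = obs - a         # innovation (one-step-ahead prediction error)
        f = p + sigma_eps2  # innovation variance
        k = p / f           # Kalman gain
        a = a + k * v       # update the filtered level
        p = p * (1 - k)
        filtered.append(a)
    return np.array(filtered)

series = np.full(200, 5.0)  # constant toy series; the filter should lock onto 5
level = local_level_filter(series, sigma_eps2=1.0, sigma_eta2=0.01)
```

This is the simplest state-space model; PROC STATESPACE and R's Kalman-filter routines handle much richer specifications, but the predict/update recursion above is the core of all of them.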
I'll use IBM/SPSS PASW Statistics and/or Modeler.