
Hello fellows

I'm working on a project with a very large database, and I have to predict sales (time series) for each customer (about 200,000) for each product (about 500), based on historical sales data as well as some auxiliary variables/series — let's call them events — which can be categorical/flag as well as continuous.

Because it's a huge amount of work and would be very time-consuming, I thought of starting by creating clusters of customers, for each product, based on the structure/trend of the sales series as well as of the other explanatory series, and then estimating a different model for each cluster/product.

If I found 200 clusters, I could estimate 2,000 models instead of the original 200,000 for each product.
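The clustering step described above could be sketched as follows — a minimal, assumed approach using a few shape features (mean, variability, linear trend) per customer series and k-means, on simulated data; the feature choices and sizes are illustrative only, not the actual setup:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_customers, n_periods = 1000, 52  # toy sizes; the real problem has ~200,000 customers
sales = rng.poisson(lam=5.0, size=(n_customers, n_periods)).astype(float)

def series_features(x):
    """Simple shape descriptors of one sales series: level, spread, trend."""
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]  # slope of a linear fit = rough trend
    return [x.mean(), x.std(), slope]

# One feature row per customer, standardized so no feature dominates
features = np.array([series_features(s) for s in sales])
X = StandardScaler().fit_transform(features)

# Group customers with similar sales-series shapes
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_  # one cluster id per customer
print(np.bincount(labels))  # cluster sizes
```

Richer features (seasonality strength, autocorrelations, event responsiveness) or a distance-based method such as dynamic time warping could replace the three toy features here.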


The obvious strategy is to estimate a different model for each customer/product, but it's very time-consuming.

I need to know whether it's possible to find the model that, on average, best fits the sales series for each cluster/product, without estimating each individual model (very time-consuming).

Is it possible, with good (or at least acceptable) results? Any suggestions?

Thanks in advance.

Tags: Multivariate, clustering, database, large, series, time



Replies to This Discussion

Yes, if you are using SAS you can model this using a routine such as STATESPACE. I believe R also has Kalman filter implementations.


-Ralph Winters



Thanks Ralph.


I'll use IBM/SPSS PASW Statistics and/or Modeler.



© 2020 TechTarget, Inc.
