A Data Science Central Community
I'm working on a project with a very large database, and I have to predict sales (a time series) for each customer (about 200,000) and each product (about 500), based on historical sales data as well as some auxiliary variables/series, let's call them events, which can be categorical/flag as well as continuous.
Because it's a huge amount of work and would be very time-consuming, I thought I would start by creating clusters of customers for each product, based on the structure/trend of the sales series as well as of the other explanatory series, and then estimate a different model for each cluster/product.
If I found 200 clusters, I could estimate 200 models for each product instead of the original 200,000.
The obvious strategy is to estimate a different model for each customer/product pair, but that is very time-consuming.
I need to know whether it's possible to find the model that, on average, best fits the sales series for each cluster/product, without estimating each individual model (which is very time-consuming).
Is it possible? With good (or at least acceptable) results? Any suggestions?
Thanks in advance.
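The cluster-then-model workflow described above can be sketched in a few lines. This is a minimal illustration, not the poster's actual setup: the data is synthetic (1,000 customers, 24 periods, three made-up latent behaviours), the clustering is k-means on z-scored series, and the per-cluster "model" is just a linear trend fitted to the cluster mean.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: 1,000 customers, 24 periods, three latent behaviours
# (upward trend, downward trend, yearly seasonality) -- all invented here.
n_customers, n_periods, n_clusters = 1000, 24, 3
t = np.arange(n_periods)
shapes = np.stack([t.astype(float), -t.astype(float),
                   np.sin(2 * np.pi * t / 12)])
group = rng.integers(0, 3, n_customers)
sales = 20 + 2.0 * shapes[group] + rng.normal(0, 1.0, (n_customers, n_periods))

# 1) Cluster customers on the *shape* of their series: z-scoring each row
#    makes the grouping reflect structure/trend rather than sales volume.
z = (sales - sales.mean(axis=1, keepdims=True)) / sales.std(axis=1, keepdims=True)
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(z)

# 2) Fit ONE model per cluster instead of one per customer.
#    Here: a linear trend on each cluster's mean series (a stand-in for
#    whatever forecasting model would really be used per cluster).
models = {}
for k in range(n_clusters):
    mean_series = sales[labels == k].mean(axis=0)
    slope, intercept = np.polyfit(t, mean_series, deg=1)
    models[k] = (slope, intercept)

# Forecast the next period for every customer from its cluster's model.
next_t = n_periods
forecast = np.array([models[k][0] * next_t + models[k][1] for k in labels])
print(len(models), "models cover", n_customers, "customers")
```

The point of the sketch is the bookkeeping: after clustering, only `n_clusters` models are estimated, and each customer inherits its cluster's forecast. Whether the accuracy is acceptable depends on how homogeneous the clusters really are.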
Yes, if you are using SAS you can model this using a routine such as PROC STATESPACE. I believe R also has Kalman filter implementations.
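To make the Kalman-filter suggestion concrete, here is a minimal local-level (random walk plus noise) filter in plain Python/numpy. The function name and parameters are my own for illustration; they are not part of SAS or any R package, and a real application would estimate the two variances rather than fix them.

```python
import numpy as np

def local_level_filter(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter for the local-level model:
       y_t = mu_t + eps_t,  mu_t = mu_{t-1} + eta_t."""
    a, p = a0, p0          # state estimate and its variance (diffuse start)
    filtered = []
    for obs in y:
        p = p + sigma_eta2  # predict: the level follows a random walk
        v = obs - a         # innovation (one-step-ahead prediction error)
        f = p + sigma_eps2  # innovation variance
        k = p / f           # Kalman gain
        a = a + k * v       # update the filtered level
        p = p * (1 - k)
        filtered.append(a)
    return np.array(filtered)

series = np.full(200, 5.0)  # constant toy series; the filter should lock onto 5
level = local_level_filter(series, sigma_eps2=1.0, sigma_eta2=0.01)
```

This is the simplest state-space model; PROC STATESPACE and R's Kalman-filter routines handle much richer specifications, but the predict/update recursion above is the core of all of them.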
I'll use IBM/SPSS PASW Statistics and/or Modeler.