

I am new to analytics and would like to understand how we can approach a problem where we have to cluster around 10,000 retail stores.

We would want to cluster the stores for Macro as well as Micro Space Optimization.

The information we have at hand is Sales and Inventory.

Please advise. 

Tags: Analytics, Clustering, Data, Macro, Micro, Optimization, Retail, Space


Replies to This Discussion



You will have to include external variables in addition to Sales and Inventory for clustering the stores.


A few important variables are as follows:

1. Internal variables

Store sales, store inventory, store area, sales per bill, footfall (if available), whether the store is a premium store or a superstore, product categories carried (indicator variables, e.g. men's clothing, men's and women's clothing, kids' store), growth expectation, expected sales growth, category section indicators/buckets, customer age group, share of office-goer customers, and a family sales indicator. Discounts and other indicators from the marketing team can also be of use.


2. External variables

Location of the retail store (state, city), distance from the city centre, number of shops in the vicinity (or in the same mall), type of clientele footfall (premium, middle, below middle class, as buckets), distance to the nearest store of the same brand, number of same-brand stores within 50 miles, etc.
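To make the variable lists above a bit more concrete, here is a minimal sketch of turning a mix of numeric and categorical store variables into a matrix a clustering routine can use. It assumes you standardize the numeric fields and turn the categorical ones into 0/1 indicators; the field names (`sales`, `area_sqft`, `format`) and all values are made up for illustration, and it uses only the Python standard library rather than any particular stats package.

```python
from statistics import mean, pstdev

# Hypothetical store records; every field name and value is made up.
stores = [
    {"store_id": "S1", "sales": 120.0, "area_sqft": 2000.0, "format": "premium"},
    {"store_id": "S2", "sales": 300.0, "area_sqft": 8000.0, "format": "superstore"},
    {"store_id": "S3", "sales": 90.0, "area_sqft": 1500.0, "format": "standard"},
    {"store_id": "S4", "sales": 210.0, "area_sqft": 5000.0, "format": "standard"},
]

NUMERIC = ["sales", "area_sqft"]
CATEGORICAL = ["format"]


def build_feature_matrix(rows, numeric, categorical):
    """Return (column_names, matrix) with z-scored numerics and 0/1 indicators."""
    columns = []
    matrix = [[] for _ in rows]

    for name in numeric:
        values = [row[name] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        columns.append(name)
        for out, value in zip(matrix, values):
            # Guard against a constant column (sigma == 0).
            out.append((value - mu) / sigma if sigma else 0.0)

    for name in categorical:
        # One indicator column per distinct level, in sorted order.
        for level in sorted({row[name] for row in rows}):
            columns.append(f"{name}={level}")
            for out, row in zip(matrix, rows):
                out.append(1.0 if row[name] == level else 0.0)

    return columns, matrix


columns, X = build_feature_matrix(stores, NUMERIC, CATEGORICAL)
for store, features in zip(stores, X):
    print(store["store_id"], [round(f, 2) for f in features])
```

Standardizing matters because distance-based clustering would otherwise be dominated by whichever variable happens to have the largest numeric range (area in square feet, say, versus a 0/1 flag).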


Other than this, understanding the Operations can be of much use. Meeting with Sales & Marketing teams can help.

Use any clustering tool like SAS, Minitab, SPSS, CART, etc. I think this is more of a profiling exercise.


This reply is personal and should not be taken as made on behalf of the company I am working for. Thanks!

Thanks Rishi!

Are there any clustering algorithms that are commonly used to cluster stores on the variables you mentioned?

I really appreciate your response.

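Hey, check the clustering procedures in SAS. Note that PROC VARCLUS clusters variables rather than observations, so it helps with trimming the variable list; for clustering the stores themselves, look at PROC FASTCLUS (k-means) or PROC CLUSTER (hierarchical).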

I was writing about this just last week on my own blog. As I wrote there, I believe that geo/demographic data may not be useful to you at the outset for space optimization; it will probably cloud the problem.

If you want to cluster for the purposes of space optimization (as per your original question), then what you really care about is what sells well (and so needs more space) and what sells less well (that you can take space away from). With one important caveat, this is all to be had from your historical sales data. The caveat is that your historical sales data cannot tell you about categories you have never sold in a store, so if you are planning to introduce lots of new product groups you will have to get to predicting their sales rates from causal factors sooner rather than later. Chances are this will not be a problem.

If your 10,000 stores include stores with fundamentally different offerings, I would start by splitting them out into a small number of independent groups and cluster each group separately. Then build sales profiles to cluster on from your historical sales data. Here's a made-up example for a prepared foods category:
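In code, a sales profile of this kind could be built roughly as follows. This is only a sketch under my own assumptions: the profile is taken to be each store's share of category sales by one product attribute (cuisine here), and the store IDs, cuisines and sales figures are all invented.

```python
from collections import defaultdict

# Hypothetical sales records: (store_id, cuisine, sales). All values are made up.
sales = [
    ("S1", "Italian", 600.0),
    ("S1", "Asian", 300.0),
    ("S1", "Mexican", 100.0),
    ("S2", "Italian", 200.0),
    ("S2", "Asian", 700.0),
    ("S2", "Mexican", 100.0),
]


def sales_profiles(records):
    """Return {store: {attribute_value: share of that store's sales}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for store, value, amount in records:
        totals[store][value] += amount

    profiles = {}
    for store, by_value in totals.items():
        store_total = sum(by_value.values())
        if store_total == 0:
            profiles[store] = {}  # no sales in this category, no profile
            continue
        profiles[store] = {
            value: amount / store_total for value, amount in by_value.items()
        }
    return profiles


profiles = sales_profiles(sales)
for store, shares in profiles.items():
    print(store, {cuisine: round(share, 2) for cuisine, share in shares.items()})
```

Using shares rather than raw sales means a large store and a small store with the same mix end up with the same profile, which is usually what you want when the question is how to divide up space.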

It may take you a few iterations to decide what is important in terms of determining space requirements, and you will need a solid DSR (demand signal repository) to both manage the necessary master data and do the data transformations. If it's necessary, you can combine attributes before generating profiles, for example by building combinations of cuisine and size as new attributes, but be wary, as this can create a lot of attributes for your sales profiles very quickly. Adding more attributes, especially if they are not relevant to the problem you are working on, will make it harder to find and interpret sensible clusters.
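A quick way to see how fast combined attributes multiply is to count them. The sketch below compares the number of profile columns you get from keeping attributes separate with the number you get from crossing them; the attribute names and levels are hypothetical.

```python
from itertools import product

# Hypothetical attribute levels for a prepared foods category.
cuisines = ["Italian", "Asian", "Mexican", "Indian"]
sizes = ["single", "family"]
storage = ["chilled", "frozen", "ambient"]


def count_profile_columns(*attribute_levels):
    """Return (columns if kept separate, columns if fully combined)."""
    separate = sum(len(levels) for levels in attribute_levels)
    combined = len(list(product(*attribute_levels)))
    return separate, combined


print(count_profile_columns(cuisines, sizes))           # two attributes
print(count_profile_columns(cuisines, sizes, storage))  # three attributes

# The combined attribute values themselves, e.g. "Italian | family".
cuisine_by_size = [" | ".join(combo) for combo in product(cuisines, sizes)]
```

Separate attributes grow as a sum of their level counts, while combinations grow as a product, so each extra attribute you cross in multiplies the width of the profile.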

As far as algorithms go, any general-purpose stats package will include at least one clustering algorithm, and if your data is well prepared any of them should work for this. As far as software goes, this is far from an exhaustive list:

  • SAS is good (though pricey if you do not already have it).
  • Statistica likewise.
  • R is impressive and free, and will work, though it has a learning curve.
  • You can get add-ins for Excel that will handle 10,000 rows of data just fine.
  • I've used the clustering tool from the SQL Server Data Mining Add-ins for Excel to good effect too.

Honestly, I think the algorithm, or the specific implementation of it, is much less important than carefully choosing, cleaning and preparing the data to cluster on.
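To show how little machinery the clustering step itself needs once the profiles are in good shape, here is a bare-bones k-means (Lloyd's algorithm) in plain Python. It is a sketch for illustration, not what any of the packages above run internally; the six profiles are invented sales shares for three cuisines, and in practice you would use your package's own routine.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)

    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return labels, centroids


# Hypothetical sales-share profiles (Italian, Asian, Mexican) for six stores.
profiles = [
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.65, 0.25, 0.10],
    [0.20, 0.70, 0.10],
    [0.25, 0.65, 0.10],
    [0.15, 0.75, 0.10],
]

labels, centroids = kmeans(profiles, k=2)
print(labels)
```

On real data you would still have to choose the number of clusters and check that the result is stable across different starting points, which is where most of the judgment goes.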

Once clusters are built, I would then pull in geo/demographic data and use it to characterize or describe each cluster. In this way you can get your first look at which of the hundreds of demographic data fields available are actually relevant to describing (and potentially predicting) the thing you actually care about: sales.
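One simple way to do that characterization is to average each demographic field within each cluster and see which fields differ most between clusters. The sketch below does this with a crude separation score (the range of cluster means divided by the overall standard deviation); the score, the field names and the numbers are my own assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical inputs: cluster labels from an earlier step, plus demographics.
cluster_of = {"S1": 0, "S2": 0, "S3": 1, "S4": 1}
demographics = {
    "S1": {"median_income": 72000, "share_households_with_kids": 0.30},
    "S2": {"median_income": 68000, "share_households_with_kids": 0.28},
    "S3": {"median_income": 41000, "share_households_with_kids": 0.31},
    "S4": {"median_income": 45000, "share_households_with_kids": 0.29},
}


def describe_clusters(cluster_of, demographics):
    """Return {cluster: {field: mean of that field over the cluster's stores}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for store, cluster in cluster_of.items():
        for field, value in demographics[store].items():
            grouped[cluster][field].append(value)
    return {
        cluster: {field: mean(values) for field, values in fields.items()}
        for cluster, fields in grouped.items()
    }


def separation_scores(cluster_of, demographics):
    """Range of cluster means per field, in units of overall std deviation."""
    summary = describe_clusters(cluster_of, demographics)
    fields = list(next(iter(demographics.values())))
    scores = {}
    for field in fields:
        overall = pstdev([d[field] for d in demographics.values()])
        cluster_means = [summary[c][field] for c in summary]
        spread = max(cluster_means) - min(cluster_means)
        scores[field] = spread / overall if overall else 0.0
    return scores


summary = describe_clusters(cluster_of, demographics)
scores = separation_scores(cluster_of, demographics)
for field, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(field, round(score, 2))
```

Fields with a high score differ a lot between clusters and are worth a closer look; fields with a low score look much the same everywhere and probably tell you little about the sales patterns behind the clusters.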




Hi Karen,

One of the options here is to distinguish stores which yield more profit (more sales) from those that yield less profit.

A root cause analysis can be performed using a simple CART model, which indicates the factors with the greatest impact on sales.
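The core step of a CART regression tree is choosing the single split that most reduces the squared error of the target. The sketch below performs just that one step (a one-level tree, not a full CART model with recursion and pruning) to show how a factor gets flagged as important; the store fields and values are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, features, target):
    """Return (feature, threshold, sse_reduction) for the best single split."""
    base = sse([r[target] for r in rows])
    best = (None, None, 0.0)
    for feature in features:
        levels = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            gain = base - sse(left) - sse(right)
            if gain > best[2]:
                best = (feature, threshold, gain)
    return best


# Hypothetical stores: sales here are driven by area, not by distance.
stores = [
    {"area_sqft": 1500, "distance_km": 2, "sales": 90},
    {"area_sqft": 2000, "distance_km": 9, "sales": 110},
    {"area_sqft": 1800, "distance_km": 5, "sales": 100},
    {"area_sqft": 6000, "distance_km": 3, "sales": 290},
    {"area_sqft": 8000, "distance_km": 8, "sales": 310},
    {"area_sqft": 7000, "distance_km": 6, "sales": 300},
]

feature, threshold, gain = best_split(stores, ["area_sqft", "distance_km"], "sales")
print(feature, threshold, gain)
```

A full CART model repeats this search inside each resulting group, and the variables chosen near the top of the tree are the ones it treats as having the most impact.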

If you would like more guidance on clustering, StatSoft, Inc. provides many webcasts and tutorials here:

Hopefully this helps!

