A Data Science Central Community
I am trying to cluster my customer file, which has about 80K customers and 50 variables.
Instead of using either just hierarchical or non-hierarchical methods in SAS, I first tried to determine the "OPTIMAL" number of clusters and their seeds using PROC CLUSTER.
Next, I will feed these seeds into PROC FASTCLUS to refine the clusters. This was the recommendation that someone gave me: use a hierarchical method first to get the seeds, then feed the seeds to a non-hierarchical method to fine-tune the clusters.
However, it took forever for PROC CLUSTER to even create clusters for my 80K customers. I had to abandon it before it returned any result.
Can anyone suggest a way to deal with a big data set like mine? Thanks.
I'm just jumping into this old discussion thread. Do you have any sample dataset or SAS code that determines the optimal number of clusters using PROC CLUSTER first and then feeds the resulting seeds into PROC FASTCLUS to further refine the clusters (or maybe the other way around: PROC FASTCLUS first to get the seeds, and then PROC CLUSTER using those seeds to refine the clusters)?
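For what it's worth, here is a rough sketch of the "other way around" order, which is the one that usually copes with a file of this size: PROC FASTCLUS first reduces the 80K rows to a few dozen preliminary clusters, and PROC CLUSTER then runs only on those cluster means. The data set name `customers`, the variable names `x1-x50`, the choice of 50 preliminary clusters, and the final `nclusters=5` are all placeholders I made up, not values from the original post. Treat it as a starting point under those assumptions, not a tested recipe.

```sas
/* Assumed names: data set CUSTOMERS with numeric variables X1-X50. */

/* Step 0: standardize so no single variable dominates the distances. */
proc stdize data=customers out=cust_std method=std;
   var x1-x50;
run;

/* Step 1: reduce ~80K rows to 50 preliminary clusters (50 is arbitrary). */
/* MEAN= saves one row per preliminary cluster, including _FREQ_ and _RMSSTD_. */
proc fastclus data=cust_std maxclusters=50 maxiter=20
              mean=prelim_means out=prelim_assign noprint;
   var x1-x50;
run;

/* Step 2: hierarchical clustering on the 50 cluster means only. */
/* FREQ and RMSSTD tell PROC CLUSTER that each row summarizes many customers. */
/* CCC and PSEUDO print statistics that can help judge the number of clusters. */
proc cluster data=prelim_means method=ward ccc pseudo
             print=15 outtree=tree;
   var x1-x50;
   freq _freq_;
   rmsstd _rmsstd_;
run;

/* Step 3: cut the tree at a chosen number of clusters (5 is a placeholder). */
proc tree data=tree nclusters=5 out=tree_assign noprint;
run;
```

If you still want a final refinement pass, the means of the clusters chosen in step 3 could be computed and passed back to PROC FASTCLUS through its SEED= option, which is the hierarchical-then-non-hierarchical idea from the original post applied to a much smaller problem.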