A Data Science Central Community
I am trying to perform clustering on my customer files with about 80K customers and 50 variables.
Instead of using only a hierarchical or only a non-hierarchical method in SAS, I first tried to determine the "optimal" number of clusters and their seeds using PROC CLUSTER.
Next, I planned to feed these seeds into PROC FASTCLUS to refine the clusters. This was the recommendation someone gave me: use a hierarchical method first to get the seeds, then feed the seeds to a non-hierarchical method to fine-tune the clusters.
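To make the two-stage idea concrete, here is a minimal sketch in Python rather than SAS. It is not what PROC CLUSTER or PROC FASTCLUS do internally: a simple centroid-linkage merge stands in for the hierarchical step, and a plain k-means loop stands in for the refinement step. Running the hierarchical step on a small random sample instead of the full file is my own assumption to keep it tractable, and the function names (`hierarchical_seeds`, `kmeans_refine`) and the synthetic data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def hierarchical_seeds(sample, k):
    """Stage 1: agglomerate a SMALL sample down to k clusters
    (centroid linkage) and return the k cluster centroids as seeds."""
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        best = None
        # Find the pair of clusters whose centroids are closest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]

def kmeans_refine(data, seeds, max_iter=20):
    """Stage 2: refine the seeds on the FULL data with k-means."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return centers, labels

# Synthetic stand-in for a customer file: 3 well-separated groups, 2 variables.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (20.0, 20.0), (-20.0, 20.0)]
data = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
        for cx, cy in true_centers for _ in range(1000)]

sample = rng.sample(data, 60)            # hierarchical step sees 60 rows only
seeds = hierarchical_seeds(sample, k=3)  # seeds from the hierarchical step
centers, labels = kmeans_refine(data, seeds)  # refinement on all 3000 rows
```

The point of the split is cost: the pairwise merging in the first stage grows quickly with the number of rows, while each k-means pass in the second stage is linear in the number of rows.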
However, PROC CLUSTER took forever to even create clusters for my 80K customers. I had to abandon it before it returned any result.
Can anyone suggest a way to deal with a big data set like mine? Thanks.
I need some clarification. I know that clustering can be done on binary-transformed variables by using a distance matrix, but can FASTCLUS be used in the same fashion? Please let me know your thoughts on this.