A Data Science Central Community
I want to perform clustering on categorical variables (from survey data, with around 2,000 observations).
I think PROC CLUSTER is appropriate for this, provided we create dummy variables.
Can you please confirm this and suggest how to go about using it?
Which method should I use (Ward's)?
And how do we reduce the number of variables in this process?
How do we calculate distance between the clusters in this case?
How do we score the new dataset?
Thanks in advance.
It would be better to use some form of latent class analysis (LCA) if you are using categorical variables. PROC LCA is available for SAS as an add-on procedure, and there are standalone programs such as Latent GOLD.
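To make the latent class idea a bit more concrete, here is a minimal sketch in plain Python (not SAS, and not what PROC LCA or Latent GOLD do internally). It fits a latent class model by EM, assuming the variables are independent within each class. The function names fit_lca and posterior, the toy survey rows, the smoothing constant and the handling of unseen levels are all my own assumptions for illustration. The same posterior function can be used to score new observations by assigning each one to its most probable class.

```python
import random


def posterior(row, weights, probs):
    """Return P(class | row) for one observation; also usable to score new data."""
    joint = []
    for w, cls in zip(weights, probs):
        like = w
        for j, value in enumerate(row):
            # Assumption: a level never seen in training gets a tiny probability.
            like *= cls[j].get(value, 1e-6)
        joint.append(like)
    total = sum(joint)
    return [x / total for x in joint]


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model by EM (local independence within each class).

    data: list of rows, each a list of category labels (one per variable).
    Returns (weights, probs), where probs[k][j] maps level -> P(level | class k).
    """
    rng = random.Random(seed)
    n_vars = len(data[0])
    levels = [sorted({row[j] for row in data}) for j in range(n_vars)]

    # Random, normalised starting values for the response probabilities.
    probs = []
    for _ in range(n_classes):
        cls = []
        for j in range(n_vars):
            raw = [rng.random() + 0.5 for _ in levels[j]]
            total = sum(raw)
            cls.append({lev: r / total for lev, r in zip(levels[j], raw)})
        probs.append(cls)
    weights = [1.0 / n_classes] * n_classes

    for _ in range(n_iter):
        # E-step: class membership probabilities for every row.
        post = [posterior(row, weights, probs) for row in data]
        # M-step: re-estimate class sizes and response probabilities.
        for k in range(n_classes):
            nk = sum(p[k] for p in post)
            weights[k] = nk / len(data)
            for j in range(n_vars):
                for lev in levels[j]:
                    count = sum(p[k] for p, row in zip(post, data) if row[j] == lev)
                    # Small smoothing constant (an assumption) avoids exact zeros.
                    probs[k][j][lev] = (count + 1e-6) / (nk + 1e-6 * len(levels[j]))
    return weights, probs


# Toy survey with two obvious response patterns (made-up data).
survey = [["yes", "low", "a"]] * 20 + [["no", "high", "b"]] * 20
weights, probs = fit_lca(survey, n_classes=2)

# Scoring a new respondent: pick the class with the highest posterior.
new_post = posterior(["yes", "low", "a"], weights, probs)
new_class = max(range(len(new_post)), key=lambda k: new_post[k])
```

Class labels are arbitrary (class 0 in one run may be class 1 in another), so compare classes by their response probabilities rather than by number. For real survey data you would also want several random starts and a fit statistic such as BIC to choose the number of classes, which this sketch leaves out.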