
Hi guys,

I want to perform clustering on categorical variables (from survey data, with around 2,000 observations).

I think PROC CLUSTER is appropriate for this, provided we first create dummy variables.
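As a rough illustration of the dummy-coding step, here is a minimal Python sketch (the thread is about SAS; Python is used only to show the idea). It assumes each survey response is a dict mapping variable name to category label; the function name `dummy_code` and the sample variables are hypothetical. It keeps one 0/1 indicator per observed level, with no reference level dropped, since no regression model is being fitted.

```python
def dummy_code(rows, columns):
    """One-hot encode categorical survey responses.

    rows: list of dicts mapping column name -> category label.
    Returns (feature_names, matrix), where each matrix row is a list of 0/1 ints.
    """
    # Observed levels of each variable, sorted so the column order is stable.
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    feature_names = [f"{c}={lvl}" for c in columns for lvl in levels[c]]
    matrix = []
    for r in rows:
        # One indicator per (variable, level) pair; exactly one is 1 per variable.
        matrix.append([1 if r[c] == lvl else 0 for c in columns for lvl in levels[c]])
    return feature_names, matrix


# Hypothetical example data.
responses = [
    {"gender": "F", "region": "North"},
    {"gender": "M", "region": "South"},
    {"gender": "F", "region": "South"},
]
names, X = dummy_code(responses, ["gender", "region"])
print(names)  # ['gender=F', 'gender=M', 'region=North', 'region=South']
print(X)      # [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1]]
```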

Can you please confirm this and suggest how to go about it?

Which method should I use? Ward's?
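Ward's method (METHOD=WARD in PROC CLUSTER) merges, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squared Euclidean distances. For clusters A and B with sizes n_A, n_B and centroids c_A, c_B, that increase is n_A * n_B / (n_A + n_B) * ||c_A - c_B||^2. The naive Python sketch below applies this rule to dummy-coded rows; it is O(n^3) and only meant to show the criterion, not to replace PROC CLUSTER on 2,000 observations. Whether Euclidean geometry suits 0/1 indicators is itself a judgment call. The function names and data are hypothetical.

```python
def centroid(points):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward_cluster(X, k):
    """Naive agglomerative Ward clustering down to k clusters (illustration only)."""
    clusters = [[i] for i in range(len(X))]  # start with singletons
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([X[p] for p in clusters[i]], [X[p] for p in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical dummy-coded rows: three 2-level variables.
X = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
print(ward_cluster(X, 2))  # [[0, 1], [2, 3]]
```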

And how do we reduce the number of variables in this process?
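One common way to thin out categorical variables before clustering (not something PROC CLUSTER does for you) is to screen pairs of variables for redundancy, for example with Cramér's V: V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the Pearson chi-square statistic of the r x c contingency table. Values near 1 suggest two questions carry nearly the same information, so one of them could be dropped; any cutoff is a judgment call. A minimal stdlib-only sketch, with a hypothetical function name:

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two equal-length lists of category labels."""
    n = len(x)
    rows, cols = sorted(set(x)), sorted(set(y))
    observed = Counter(zip(x, y))
    row_tot, col_tot = Counter(x), Counter(y)
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = row_tot[a] * col_tot[b] / n  # count expected under independence
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0  # constant variable -> 0


print(cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"]))  # 1.0 (redundant pair)
print(cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"]))  # 0.0 (no association)
```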

How do we calculate the distance between clusters in this case?
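With categorical variables, a simple dissimilarity between two respondents is the simple matching distance: the share of variables on which their answers differ. On full dummy coding, the squared Euclidean distance between two rows is exactly twice the number of differing variables, so the two views are closely related. The distance between clusters then depends on the linkage; this sketch uses average linkage (the mean pairwise dissimilarity between members) as one illustrative option. The names and data are hypothetical.

```python
def matching_distance(a, b):
    """Simple matching dissimilarity: share of variables on which two respondents differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise matching dissimilarity between members of two clusters."""
    total = sum(matching_distance(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical respondents: (gender, region, answer).
c1 = [("F", "North", "Yes"), ("F", "North", "No")]
c2 = [("M", "South", "Yes")]
print(round(average_linkage(c1, c2), 3))  # 0.833
```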

How do we score a new dataset?
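Agglomerative clustering produces a tree for the observations it was fitted on, not a rule for new observations. One common workaround, sketched here as an assumption rather than a built-in feature, is to compute each cluster's centroid on the dummy-coded training data and assign each new respondent to the nearest centroid by squared Euclidean distance, which is consistent with Ward's criterion. New rows must be dummy-coded with exactly the same levels and column order as the training data. The function names and data are hypothetical.

```python
def centroids(X, labels):
    """Mean dummy-coded profile of each cluster label."""
    groups = {}
    for row, lab in zip(X, labels):
        groups.setdefault(lab, []).append(row)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)] for lab, rows in groups.items()}


def score(new_rows, cents):
    """Assign each new row to the label of its nearest centroid."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(cents, key=lambda lab: sqdist(row, cents[lab])) for row in new_rows]


# Hypothetical training rows and their cluster labels.
X = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
cents = centroids(X, [1, 1, 2, 2])
new = [[1, 0, 0, 1, 1, 0], [0, 1, 0, 1, 1, 0]]
print(score(new, cents))  # [1, 2]
```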

Thanks in advance.



Replies to This Discussion

It would be better to use some type of latent class analysis (LCA) if you are working with categorical variables. SAS has an LCA procedure, and there are several other packages available, such as Latent GOLD.
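To make the latent class suggestion concrete: an LCA model assumes each respondent belongs to one of K unobserved classes and that, within a class, the items are independent. Below is a minimal EM sketch in Python for binary (0/1) items only. It is not the implementation used by the SAS LCA procedure or Latent GOLD; real software also handles polytomous items, multiple random starts, and model selection (for example by BIC). The function name `lca_em` and the toy data are hypothetical.

```python
import math
import random


def lca_em(X, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary (0/1) items by EM.

    Returns (weights, probs, history), where probs[k][j] = P(item j = 1 | class k)
    and history holds the log-likelihood at the start of each iteration.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting item probabilities, kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(p)] for _ in range(n_classes)]
    history = []
    for _ in range(n_iter):
        # E-step: posterior class membership for each respondent.
        post, loglik = [], 0.0
        for row in X:
            joint = []
            for k in range(n_classes):
                lik = weights[k]
                for j, x in enumerate(row):
                    lik *= probs[k][j] if x else 1.0 - probs[k][j]
                joint.append(lik)
            total = sum(joint)
            loglik += math.log(total)
            post.append([v / total for v in joint])
        history.append(loglik)
        # M-step: update class weights and item probabilities from the posteriors.
        for k in range(n_classes):
            nk = sum(r[k] for r in post)
            weights[k] = nk / n
            for j in range(p):
                est = sum(r[k] * row[j] for r, row in zip(post, X)) / nk
                probs[k][j] = min(max(est, 1e-6), 1 - 1e-6)  # keep strictly inside (0, 1)
    return weights, probs, history


# Hypothetical toy data: two response patterns, ten respondents each.
X = [[1, 1, 1, 0] for _ in range(10)] + [[0, 0, 0, 1] for _ in range(10)]
weights, probs, history = lca_em(X, 2, n_iter=50)
print(weights)
```

Scoring new respondents under a fitted model amounts to running the E-step once with the fitted `weights` and `probs` and taking each respondent's most probable class.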


On Data Science Central

© 2020 TechTarget, Inc.
