Is it possible to carry out a cluster analysis using categorical variables?

Replies to This Discussion

What about using decision trees?
Mirko, I am trying to do a subjective segmentation that is non-induced; put simply, I do not have a "Y" or objective function.
Tom, I have a couple of variables, both numeric and categorical, and I would like to profile my data.
Thanks for the reply...
Hi Bhaswati,
Please select the variables that you feel are important for the cluster analysis. Once you have decided on the number of clusters, merge the cluster results with the original data set and profile the remaining, unused variables. Yes, you can find good articles on latent class analysis using SAS EG; just do a Google search. If you are planning to use R, there are many packages available for latent class analysis.
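For anyone who wants to see the "merge and profile" step spelled out, here is a minimal sketch in plain Python. The records, the `channel` variable, the cluster labels, and the helper `profile_by_cluster` are all hypothetical; in practice the labels would come from whatever clustering procedure was run, and the profiled variable would be one that was left out of the clustering.

```python
from collections import Counter, defaultdict


def profile_by_cluster(records, labels, variable):
    """Return {cluster: {level: share}} for one categorical variable."""
    counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        counts[label][record[variable]] += 1
    profile = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        profile[label] = {level: n / total for level, n in counter.items()}
    return profile


# Hypothetical data: "channel" was NOT used to build the clusters.
records = [
    {"id": 1, "channel": "online"},
    {"id": 2, "channel": "online"},
    {"id": 3, "channel": "online"},
    {"id": 4, "channel": "store"},
    {"id": 5, "channel": "store"},
    {"id": 6, "channel": "store"},
]
# Hypothetical cluster labels, one per record, in the same order.
labels = [0, 0, 0, 0, 1, 1]

channel_profile = profile_by_cluster(records, labels, "channel")
```

Comparing the shares across clusters (here, cluster 0 is mostly "online" while cluster 1 is entirely "store") is one simple way to describe segments with variables that did not drive the clustering.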
That was quite informative.
Are you suggesting that I use these dummies, or these WoE variables, together with the other continuous variables in the equation?
I will be using PROC FASTCLUS in SAS, so should I put all these variables and dummies together in the VAR statement?
One method of showing riskiness is to evaluate the odds ratio or log odds of the category.
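As a rough illustration of the log-odds idea, the sketch below computes the log odds of a binary outcome within each category. Note that this needs a 0/1 outcome, which a purely unsupervised segmentation does not have. The data, the function name `category_log_odds`, and the smoothing constant of 0.5 are assumptions made for the example, not something taken from this thread.

```python
import math
from collections import Counter


def category_log_odds(categories, outcomes, smoothing=0.5):
    """Return {category: log(events / non-events)} with additive smoothing."""
    events = Counter()
    totals = Counter()
    for category, outcome in zip(categories, outcomes):
        totals[category] += 1
        events[category] += outcome
    result = {}
    for category, n in totals.items():
        # Smoothing keeps the ratio finite when a category has no events
        # or no non-events.
        event_count = events[category] + smoothing
        non_event_count = (n - events[category]) + smoothing
        result[category] = math.log(event_count / non_event_count)
    return result


# Hypothetical toy data: a region variable and a 0/1 outcome.
region = ["north", "north", "south", "south", "south", "east", "east", "east"]
outcome = [1, 0, 1, 1, 0, 0, 0, 1]

log_odds = category_log_odds(region, outcome)
```

A positive value means the event is more common than not in that category, a negative value means it is less common, and zero means even odds. The resulting number could then stand in for the category as a single numeric variable.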
Hi Bhaswati,
I had a similar kind of data in one of my clustering projects. I used expectation-maximization (EM) clustering, a likelihood-based algorithm that works from prior probability distributions. In fact, you can also do latent class analysis for this kind of mixed-type data. I would recommend the EM algorithm for the clustering.
I do not know whether this is possible with SAS EG; I used the open-source tool Weka to do this analysis.
I hope this will help you
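To make the EM and latent class idea a little more concrete, here is a minimal sketch of EM for a latent class model, that is, a mixture in which each class has its own independent categorical distribution per variable. It is not the Weka implementation and not PROC LCA; the function `fit_latent_classes`, the smoothing constant, the iteration count, and the toy data are all assumptions. It handles categorical variables only; numeric variables in a mixed data set would need their own component distribution (for example a Gaussian), which is left out here.

```python
import math
import random


def fit_latent_classes(rows, n_classes, n_iter=50, seed=0, eps=1e-3):
    """Fit a latent class model by EM.

    rows: list of tuples of category labels, one tuple per record.
    Returns (class_weights, conditional_probs, responsibilities).
    """
    rng = random.Random(seed)
    n_vars = len(rows[0])
    levels = [sorted({row[j] for row in rows}) for j in range(n_vars)]

    # Start from random soft assignments; each row's values sum to 1.
    resp = []
    for _ in rows:
        raw = [rng.random() + 0.1 for _ in range(n_classes)]
        total = sum(raw)
        resp.append([r / total for r in raw])

    for _ in range(n_iter):
        # M-step: class weights and per-class level probabilities,
        # with a small eps so that no probability is exactly zero.
        mass = [sum(r[k] for r in resp) for k in range(n_classes)]
        weights = [(m + eps) / (len(rows) + n_classes * eps) for m in mass]
        cond = []
        for k in range(n_classes):
            per_var = []
            for j in range(n_vars):
                counts = {level: eps for level in levels[j]}
                for row, r in zip(rows, resp):
                    counts[row[j]] += r[k]
                total = sum(counts.values())
                per_var.append({lv: c / total for lv, c in counts.items()})
            cond.append(per_var)

        # E-step: posterior class probabilities, computed in log space.
        new_resp = []
        for row in rows:
            log_post = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j in range(n_vars):
                    lp += math.log(cond[k][j][row[j]])
                log_post.append(lp)
            top = max(log_post)
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            new_resp.append([u / total for u in unnorm])
        resp = new_resp

    return weights, cond, resp


# Hypothetical, deliberately clean toy data with two obvious segments.
rows = [("red", "small", "yes")] * 10 + [("blue", "large", "no")] * 10
weights, cond, resp = fit_latent_classes(rows, n_classes=2)

# Hard label = most probable class for each record.
hard = [max(range(2), key=lambda k: r[k]) for r in resp]
```

Real data will be much noisier than this, EM can settle in a local optimum, and the number of classes has to be chosen by the analyst, so it is usually worth trying several random seeds and class counts.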
I have not worked on cluster analysis before.
Do you have any write-ups on latent class analysis? I am afraid I do not know how to approach it.
Tom, I am not sure about "Latent Class" analysis.
I wouldn't recommend recoding categorical variables into numerics. I would stick with decision trees, correspondence analysis, or latent class analysis. You cannot do latent class analysis in SAS using EG, but there is a PROC LCA which will do the trick.

-Ralph Winters
Could you please elaborate on the reason for not converting categorical variables to numeric?
Does PROC LCA work in SAS EG?
Also, if you have write-ups on latent class analysis, that would be really great!
The main reason is that nominal categorical variables have no order, so any numeric codes you assign to them are arbitrary. The dummy variable technique is fine for regression, where the effects are additive, but I am not sure how I would interpret dummies in a cluster analysis with multiple levels. Maybe adding a single binary variable would be OK.

Haven't tried Proc LCA in SAS EG, but it might work in the code node.

-Ralph Winters
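A small sketch of the "arbitrary codes" point made above, using made-up colour levels. The codes, records, and helper names are hypothetical. It compares the gap implied by integer codes with the distance under dummy (one-hot) coding and with a simple matching distance that needs no numeric coding at all.

```python
def one_hot(level, levels):
    """Encode one category as a 0/1 indicator (dummy) vector."""
    return [1 if level == lv else 0 for lv in levels]


def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def simple_matching_distance(row_a, row_b):
    """Fraction of attributes on which two records disagree."""
    return sum(a != b for a, b in zip(row_a, row_b)) / len(row_a)


levels = ["red", "green", "blue"]
arbitrary_codes = {"red": 1, "green": 2, "blue": 3}  # hypothetical coding

# Integer codes make red look closer to green than to blue,
# purely because of the order in which the codes were handed out.
code_gap_red_green = abs(arbitrary_codes["red"] - arbitrary_codes["green"])
code_gap_red_blue = abs(arbitrary_codes["red"] - arbitrary_codes["blue"])

# Dummy coding puts every pair of distinct levels at the same distance.
dummy_red_green = squared_euclidean(one_hot("red", levels), one_hot("green", levels))
dummy_red_blue = squared_euclidean(one_hot("red", levels), one_hot("blue", levels))

# Simple matching compares whole records without any numeric coding.
record_a = ("red", "small", "yes")
record_b = ("blue", "small", "no")
mismatch_share = simple_matching_distance(record_a, record_b)
```

This only shows how the distances behave. It does not settle how cluster centres built on dummy variables should be interpreted, which is the open question raised in the reply.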

