
Hi - has anyone worked on a clustering project that uses non-numeric variables? For example, clustering customer behavior based on brand preference, type of product purchased, etc. I only have SAS EG available and haven't been able to think of a way to do it yet...

Any help would be great!


Replies to This Discussion

In the past I've used matching coefficients, multiple correspondence analysis followed by k-means, and "canonical cluster analysis", which uses optimal scaling as the first step. Nowadays I lean toward latent class analysis.

You can use PROC DISTANCE, which is available in SAS 9.2.

I assume all your variables are symmetric nominal, so you can use the matching distance (DMATCH). If they are asymmetric nominal, use the Jaccard distance (DJACCARD) instead.

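The code is:

PROC DISTANCE DATA=Library.Dataset OUT=Library.Dataset_out METHOD=DMATCH;
VAR NOMINAL(brand product_type); /* placeholder names: list your own nominal variables here */
ID id;
RUN;

Here the id (respondent ID) should be a character variable.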

Then the clustering code:

PROC CLUSTER DATA=Library.Dataset_out METHOD=WARD PSEUDO OUTTREE=Library.Dataset_out_tree;
ID id;
RUN;

Let me know if you have any questions.
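
To make the two distances named in this reply concrete, here is a minimal sketch in plain Python (not SAS) of simple matching and Jaccard dissimilarity between two records. The function names and the example customers are made up for illustration, and PROC DISTANCE may scale or transform these quantities differently, so the numbers are not guaranteed to match its output.

```python
def matching_distance(a, b):
    """Fraction of variables on which two records disagree (simple matching)."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def jaccard_distance(a, b):
    """Jaccard dissimilarity for 0/1 indicators; joint absences (0, 0) are ignored."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1 - both / either


# Hypothetical customers described by (brand, product type, channel).
cust1 = ("Acme", "Shoes", "Online")
cust2 = ("Acme", "Bags", "Online")
d_match = matching_distance(cust1, cust2)  # one mismatch out of three variables

# Hypothetical purchase flags, where only a shared "1" counts as agreement.
flags1 = [1, 0, 0, 1]
flags2 = [1, 0, 0, 0]
d_jaccard = jaccard_distance(flags1, flags2)
d_match_flags = matching_distance(flags1, flags2)
```

On the flag vectors the two measures disagree: matching treats the shared zeros as agreement, while Jaccard ignores them. That is the symmetric versus asymmetric distinction made above.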

Hi Vinay,

I am also doing a similar kind of clustering, but the problem I am facing is that I have around 500K rows, and PROC CLUSTER cannot handle data sets that large. Another question: how do I interpret the clusters from the output?

Vinay, you should output the cluster ID for each record and then run a series of crosstabs with the segment IDs as the columns and the fields you care about as the rows. That is the only way...
Hi sir,
I'm doing a PhD in document clustering. Can you give me some ideas regarding that?
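
As an illustration of the crosstab profiling described here, the following is a small Python sketch (not SAS) that tabulates one categorical field against the segment ID and reports column percentages. The field names `segment` and `brand` and the five records are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab_percent(records, row_key, col_key="segment"):
    """Return {row_level: {segment: percent of that segment's records}}."""
    counts = defaultdict(Counter)   # counts[row_level][segment]
    col_totals = Counter()          # number of records per segment
    for rec in records:
        counts[rec[row_key]][rec[col_key]] += 1
        col_totals[rec[col_key]] += 1
    return {
        row: {col: 100.0 * by_col[col] / col_totals[col] for col in col_totals}
        for row, by_col in counts.items()
    }


# Hypothetical records with the cluster (segment) ID already attached.
records = [
    {"segment": 1, "brand": "A"},
    {"segment": 1, "brand": "A"},
    {"segment": 1, "brand": "B"},
    {"segment": 2, "brand": "B"},
    {"segment": 2, "brand": "B"},
]
table = crosstab_percent(records, "brand")
for brand in sorted(table):
    print(brand, {seg: round(pct, 1) for seg, pct in table[brand].items()})
```

Reading down a column shows what a segment looks like. Here segment 2 is entirely brand B, while segment 1 leans toward brand A.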
Hi, I realize this thread has been running for some time, but I am new to ABridge. I have used Multiple Correspondence Analysis followed by hierarchical clustering on the dimension scores, using SPAD software, as someone else mentioned earlier. In SAS it looks like you have to recode the driver variables into binaries before you run PROC CORRES. Does anyone have experience here and can offer guidance on data prep prior to this procedure?
Tom, I have had the same situation when using PROC LCA for latent class analysis. What I usually do is run a DATA step first and manually recode the variables into binaries. If you know SAS macros, that helps.

-Ralph Winters
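
For readers unsure what "recode into binaries" means here, this is a minimal sketch in plain Python (not a SAS DATA step) that turns each level of each categorical variable into its own 0/1 indicator column. The variable names, levels, and the `variable_level` column naming are assumptions chosen for illustration.

```python
def to_indicators(rows, variables):
    """Expand categorical variables into 0/1 indicator columns, one per level."""
    # Collect the distinct levels of each variable in a stable (sorted) order.
    levels = {v: sorted({row[v] for row in rows}) for v in variables}
    columns = [f"{v}_{lvl}" for v in variables for lvl in levels[v]]
    coded = [
        [1 if row[v] == lvl else 0 for v in variables for lvl in levels[v]]
        for row in rows
    ]
    return columns, coded


# Hypothetical respondents with two nominal driver variables.
rows = [
    {"brand": "A", "channel": "web"},
    {"brand": "B", "channel": "store"},
    {"brand": "A", "channel": "store"},
]
columns, coded = to_indicators(rows, ["brand", "channel"])
print(columns)
for line in coded:
    print(line)
```

Each row ends up with exactly one 1 per original variable, which is the layout a binary-input procedure would expect.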
Thanks, Ralph. I am new to SAS, so I will look into this.

