
Hello All,
I am a master's student and have decided to do a project on customer segmentation in SAS. The goal is to identify the types of customer companies that
might be interested in buying products from a cosmetics manufacturing company. The
deciding variables include 1) the age of the potential buyer company, 2) its turnover, 3) its revenues, and 4) whether it is a cosmetics company or not.
These are just a few of the many variables in the list.
I have a few doubts, however:
1) How do we interpret the clusters once they come out in the form of a tree or otherwise?

2) How do we set these criteria in the programming part while working in SAS? I believe PROC CLUSTER or PROC FASTCLUS is to be used, but these simply group similar observations into the same cluster based on Euclidean distance. If we want to set specific "criteria variables" (the four variables mentioned above), how do we do that in SAS?
Is there any other special procedure, statement or function in SAS for this besides PROC FASTCLUS/CLUSTER?
I researched this a lot on the internet but did not find much, nor any sample code.
I would really appreciate any inputs on this.
Thanking You,
With Sincere Regards,


Replies to This Discussion

hi tina,

i replied on the KDnuggets forum but i guess you are not checking that thread anymore....

1. you will find lots of documentation on deciding the number of clusters, but as far as interpretation is concerned, it's pretty much subjective, just like the interpretation of a factor analysis result. a lot will depend on the specific business/domain for which you are analyzing the data, and on the business requirement of your client.

2. you can use PROC CLUSTER. specify the variables you want with VAR, and if you want to use a non-Euclidean distance for clustering, you can compute a distance matrix using PROC DISTANCE. you'll find details on the SAS online help doc.
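To make the idea in point 2 concrete outside SAS, here is a minimal Python sketch (not SAS code) of what "pick the variables, then build a non-Euclidean distance matrix" can look like. It assumes a Gower-style dissimilarity, which is one common choice for mixed numeric and nominal variables and not necessarily what you would end up using. The company records, the variable names and the `gower_matrix` helper are all made up for illustration.

```python
# Hypothetical company records; ids, variable names and values are invented.
companies = [
    {"id": "A", "age": 5,  "turnover": 1.2, "revenue": 0.9, "is_cosmetic": 1},
    {"id": "B", "age": 30, "turnover": 8.5, "revenue": 7.0, "is_cosmetic": 0},
    {"id": "C", "age": 7,  "turnover": 1.0, "revenue": 1.1, "is_cosmetic": 1},
]

# The "criteria variables": only these enter the distance computation.
NUMERIC = ["age", "turnover", "revenue"]
NOMINAL = ["is_cosmetic"]


def gower_matrix(rows, numeric, nominal):
    """Return a symmetric matrix of Gower-style dissimilarities in [0, 1].

    Numeric variables contribute |x - y| / range, nominal variables
    contribute 0 on a match and 1 on a mismatch; the result is the
    average contribution over all selected variables.
    """
    # Range of each numeric variable (fall back to 1.0 if it is constant).
    ranges = {}
    for var in numeric:
        values = [row[var] for row in rows]
        ranges[var] = (max(values) - min(values)) or 1.0

    n = len(rows)
    n_vars = len(numeric) + len(nominal)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = sum(
                abs(rows[i][var] - rows[j][var]) / ranges[var] for var in numeric
            )
            total += sum(1.0 for var in nominal if rows[i][var] != rows[j][var])
            dist[i][j] = dist[j][i] = total / n_vars
    return dist


dist = gower_matrix(companies, NUMERIC, NOMINAL)
for row in dist:
    print(["%.3f" % d for d in row])
```

A matrix like this is what a hierarchical clustering routine would then take as input in place of raw Euclidean distances. Changing the two variable lists is the whole "criteria" mechanism: anything not listed simply does not influence the clusters.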

i also guess your dataset will have mixed variable types. there are a lot of opinions on which clustering method should be used for such data, but as far as i know SPSS's two-step cluster is the best.

you can find a lot of documentation about this online. The following link gives a detailed description of PROC CLUSTER in SAS.

Hope this helps.

I agree with Romakanta that SPSS's two-step cluster method is the best, as far as I know. I've done a segmentation project for one of Israel's airlines and got good results using the two-step method, as it can handle both nominal and numerical variables. The lesson I've learned is to be aware of the business goal and to strive to answer it using the fewest variables possible. If you would like, I can describe the airline project in more detail, but I advise you to read about SPSS's two-step cluster method first.

Hi, I'm also doing research on consumer segmentation, in this case consumers buying products or services online.
I considered seven variables (after doing a factor analysis) which, together with socio-economic variables, make up the dependent variables.

I used CHAID for the segmentation and found four segments of customers, each with its own decision rules.
I'm now developing an agent-based simulation to model online consumer behavior, specifically creating agents that represent the four segments found.

Hi Tina,

a small remark:
If you are looking for the group of companies with the highest potential to buy you should go for a propensity model...

PROC VARCLUS seems to be the best method for customer segmentation if you are analyzing data at the product level, since it clusters variables (here, the products) rather than observations.

Interpreting the tree is based on the connecting nodes (clusters), that is, on which products are highly associated within and between clusters.

Many Thanks,

I agree with all on using SPSS's two-step procedure, which works great for continuous and categorical data. I used it to segment member bases for AOL and Allstate; the algorithms ran in under 15 minutes, and the interpretation was straightforward and passed face validity.

Isaac Turk

what does one do when there are almost 130 different product categories? for example, i have SAS, and the objective is to cluster customers who buy similar products. what is the best way to deal with missing values?

customer id   product1   product2   product3   ...   product130
111           1          .          1          ...   .
123           .          .          .          ...   .
222           .          .          1          ...   .
999           .          .          .          ...   1
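
One possible way to look at a table like this, sketched in Python rather than SAS: if a missing value simply means the customer never bought that product (an assumption that has to be checked against how the data was built), each customer reduces to the set of products bought, and customers can be compared with a Jaccard similarity instead of imputing anything. The helper names `to_basket` and `jaccard` are made up for this sketch, and only the four columns shown in the table are used.

```python
# Rows copied from the table above; None stands for a missing value (".").
# Column positions 0..3 correspond to product1, product2, product3, product130.
purchases = {
    111: [1, None, 1, None],
    123: [None, None, None, None],
    222: [None, None, 1, None],
    999: [None, None, None, 1],
}


def to_basket(row):
    """Assumption: missing means 'did not buy', so keep only the columns with a 1."""
    return {col for col, value in enumerate(row) if value == 1}


def jaccard(a, b):
    """Shared products divided by all products bought by either customer."""
    union = a | b
    if not union:
        return 0.0  # two empty baskets: treated here as not similar
    return len(a & b) / len(union)


baskets = {cid: to_basket(row) for cid, row in purchases.items()}

# Customers with no purchases at all carry no information for this kind of
# clustering and may need to be set aside as their own group.
no_purchases = [cid for cid, basket in baskets.items() if not basket]

print(no_purchases)                          # [123]
print(jaccard(baskets[111], baskets[222]))   # 0.5
print(jaccard(baskets[111], baskets[999]))   # 0.0
```

With around 130 sparse columns, a set-based measure like this ignores the many shared "did not buy" cells, which would otherwise make almost every pair of customers look alike.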

