Umm... really embarrassed about this, but there was a stupid mistake in my code. Now it's giving me great CCC values (1600! ummm?), so I guess things are great now...
I'm using the number of clusters where the CCC is high and the PSF peaks; hope that's OK... too scared to ask any more questions :) Thanks, guys.
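For reference, the PSF value SAS reports is the pseudo-F statistic (Calinski-Harabasz): between-cluster sum of squares over within-cluster sum of squares, each divided by its degrees of freedom. Below is a minimal pure-Python sketch, assuming you already have cluster labels for each candidate number of clusters; the function name `pseudo_f` and the toy data are just for illustration, and SAS's CCC is more involved and not reproduced here.

```python
from typing import Sequence


def pseudo_f(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Pseudo-F (Calinski-Harabasz): [B / (k - 1)] / [W / (n - k)].

    B = between-cluster sum of squares, W = within-cluster sum of squares,
    n = number of observations, k = number of clusters.
    """
    n = len(points)
    dims = len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("pseudo-F needs 1 < k < n")

    # Grand mean of every variable.
    grand = [sum(p[d] for p in points) / n for d in range(dims)]

    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dims)]
        # Cluster size times squared distance from its centroid to the grand mean.
        between += len(members) * sum((centroid[d] - grand[d]) ** 2 for d in range(dims))
        # Squared distances of the members to their own centroid.
        within += sum(sum((m[d] - centroid[d]) ** 2 for d in range(dims)) for m in members)

    return (between / (k - 1)) / (within / (n - k))


# Toy 1-D data with three obvious groups.
data = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
two = [0, 0, 0, 1, 1, 1]
three = [0, 0, 1, 1, 2, 2]
print(pseudo_f(data, two), pseudo_f(data, three))  # the 3-cluster split scores far higher
```

Computing this for each candidate number of clusters and looking for a local peak is the same idea as reading the PSF column in the SAS output.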
Sorry for not replying earlier, but I just wanted to thank you all for your help. The methodology I am following now is given below, but the main problem is that I'm getting large negative CCC values (-200), so I'm kinda worried. Don't want to stretch my luck, but any help here would be great too :)
- Took data with 4 attributes: var1 (integer-valued, 1-6), var2 (integer-valued, 0-145), var3 (integer-valued, 0-45), var4 (the DOW variable, which has basically been reduced to a weekend-stay indicator)…
Hi, what I would recommend is to convert the day of week into 7 variables, one 0/1 indicator per day, say:
(n1, n2, n3, n4, n5, n6, n7)
The clustering will then be in 7-dimensional space.
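A minimal Python sketch of that 7-indicator encoding, assuming the day is stored as a three-letter label such as "mon"; the names `DAYS` and `one_hot_dow` are hypothetical. With one 0/1 column per day, every pair of different days ends up the same Euclidean distance apart (sqrt(2)), so no day is artificially "closer" to another.

```python
import math
from itertools import combinations

DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def one_hot_dow(day: str) -> list:
    """Return the 7 indicators (n1..n7) for a three-letter day label."""
    vec = [0] * len(DAYS)
    vec[DAYS.index(day.lower())] = 1  # raises ValueError for an unknown label
    return vec


def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Every pair of distinct days is the same distance apart.
dists = {euclidean(one_hot_dow(a), one_hot_dow(b)) for a, b in combinations(DAYS, 2)}
print(dists)  # a single value: sqrt(2)
```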
Even though dates have an interval scale, you are "forcing" the DOW to be a numeric variable when it's not, and you may end up with bad results, or having to explain a difference that isn't really there. For example, which is the lower number, Monday or Sunday?
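To make the Monday/Sunday point concrete, here is a small Python sketch with two equally plausible numeric codings: Monday=1 ... Sunday=7, and Sunday=1 ... Saturday=7 (the convention SAS's WEEKDAY function uses). The dictionary and function names are hypothetical. The gap between the same two days changes purely because of the coding you pick.

```python
MONDAY_FIRST = {d: i for i, d in enumerate(("mon", "tue", "wed", "thu", "fri", "sat", "sun"), start=1)}
SUNDAY_FIRST = {d: i for i, d in enumerate(("sun", "mon", "tue", "wed", "thu", "fri", "sat"), start=1)}


def gap(coding: dict, a: str, b: str) -> int:
    """Absolute difference between the numeric codes of two days."""
    return abs(coding[a] - coding[b])


for a, b in [("sun", "mon"), ("sat", "sun")]:
    print(a, b, gap(MONDAY_FIRST, a, b), gap(SUNDAY_FIRST, a, b))
# sun mon 6 1
# sat sun 1 6
```

A distance-based method such as k-means would treat these gaps as real differences, which is exactly the arbitrariness described above.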
You are better off doing hierarchical clustering with this rather than what looks like k-means clustering. Hierarchical clustering can handle both categorical and numeric variables, as long as you use a dissimilarity measure suited to mixed data.
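Hierarchical clustering only needs a dissimilarity between observations, so mixed variables can be handled by choosing a suitable one. The sketch below uses Gower's dissimilarity (my choice for illustration, not something stated in this post): numeric variables contribute |x - y| / range, categorical ones 0 if equal and 1 otherwise, averaged over the variables; observations are then merged with average linkage. Function names and the toy rows are hypothetical, and the naive loop is only meant for small samples, not the 100,000 rows mentioned in this thread.

```python
from itertools import combinations


def gower(a, b, ranges):
    """Gower dissimilarity; `ranges` maps numeric column index -> (max - min)."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in ranges:  # numeric column: range-scaled absolute difference
            total += abs(x - y) / ranges[i] if ranges[i] else 0.0
        else:  # categorical column: simple mismatch
            total += 0.0 if x == y else 1.0
    return total / len(a)


def average_linkage(rows, k, numeric_cols):
    """Naive agglomerative clustering (average linkage) down to k clusters."""
    ranges = {c: max(r[c] for r in rows) - min(r[c] for r in rows) for c in numeric_cols}
    dist = {(i, j): gower(rows[i], rows[j], ranges)
            for i, j in combinations(range(len(rows)), 2)}

    def linkage(ca, cb):
        # Mean pairwise dissimilarity between two clusters (keys stored as (low, high)).
        return sum(dist[min(i, j), max(i, j)] for i in ca for j in cb) / (len(ca) * len(cb))

    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]  # a < b, so deleting b leaves a's position intact
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy rows: (var2, day of week), one numeric and one categorical column.
rows = [(10, "sat"), (12, "sun"), (11, "sat"), (100, "tue"), (105, "wed"), (98, "tue")]
print(average_linkage(rows, k=2, numeric_cols={0}))  # [[0, 1, 2], [3, 4, 5]]
```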
-Ralph Winters
Dear Aditya,
I am glad you like the solutions.
1) There are more *CLUS procedures in SAS (e.g. PROC CLUSTER, PROC MODECLUS) that you can explore.
2) Right, you can use only numeric variables: "The VAR statement lists the numeric variables to be used in the cluster analysis... " (SAS doc)
So yes, 3 dummy variables should be created in a DATA step:
if day in ('sun', 'sat') then wend=1; else wend=0;   /* weekend */
if day in ('mon', 'tue', 'wed') then bow=1; else bow=0;   /* beginning of week */
if day in ('thu', 'fri') then eow=1; else eow=0;   /* end of week */
... distance-based clustering algorithms are very sensitive to training data…
First of all, thanks a lot, Dirk and Jozo. I didn't expect such fast (and useful) replies. A couple of points:
- Is FASTCLUS the only way to go? (I guess, given that I have data in the 100,000-row range, that's a yes.)
- If I use FASTCLUS, can I use categorical variables? (Jozo, I liked your solution [3 periods], but can I use it with FASTCLUS?) (What I'm thinking is that maybe I can add 3 columns, weekend_ind, bow_ind and eow_ind, which use 0/1, so a weekend DOW will be (1,0,0) for weekend_ind,…
Simply decode the days into 3 periods:
* weekend (sat-sun)
* begin_of_week (mon-wed)
* end_of_week (thu-fri)
The distance between any two of them is the same (1 if you simply count period mismatches), and that makes common sense :)
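A quick check of that in Python, assuming the same three-letter day labels; `PERIOD_OF_DAY` and the helper names are hypothetical. It decodes each day to its period and shows that any two different periods are equally far apart, both as a plain mismatch count (1) and as Euclidean distance on the three 0/1 columns weekend_ind, bow_ind, eow_ind (sqrt(2)), while days in the same period become identical.

```python
import math
from itertools import combinations

PERIODS = ("weekend", "begin_of_week", "end_of_week")
PERIOD_OF_DAY = {
    "sat": "weekend", "sun": "weekend",
    "mon": "begin_of_week", "tue": "begin_of_week", "wed": "begin_of_week",
    "thu": "end_of_week", "fri": "end_of_week",
}


def period_dummies(period: str) -> tuple:
    """(weekend_ind, bow_ind, eow_ind) for a period name."""
    return tuple(int(period == p) for p in PERIODS)


def dummies(day: str) -> tuple:
    """Decode a three-letter day label to its period, then to 0/1 columns."""
    return period_dummies(PERIOD_OF_DAY[day.lower()])


def mismatch(p: str, q: str) -> int:
    return int(p != q)


def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


for p, q in combinations(PERIODS, 2):
    print(p, q, mismatch(p, q), round(euclidean(period_dummies(p), period_dummies(q)), 4))
# weekend begin_of_week 1 1.4142
# weekend end_of_week 1 1.4142
# begin_of_week end_of_week 1 1.4142
```

Either way the three periods are equidistant, so using the 0/1 columns in a Euclidean-distance method keeps the "all periods equally different" property.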
