
Need advice

Dear Community,

I have a situation where I need to classify items into groups (let's say 6). When I ran k-means, 90% of my data fell into one group and the remaining 10% fell into the other groups. What's the next step? To split the data further, I took the 90% group and ran k-means on it again. This time I got 15 new groups within this subset, but again 76% fell into one group and the rest into the remaining 14 groups. How should I deal with this situation?
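To make the procedure above concrete, here is a minimal sketch of "cluster, then re-cluster the biggest group" that prints the share of points in each cluster at both steps. It assumes the features are numeric columns of a NumPy array. The function names (`kmeans`, `cluster_shares`) and the log-normal stand-in data are hypothetical, and the k-means shown is plain hand-written Lloyd's iteration, not any particular library's implementation.

```python
import numpy as np

def _nearest(X, centroids):
    # Index of the closest centroid for every row (squared Euclidean distance).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (hypothetical helper). Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _nearest(X, centroids)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return _nearest(X, centroids), centroids

def cluster_shares(labels, k):
    """Fraction of points assigned to each of the k clusters."""
    return np.bincount(labels, minlength=k) / len(labels)

# Hypothetical stand-in for the poster's data: 1,000 items, 3 skewed features.
rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 3))

# Step 1: six clusters on all the data.
labels, _ = kmeans(X, k=6)
shares = cluster_shares(labels, 6)

# Step 2: 15 clusters on the dominant cluster only.
big = shares.argmax()
sub_labels, _ = kmeans(X[labels == big], k=15)
sub_shares = cluster_shares(sub_labels, 15)

print(np.round(shares, 3))
print(np.round(sub_shares, 3))
```

If one share stays dominant after every round, adding more rounds or more clusters is unlikely to help on its own; the replies below suggest looking at the distribution and scaling of the variables instead.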

Tags: clustering



Replies to This Discussion

Hi Suresh, have you computed any general statistics on your data? It sounds like the kurtosis of your distribution is really high (i.e. it is heavy-tailed), in which case this could be the correct result. How many variables are you using? Are they independent of one another? I think the variables in a cluster analysis are supposed to be fairly independent; you can run a correlation test to find out. Good luck.
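A minimal sketch of the checks suggested above, assuming the variables are columns of a NumPy array. It reports per-column skewness and excess kurtosis (plain moment estimates without small-sample correction) and lists variable pairs whose Pearson correlation exceeds a threshold. The helper names, the example data, and the 0.8 cutoff are all hypothetical choices for illustration; note that `np.corrcoef` gives the correlation matrix itself, not a significance test.

```python
import numpy as np

def describe(X):
    """Per-column mean, standard deviation, skewness and excess kurtosis."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    z = (X - mu) / sd
    skew = (z ** 3).mean(axis=0)
    excess_kurt = (z ** 4).mean(axis=0) - 3.0  # about 0 for normal data
    return mu, sd, skew, excess_kurt

def correlated_pairs(X, threshold=0.8):
    """Column pairs whose |Pearson correlation| exceeds an (arbitrary) threshold."""
    corr = np.corrcoef(X, rowvar=False)
    d = corr.shape[0]
    return [(i, j, corr[i, j])
            for i in range(d) for j in range(i + 1, d)
            if abs(corr[i, j]) > threshold]

# Hypothetical data: columns 0 and 1 are nearly redundant, column 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.lognormal(0.0, 1.0, size=2000)
x1 = x0 * np.exp(0.1 * rng.normal(size=2000))
x2 = rng.normal(size=2000)
X = np.column_stack([x0, x1, x2])

mu, sd, skew, kurt = describe(X)
pairs = correlated_pairs(X)
print("excess kurtosis:", np.round(kurt, 2))
print("highly correlated pairs:", [(i, j) for i, j, _ in pairs])
```

A large positive excess kurtosis flags heavy tails, and a flagged pair suggests two variables that carry mostly the same information into the distance calculation.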

Think about the data you are trying to cluster. How many dimensions are you using? Are the variables highly correlated? Do the variables have different standard deviations? What are their distributions?
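On the standard-deviation question: k-means works with Euclidean distance, and the mean squared distance between points is proportional to the sum of the per-variable variances, so a variable on a large scale can dominate the clustering. A minimal sketch, using two hypothetical features (`income`, `age`) and hypothetical helper names, shows each feature's share of that distance before and after z-scoring (subtracting the mean and dividing by the standard deviation).

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical features on very different scales.
income = rng.normal(50_000, 15_000, size=500)
age = rng.normal(40, 12, size=500)
X = np.column_stack([income, age])

def distance_share(X):
    """Fraction of the mean squared Euclidean distance between points
    contributed by each column (proportional to that column's variance)."""
    v = X.var(axis=0)
    return v / v.sum()

def zscore(X):
    """Centre each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

before = distance_share(X)
after = distance_share(zscore(X))
print("before scaling:", np.round(before, 4))
print("after scaling: ", np.round(after, 4))
```

Before scaling, income accounts for essentially all of the distance, so k-means would effectively cluster on income alone; after z-scoring, both features contribute equally.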

For instance, if your data is log-normal, then a lot of the cases will sit at the low end of the distribution, with a few at the high end. If you have a bunch of highly correlated log-normal variables, that could produce the kind of results you are seeing.
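A minimal sketch of the log-normal effect described above, assuming three hypothetical, strongly correlated log-normal variables built by exponentiating correlated normals. It measures how many points fall below each column's mean and the skewness before and after taking logs. A log transform is one common remedy for this kind of skew, not something the reply prescribes; it requires strictly positive values (`np.log1p` is a common alternative when zeros occur).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
# Hypothetical data: a shared factor makes the three columns correlated.
common = rng.normal(size=(n, 1))
Z = 0.9 * common + np.sqrt(1 - 0.81) * rng.normal(size=(n, 3))  # each column ~ N(0, 1)
X = np.exp(Z)  # three correlated log-normal variables

def skewness(a):
    """Per-column moment skewness."""
    z = (a - a.mean(axis=0)) / a.std(axis=0)
    return (z ** 3).mean(axis=0)

def frac_below_mean(a):
    """Per-column fraction of observations below the column mean."""
    return (a < a.mean(axis=0)).mean(axis=0)

raw_skew, log_skew = skewness(X), skewness(np.log(X))
raw_below, log_below = frac_below_mean(X), frac_below_mean(np.log(X))
print("skewness raw / log:", np.round(raw_skew, 2), np.round(log_skew, 2))
print("share below mean raw / log:", np.round(raw_below, 2), np.round(log_below, 2))
```

On the raw scale most observations sit below the mean, packed at the low end, while the few large values stretch the distances; after the log transform the columns are roughly symmetric.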

Clustering is often treated as a garbage-disposal method; toss anything in and it gets crunched. I find that one has to put a lot of thought into the variables used to get meaningful results.



© 2020 is a subsidiary and dedicated channel of Data Science Central LLC
