Information

Data Mining

Architecture, algorithms, statistical techniques, real life problems, real time, distributed systems, software. Emphasis on very large data sets.

Members: 392
Latest Activity: Jan 10

Discussion Forum

what programming language to pick up

Started by Yi-Chun Tsai. Last reply by Jonathan Seller Nov 18, 2011. 15 Replies

any way of getting SAS Enterprise Miner

Started by Yi-Chun Tsai. Last reply by Vishal Lala Jan 10. 12 Replies

plotting of graph in R

Started by jadelim. Last reply by jadelim Jul 30, 2012. 11 Replies

Modeling when only event data is available

Started by Rahul V. Last reply by Steffen Springer Nov 26, 2009. 11 Replies

what other data mining techniques can I use

Started by Yi-Chun Tsai. Last reply by Yi-Chun Tsai Oct 30, 2009. 10 Replies

Initialization methods for k-means clustering.

Started by DataMiner. Last reply by YONGYANG HUO Feb 9, 2011. 8 Replies

Decreasing Dataset Dimensionality

Started by Paul Wilson. Last reply by Ralph Winters Jun 17, 2010. 7 Replies

Data Mining Graduate Certificate Options

Started by Sandra Donlon. Last reply by Tom Wolfer Sep 15, 2010. 7 Replies

Different approaches to simple counting question

Started by Vincent Granville. Last reply by Emory Creel Aug 18, 2008. 7 Replies

Transductive SVM for semi supervised learning

Started by Paul Wilson. Last reply by Paul Wilson Jun 22, 2010. 6 Replies

Lift Chart

Started by Paul Wilson. Last reply by Paul Wilson Feb 1, 2010. 5 Replies

How to get R started

Started by jadelim. Last reply by jadelim Jul 10, 2012. 4 Replies

Good R square FMCG industry

Started by Mindy Scott. Last reply by Jarkko Venna Feb 7, 2012. 3 Replies

Data mining in human resources

Started by yaser yadekar. Last reply by Zakaria Y. AL-Jammal Nov 9, 2009. 3 Replies

Comparison of predictive Data Mining Tools

Started by KHELOUFI Tarik. Last reply by Jozo Kovac Feb 1, 2011. 2 Replies

Click Fraud Problem

Started by Vincent Granville. Last reply by Vincent Granville Oct 6, 2008. 2 Replies

Comment Wall

Comment by Nishant Modi on October 19, 2010 at 3:40am
Hi,
Currently, I am working on a predictive model in data mining using SAS EM, and I am a bit stuck on the attribute selection part. My data set has 86 attributes, including 1 target attribute. I used the Variable Selection node in SAS EM, and it did give me a set of influential attributes, but I am not sure how far to trust the result.
I would like to know whether there are other methods for attribute selection in predictive modeling, preferably using SAS, although I am ready to learn other approaches. I would also like to know whether there are ways to verify the attribute set that I got from the Variable Selection node in SAS EM.

Thanks
Nishant

Cheers :))
Comment by Yi-Chun Tsai on October 19, 2010 at 5:05am
Hi, Nishant:
I don't use SAS EM since we don't have it here. Instead, I use SAS/STAT. For variable selection in particular, I use PROC VARCLUS to first group my whole set of hundreds of variables into clusters. Then I pick one variable from each cluster that makes sense to either me or my manager. This approach works for me because I don't have to rely on an automatic process whose recommendations may not make business sense. I can also limit multicollinearity by picking just one variable from each cluster: by the nature of clustering, variables within the same cluster are close to each other, while variables in different clusters are relatively distinct. Also, avoid stepwise procedures at all costs; most experts do not recommend them.
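
For readers without SAS, the cluster-then-pick idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not a reproduction of PROC VARCLUS: it swaps in a simple average-linkage agglomerative clustering on absolute Pearson correlation. The function names, the 0.7 threshold, and the synthetic demo data are all hypothetical. The suggested representative is just a starting point; the final pick per cluster would still be made by judgment, as described above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(columns, min_abs_corr=0.7):
    """Group variable names by average-linkage clustering on |correlation|.

    columns: dict mapping variable name -> list of numeric values.
    Clusters keep merging while the best pair of clusters has an average
    absolute correlation of at least min_abs_corr (an assumed threshold).
    """
    names = list(columns)
    corr = {
        (a, b): abs(pearson(columns[a], columns[b]))
        for a in names for b in names if a != b
    }

    def avg_link(c1, c2):
        return sum(corr[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None  # (similarity, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = avg_link(clusters[i], clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        if best[0] < min_abs_corr:
            break  # remaining clusters are too dissimilar to merge
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def pick_representatives(columns, clusters):
    """Suggest one variable per cluster: the one most correlated with its peers."""
    reps = []
    for cluster in clusters:
        if len(cluster) == 1:
            reps.append(cluster[0])
            continue

        def mean_abs_corr(v):
            others = [o for o in cluster if o != v]
            return sum(abs(pearson(columns[v], columns[o])) for o in others) / len(others)

        reps.append(max(cluster, key=mean_abs_corr))
    return reps


def make_demo_data(n_rows=200, seed=0):
    """Synthetic data: x1..x3 share one latent factor, y1..y2 another, z is independent."""
    rng = random.Random(seed)
    f1 = [rng.gauss(0, 1) for _ in range(n_rows)]
    f2 = [rng.gauss(0, 1) for _ in range(n_rows)]

    def noisy(base):
        return [v + rng.gauss(0, 0.1) for v in base]

    return {
        "x1": noisy(f1), "x2": noisy(f1), "x3": noisy(f1),
        "y1": noisy(f2), "y2": noisy(f2),
        "z": [rng.gauss(0, 1) for _ in range(n_rows)],
    }


if __name__ == "__main__":
    cols = make_demo_data()
    groups = cluster_variables(cols, min_abs_corr=0.7)
    print("clusters:", groups)
    print("suggested representatives:", pick_representatives(cols, groups))
```

The pairwise search is quadratic in the number of clusters at every merge, which is fine for a few hundred variables but would need a library implementation for much wider data sets.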
Comment by Nishant Modi on October 19, 2010 at 5:30am
Thanks Yi-Chun for your suggestions.
Sorry, I just want to be sure I have fully understood your point. So you use clustering to get a few clusters of attributes.
And then from each cluster you pick one attribute, keeping in mind aspects like multicollinearity and confounding, and finally you get a set of attributes to go ahead and model with.

Thanks
Nishant

Cheers :))
Comment by Yi-Chun Tsai on October 19, 2010 at 5:35am
Hi, Nishant:
Yes, that was what I meant.
Comment by Nishant Modi on October 19, 2010 at 5:47am
Thanks Yi-Chun, your suggestions are helpful.
Comment by Yi-Chun Tsai on October 19, 2010 at 5:51am
Hi, Nishant:
You are welcome. I hope it helps. By the way, which programming language would you suggest if I want to code data mining algorithms myself instead of using off-the-shelf packages such as SAS EM or SPSS Clementine? C++, Java, or Python? Thanks.
Comment by Nishant Modi on October 19, 2010 at 6:30am
Well, it would essentially depend on your programming background. But I would suggest Java, for two reasons: 1) You will not have to deal with hassles like pointers in C++, or with many other issues you would rather stay away from. 2) Java has a very large set of APIs with relevant documentation, with many operations (mathematical, data structures, etc.) already implemented. So you will not have to start from scratch and can focus mainly on the data mining part.

Also, data mining APIs were going to be made available through JDM (Java Data Mining), but I am not sure whether they are out yet.

Hope this helps.
Comment by Yi-Chun Tsai on October 19, 2010 at 7:02am
Thanks for sharing this. I will check it out. I guess the next question is whether it is better to code your own data mining algorithms or to use what is already available, either commercial software like SAS/SPSS or open-source software like R/RapidMiner. What do you think?
Comment by Nishant Modi on October 19, 2010 at 9:21am
That would depend on the situation and what you want to do. If you really want to make substantial changes or tweaks to the algorithms, for example in a research environment, then writing your own code is preferable. But if just getting results is your daily job, then going with the implementations provided by the different vendors would be the preferred option.
Comment by Vera Klimkovsky on September 29, 2011 at 5:55pm

Join us for the FREE ACM Data Mining Camp on October 15 at eBay San Jose.

Learn more, watch the video

http://www.youtube.com/watch?v=aEcW9qwdopw



© 2017 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC