
I work as a marketing statistician at an online retailer. We use logistic regression to build response models that decide which customers are worth mailing our catalog to. We usually have around 600,000 customers with 300 variables. So far, it has worked well compared to the RFM method.

Can anyone suggest other algorithm(s) that might do better than logistic regression in some situations? Thank you so much for your reply.

Replies to This Discussion

Hi, Timothy:
Thanks. I used clustering to group the variables and selected one variable from each group to reduce the number of variables. I will certainly try PLS-DA. How do you actually do that in SAS? Which procedure do I have to use?
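For readers unfamiliar with the "one variable per group" idea, here is a minimal sketch in Python (not SAS, and not necessarily the procedure used above). It assumes a simple greedy rule: a variable joins the first group whose representative it correlates with at |r| >= 0.8, otherwise it starts a new group. The column names, the toy values, and the 0.8 threshold are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def group_variables(columns, threshold=0.8):
    """Greedy grouping by absolute correlation with each group's representative.

    groups[i][0] is the representative (the first variable placed in group i).
    """
    groups = []
    for name, values in columns.items():
        for group in groups:
            if abs(pearson(values, columns[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            # No existing representative is close enough: start a new group.
            groups.append([name])
    return groups

# Hypothetical toy data: two near-duplicate variables and one unrelated one.
columns = {
    "orders_12m": [1, 2, 3, 4, 5, 6],
    "spend_12m": [10, 21, 29, 41, 52, 58],
    "days_since_last_order": [30, 5, 200, 12, 90, 45],
}
groups = group_variables(columns)
selected = [g[0] for g in groups]  # keep one variable per group
print(groups)
print(selected)
```

Keeping only the representative discards whatever the other members of the group add beyond it, which is the trade-off to weigh against the simpler model.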
Hi, Timothy:
Thanks. Do you know of any papers or introductions to PLS-DA? This is actually the first time I have heard of it. Also, I haven't used R before, although I used S-Plus quite a few years back. Thanks.

I'm concerned that your selection of one variable from each group is losing useful information.
In addition to Tim's excellent suggestion, I would recommend that you try CART or C5.0 with boosting, and combine the results of the decision tree with logistic regression in a confidence-based voting scheme. You could also feed the "best" clustering solution into C5.0 and logistic regression as new input variables. That might really improve your results.
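To make the voting idea concrete, here is a minimal Python sketch of one possible reading of "confidence-based voting": for each customer, take the prediction of whichever model is more confident, where confidence is assumed to be the distance of the predicted response probability from 0.5. The probabilities below are made up, and both the confidence measure and the 0.5 mailing cutoff are assumptions; with low response rates you would likely measure confidence and set the cutoff differently.

```python
def confidence_vote(p_tree, p_logit):
    """Return the probability from the more confident model.

    Confidence is assumed to be |p - 0.5|; ties go to logistic regression.
    """
    if abs(p_tree - 0.5) > abs(p_logit - 0.5):
        return p_tree
    return p_logit

def combine(tree_probs, logit_probs):
    """Apply the vote customer by customer."""
    return [confidence_vote(t, l) for t, l in zip(tree_probs, logit_probs)]

# Hypothetical predicted response probabilities for three customers.
tree_probs = [0.90, 0.40, 0.55]
logit_probs = [0.60, 0.10, 0.30]

combined = combine(tree_probs, logit_probs)
mail = [p >= 0.5 for p in combined]  # assumed cutoff, for illustration only
print(combined)
print(mail)
```

Other schemes, such as averaging the two probabilities weighted by each model's confidence, fit the same description and may be worth comparing on a holdout sample.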

kind regards,
Ben Dickman
Central Connecticut State Univ.
Hi, Ben:
Thank you very much for your input. Our company only has SAS/STAT, which does not do CART or C5.0 with boosting. What can I do then? Do I have to use an open source package like R? Thanks.

So you only have SAS/STAT, but no data mining software?
R and WEKA are both extensible, and I recommend that you become
familiar with both. WEKA is much easier to get started with, though.

is the website. I also bought their book "Data Mining - Practical Machine Learning Tools and Techniques, 2nd edition" by Ian H. Witten & Eibe Frank. I recommend it highly, especially if you like the tool.

kind regards,
Ben Dickman
Central Connecticut State Univ.
Partial Least Squares is available in SAS Enterprise Miner, and so is Principal Component Analysis, which you can use to reduce correlated predictor variables.

As for recommendations in SAS/STAT without Enterprise Miner, you could try PROC ROBUSTREG for interval-scaled response variables. It's basically an extension of PROC REG made to deal with outliers, which you probably have a lot of given the size of your database.

You can also try running survival analysis. SAS has quite a few procedures available, and the one I like most is the Cox Proportional Hazards model, which you can run using PROC PHREG. Unlike fully parametric survival procedures, it is semi-parametric, meaning it makes no assumption about the shape of the baseline hazard. The downside is that I have yet to see the code needed to score it. It may still be useful to run it in order to examine the estimates, since it has a lot of advanced features that logistic regression lacks, such as accounting for late entry into the risk set (i.e. left truncation) and time-dependent covariates. I recommend a book written by Paul Allison if you're interested in this.
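On the scoring question, one way to rank customers from a fitted Cox model is by relative hazard, exp(beta . (x - x_ref)), which needs only the coefficient estimates. Below is a minimal Python sketch of that calculation, not SAS code. The coefficients, variable names, and reference customer are hypothetical, and the score is only a relative ranking; turning it into a survival probability would also require an estimate of the baseline hazard.

```python
from math import exp

def cox_risk_score(coefs, reference, customer):
    """Relative hazard of a customer versus a reference customer.

    Computes exp(sum_k beta_k * (x_k - x_ref_k)) using fitted coefficients.
    """
    linear_predictor = sum(
        beta * (customer[name] - reference[name]) for name, beta in coefs.items()
    )
    return exp(linear_predictor)

# Hypothetical coefficient estimates, e.g. copied from a fitted model's output.
coefs = {"orders_12m": 0.30, "days_since_last_order": -0.01}

# Hypothetical reference customer and customer to score.
reference = {"orders_12m": 2, "days_since_last_order": 60}
customer = {"orders_12m": 4, "days_since_last_order": 30}

score = cox_risk_score(coefs, reference, customer)
print(round(score, 3))  # values above 1 mean a higher hazard than the reference
```

Sorting customers by this score gives the same ordering as sorting by the linear predictor itself, so either can be used to pick the top of a mailing list.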

If you have missing data, SAS has an excellent multiple imputation procedure, PROC MI.
Again, Paul Allison has written an excellent monograph on it.

Other than that, I would recommend neural networks and decision trees, but I haven't tried running these outside of Enterprise Miner, so I'm not sure whether you can run them directly in SAS/STAT.
Dear Mr. Tsai,
Try using decision trees (CART).
Could you share your data with me so I can have a look at it?

All my best regards
I can't share the company's data with you. However, thank you for sharing the idea with me.

