
Multinomial Logistic Regression Predicting Cluster Membership


I recently developed a cross-sell application that took product purchase history and, within each order, flagged the record with a 1 if the product was purchased and a 0 if not.

I then used multinomial logistic regression to assign new orders to a cluster.

I found that beyond a certain number of clusters the classification stopped working, but with fewer clusters it was flawless, at 100% correct classification.

The goal was to produce cross sell tables and rates according to the cluster membership as opposed to one single table. 
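A minimal Python sketch of what per-cluster cross-sell tables could look like, assuming each order already carries a cluster label. The toy orders, the product names and the helper names (`flag_order`, `cross_sell_rates`) are made up for illustration; this is not the spreadsheet application itself.

```python
from collections import defaultdict

# Hypothetical toy data: each order is (cluster label, set of purchased products).
ORDERS = [
    (0, {"A", "B"}),
    (0, {"A", "B", "C"}),
    (0, {"A"}),
    (1, {"C", "D"}),
    (1, {"D"}),
]
PRODUCTS = ["A", "B", "C", "D"]


def flag_order(products_bought, catalog):
    """Turn one order into a 1/0 flag per catalog product."""
    return [1 if p in products_bought else 0 for p in catalog]


def cross_sell_rates(orders, catalog):
    """Return {cluster: {product: share of the cluster's orders containing it}}."""
    counts = defaultdict(lambda: [0] * len(catalog))
    sizes = defaultdict(int)
    for cluster, bought in orders:
        sizes[cluster] += 1
        for i, flag in enumerate(flag_order(bought, catalog)):
            counts[cluster][i] += flag
    return {
        c: {p: counts[c][i] / sizes[c] for i, p in enumerate(catalog)}
        for c in sizes
    }


rates = cross_sell_rates(ORDERS, PRODUCTS)
for cluster in sorted(rates):
    print(cluster, rates[cluster])
```

Each cluster gets its own rate table, so an order assigned to cluster 1 would be offered from a different ranking than an order assigned to cluster 0.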

This may seem like a lot of effort, but I then put the whole thing into a spreadsheet application that call center reps could use while talking with customers who were buying.

I did this because they had no solution for this and none is in sight.

Anyway, what could I have done better in this situation?



Views: 4698


Replies to This Discussion

If you could tell me what a cross-sell application does, I could share some thoughts on it using the theory of multinomial logit or other decision-based categorical models. At first sight, kindly ensure you are not violating the IIA (independence of irrelevant alternatives) assumption: that a phenyl bottle with a green label can be bought regardless of whether another bottle with a red label has been bought. Another problem I find on re-reading is this: log-log (multinomial logit) models are found to be inconsistent when the residuals follow different distributions. In your case, if toothpastes, vaccines and laptops are all included among the choices, you are basically trying to capture different residual patterns, and even a robust adjustment cannot solve much. Try to follow methods that give a set of shapes or slopes, and then try to predict a class of products matching the spending propensity. I know this is too vague, and I do not know what cross selling is. But this is certain: if your results are significantly different from OLS, e.g. wrong signs, then the (logistic) model is WRONG. Mail me at [email protected]. Your work seems interesting to me. The reason I am taking so much interest in all this is that I, too, am new to this field and need to pick up skills fast.
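To make the IIA point concrete: in a multinomial logit, the odds between two alternatives do not change when a third alternative is added or removed. The short Python sketch below shows this with made-up utilities; the labels "green", "red" and "blue" and the function name `choice_probabilities` are illustrative assumptions only.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit (softmax) probabilities from a dict of utilities."""
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in utilities.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical utilities for three alternatives.
full = choice_probabilities({"green": 1.0, "red": 0.5, "blue": 0.2})
# Drop one alternative and recompute.
reduced = choice_probabilities({"green": 1.0, "red": 0.5})

odds_full = full["green"] / full["red"]
odds_reduced = reduced["green"] / reduced["red"]
print(odds_full, odds_reduced)  # both equal exp(0.5) up to rounding
```

If real buyers treat two products as close substitutes, removing one of them would shift the odds among the rest, and that is exactly the situation where the multinomial logit's built-in assumption is violated.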

Cross selling is the technique of making someone an offer based upon what they have already intended to purchase. Example: Given products A+B+C at checkout what is the best product D to offer. Thanks for your question.

This technique addresses several issues, such as the adaptive probability of the best product to offer as a person places more items in the cart, rather than just offering a "customers who bought x also bought y" solution. Please message me for details if this sounds like something you would like to offer a prospective employer.
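As a rough illustration of an offer that adapts as the cart fills up, here is a small Python sketch that picks product D from the order history given the current cart. It uses plain conditional purchase rates rather than the cluster-based model discussed in this thread, and the history and the function name `best_offer` are hypothetical.

```python
# Hypothetical order history: each order is the set of products purchased.
HISTORY = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "C", "E"},
    {"A", "B"},
    {"C", "E"},
]


def best_offer(cart, history):
    """Return (product, rate): the product most often bought alongside
    every item in the cart, or (None, 0.0) if no past order matches."""
    cart = set(cart)
    matching = [order for order in history if cart <= order]
    if not matching:
        return None, 0.0
    counts = {}
    for order in matching:
        for product in order - cart:
            counts[product] = counts.get(product, 0) + 1
    if not counts:
        return None, 0.0
    # Sort by count (descending), then name, so ties are deterministic.
    product, count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return product, count / len(matching)


print(best_offer({"A"}, HISTORY))            # offer with one item in the cart
print(best_offer({"A", "B", "C"}, HISTORY))  # offer once the cart has grown
```

With only A in the cart the sketch suggests B; once A, B and C are all in the cart it suggests D, so the recommendation changes as items are added.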

Essentially, you have multiple dichotomies predicting multiple clusters. It does not always work, and the final solution is unique to the data set. If you can somehow slap two models into a spreadsheet with a little VBA in the mix, it is a very low-cost approach that looks pretty good with a little tweaking. It is a great way to please call center execs who have budget cuts year over year.


I wonder how you fitted a multinomial logistic regression to a dichotomous dependent variable. Anyway, I would suggest fitting simple logistic models for binary dependent variables. Multinomial logistic models are for dependent variables with more than two response categories. Is this the case with your purchase orders? I do not see that from your discussion. Anyway, you would find many details in the Online Course on Logistic Regression Models on

There was more than one cluster: 9 clusters, classified using multinomial logistic regression. Contact me if you need some learning resources on cluster analysis; there can be more than two clusters. I will check out your link. Thanks.
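For readers who want to see the mechanics, here is a minimal Python sketch of multinomial (softmax) logistic regression assigning orders to clusters from 1/0 purchase flags. It uses three made-up clusters rather than the nine mentioned above, plain gradient descent and a small L2 penalty; the data, the function names (`fit`, `predict`, `predict_proba`) and the settings are all assumptions for illustration, not the model that was actually built.

```python
import math

# Hypothetical training data: 1/0 purchase flags per order and its cluster.
X = [
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0],   # cluster 0 mostly buys products 0-1
    [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0],   # cluster 1 mostly buys product 2
    [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1],   # cluster 2 mostly buys product 3
]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
N_CLASSES = 3


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, row):
    """Class probabilities for one order; weights[k] = [bias, w1, ..., wp]."""
    scores = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for w in weights]
    return softmax(scores)


def fit(X, y, n_classes, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on the L2-penalised multinomial log-loss."""
    weights = [[0.0] * (len(X[0]) + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * len(w) for w in weights]
        for row, label in zip(X, y):
            probs = predict_proba(weights, row)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                grads[k][0] += err
                for j, xj in enumerate(row):
                    grads[k][j + 1] += err * xj
        for k in range(n_classes):
            for j in range(len(weights[k])):
                penalty = l2 * weights[k][j] if j > 0 else 0.0  # bias is not penalised
                weights[k][j] -= lr * (grads[k][j] / len(X) + penalty)
    return weights


def predict(weights, row):
    """Assign an order to the cluster with the highest probability."""
    probs = predict_proba(weights, row)
    return probs.index(max(probs))


weights = fit(X, y, N_CLASSES)
print([predict(weights, row) for row in X])   # fitted cluster for each training order
print(predict_proba(weights, [0, 0, 1, 0]))   # probabilities for a new order
```

The response here is the cluster label (more than two categories), while the predictors are the dichotomous purchase flags.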

Okay, glad to discuss and learn cluster analysis. But I ought to suggest that you can also fit a logistic regression model where the response variable is binary and observed across clusters. You can fit the binary logistic regression model and accommodate the clustering by setting the VCE to cluster. Or is your case not what I assume? I am aware of Stata's logit routine, where the VCE can be set to cluster with an option after the main command line: vce(cluster clustvarid)

Thank you. Muhammad, I will follow up on that. Regards, Sean.

Hi Sean,


Could you please point me to some e-learning resources on cluster analysis?



Hi Sean. My concern here is that you have over-fitted the data or have perfect separation. 100% correct classification is a big red flag that your model has one of these problems. Are you getting very large coefficients in your model? That is also a sign of this. Even if not, I would be worried.

It means that the data used to build the model will be fit perfectly by the model, but that new cases might be predicted very badly. For any model with, say, 50 cases and 50 predictors you will get 100% fit. Separation occurs when a predictor (or a linear combination of many predictors) always yields one category for the dependent variable as a result of sparsity in the data. See
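A small Python sketch of why perfect separation tends to show up as very large coefficients. It fits an unpenalised logistic regression with a single slope by gradient ascent on made-up, perfectly separated data; the data and the function name `fit_slope` are assumptions for illustration. Because every extra step still improves the likelihood, the slope never settles.

```python
import math

# Hypothetical one-predictor data with perfect separation:
# every x < 0 has y = 0 and every x > 0 has y = 1.
X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
Y = [0, 0, 0, 1, 1, 1]


def fit_slope(x, y, steps, lr=0.5):
    """Unpenalised logistic regression (slope only) by gradient ascent."""
    beta = 0.0
    for _ in range(steps):
        grad = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-beta * xi))
            grad += (yi - p) * xi
        beta += lr * grad / len(x)
    return beta


# The slope keeps growing with the number of steps instead of converging.
for steps in (10, 100, 1000, 10000):
    print(steps, round(fit_slope(X, Y, steps), 3))
```

The training classification is 100% correct at every one of these step counts, which is why the accuracy figure alone says little about how trustworthy the fitted model is.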

You really ought to cross-validate your model. You can either hold some of the sample back (for testing) or use a more sophisticated method like Correlated Component Regression, designed for high-dimensional logistic problems (number of predictors approaching or exceeding the number of cases), which does the cross-validation as part of the model selection process. See

I agree, with some qualifications. New examples that are not in the data set may be estimated and fail. Is that still a win? It is the same as using a single chart for cross selling, in that it is based on history; it is more of an adaptive historical reporting technique. For business users who dive into the database to compare historical order patterns to cluster membership, it may raise some red flags if the orders are not 100% classified.

My major concern is that new patterns may not be appropriately classified, as a trade-off for accuracy when it is adaptive reporting of historical observations.

How do we manage the trade-offs with a bold, new approach? Is there another approach, regardless of clusters and multinomial models?
When we see Amazon's collaborative filtering, do they worry about overfitting? I would be interested in finding out.
Thanks for your post.

Hi Sean

Overfitting is a very serious problem in any type of modelling. Going for 100% on your training data is the wrong goal and just fits irrelevant noise in the sample. I can guarantee that the more you overfit, the worse the model will predict new cases. You need an algorithm that maximises fit to the validation data, not the training data. Very simple models will work much better on the validation data (new cases) than very complex ones. You could try the 30-day trial version of correlated component regression (logistic variant) and see how well this works compared with your existing model. Good luck.
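A quick Python sketch of the training-versus-validation gap, under loudly stated assumptions: the predictors and labels are pure random noise, and the "model" is a 1-nearest-neighbour memoriser rather than a logistic regression, chosen only because it makes the effect easy to see. All names (`predict_1nn`, `accuracy`) and sizes are hypothetical.

```python
import random

random.seed(0)

# Hypothetical data: 5 predictors that carry NO signal, and random labels.
N_CASES, N_PREDICTORS = 200, 5
rows = [[random.random() for _ in range(N_PREDICTORS)] for _ in range(N_CASES)]
labels = [random.randint(0, 1) for _ in range(N_CASES)]

# Hold back the last 50 cases for validation.
train_X, train_y = rows[:150], labels[:150]
test_X, test_y = rows[150:], labels[150:]


def predict_1nn(row, known_X, known_y):
    """Memorising classifier: copy the label of the closest training row."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, k)) for k in known_X]
    return known_y[distances.index(min(distances))]


def accuracy(X, y, known_X, known_y):
    hits = sum(predict_1nn(row, known_X, known_y) == label for row, label in zip(X, y))
    return hits / len(y)


train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(test_X, test_y, train_X, train_y)
print("training accuracy:", train_acc)
print("holdout accuracy:", test_acc)
```

The memoriser reproduces every training label even though there is nothing to learn, and only the held-back cases reveal that; the same check applies to any classifier, including a multinomial logistic model.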

Gary, I forgot to mention in my reply that your approach seems very exciting. It is as if the logitresearch algorithm was tailor-made for this specific type of application. The interface is pretty slick too. Thanks again for your help. That did achieve one objective of the post! The remaining objective is for someone to say "use another, completely different method altogether."



A Bayesian network should be a good alternative.



© 2021   TechTarget, Inc.   Powered by
