Explaining variability in logistic regression

Hi All,

I have built a logistic regression model, but I cannot figure out which goodness-of-fit measure tells me how much of the variability in the dependent variable is explained by the current model.

I have calculated a few pseudo R2 values to measure this, but as we know, pseudo R2 is not a good measure of it.

In addition, can anyone tell me how to calculate McKelvey & Zavoina's pseudo R2?

Thanks

Jai
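
On the McKelvey & Zavoina question: that measure is usually defined on the latent-variable scale as the variance of the fitted linear predictor divided by that variance plus the logistic error variance, pi^2/3. Below is a minimal Python sketch, assuming the fitted linear predictor (the log-odds for each observation) has already been exported from the model. The function name and the sample values are made up for illustration.

```python
import math

def mckelvey_zavoina_r2(linear_predictor):
    """Var(xb) / (Var(xb) + pi^2 / 3) for a logit model."""
    n = len(linear_predictor)
    mean = sum(linear_predictor) / n
    # Variance of the fitted latent values (population form; some texts divide by n - 1).
    var_fitted = sum((v - mean) ** 2 for v in linear_predictor) / n
    # pi^2 / 3 is the variance of the standard logistic error term.
    var_error = math.pi ** 2 / 3
    return var_fitted / (var_fitted + var_error)

xb = [-2.1, -0.4, 0.3, 1.7, 2.5, -1.2]  # hypothetical fitted log-odds
r2 = mckelvey_zavoina_r2(xb)
print(round(r2, 3))
```

The input must be the linear predictor, not the predicted probabilities; feeding in probabilities would give a different and much smaller number.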

Comment by Arun on August 23, 2010 at 10:26am
Hi,

Let me try to explain what you're asking for, and why you can't get it.

In logistic regression, a measure of 'how much variation has been explained' by the independent variables is not applicable! Any such measure would be misleading given the dichotomous nature of the outcome in logistic regression.
A pseudo R-square is also a measure of the deviance of the model from the actual values, just as AIC or SC is!
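
To make the link between pseudo R-squares and deviance concrete, here is a small Python sketch of McFadden's pseudo R-square, one common variant (the thread does not say which ones were calculated). It compares the log-likelihood of the fitted model with that of an intercept-only model. The outcomes and fitted probabilities are invented for illustration, and the probabilities are assumed to be for the class coded 1.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

def mcfadden_r2(y, p_hat):
    """1 - LL(model) / LL(intercept-only), which equals 1 - D_model / D_null."""
    base_rate = sum(y) / len(y)  # the intercept-only model predicts the mean
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_hat)
    return 1 - ll_model / ll_null

y = [0, 1, 0, 0, 1, 1, 0, 1]                              # hypothetical outcomes
p_hat = [0.10, 0.80, 0.30, 0.20, 0.70, 0.90, 0.40, 0.60]  # hypothetical fitted probabilities
print(round(mcfadden_r2(y, p_hat), 3))
```

The result is a ratio of log-likelihoods, so it describes improvement over the null model rather than a share of variance explained.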

Try reading up more on why there is no error term in a logistic regression, and you'll end up understanding why you can't measure goodness of fit by variance explained as in linear regression.
Remember, in linear regression the variance is constant, while in logistic regression it is a function of the probability you're modeling, so the variance itself varies. Can you see now why 'variance explained' is a problem?
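
The point about non-constant variance can be seen directly: a 0/1 outcome with probability p of being 1 has variance p(1 - p), so the spread around the fitted value changes from observation to observation. A small illustrative sketch in Python (the function name is made up here):

```python
def bernoulli_variance(p):
    """Variance of a 0/1 outcome whose probability of being 1 is p."""
    return p * (1 - p)

# The variance peaks at p = 0.5 and shrinks toward 0 as p nears 0 or 1,
# so there is no single residual variance to partition as in linear regression.
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"p = {p:.2f}  variance = {bernoulli_variance(p):.4f}")
```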

Hope this helps.

Thanks,
Arun
Comment by Jai Shanker Singh on August 6, 2010 at 12:20am
Thanks, Biswajit, for your comments.

The Hosmer and Lemeshow goodness-of-fit statistic is more useful for assessing the significance of the logistic regression than for telling us how much of the variability of the dependent variable is explained by the independent variables, the way R2 does in linear regression.

What I am looking for is a number that tells us how much of the variability of the dependent variable is explained by the independent variables and how much is not.

Thanks
Comment by Biswajit Pal on August 3, 2010 at 12:35pm
Hi
You can use the Hosmer and Lemeshow goodness-of-fit statistic to assess how well the model fits. It tests whether the predicted and observed values of the dependent variable agree. In SAS, the “LACKFIT” option in the MODEL statement generates this.
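
For readers outside SAS, the idea behind the statistic can be sketched in a few lines of Python. This is a rough illustration, not a reproduction of what LACKFIT prints: it sorts cases by predicted probability, splits them into equal-sized groups (SAS's grouping rules may differ), and compares observed with expected event counts in each group. The data are simulated, and p_hat is assumed to be the predicted probability of the class coded 1.

```python
import random

def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow chi-square statistic over groups of sorted predictions."""
    pairs = sorted(zip(p_hat, y))  # order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)  # events seen in the group
        expected = sum(pi for pi, _ in chunk)  # events the model predicts
        mean_p = expected / size
        if 0 < mean_p < 1:
            stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat

random.seed(0)
p_hat = [random.random() for _ in range(200)]         # simulated predictions
y = [1 if random.random() < p else 0 for p in p_hat]  # outcomes drawn from them
stat = hosmer_lemeshow(y, p_hat)
print(f"HL statistic = {stat:.2f}, compared with chi-square on {10 - 2} df")
```

A small statistic relative to the chi-square reference means predicted and observed counts agree; it says nothing about the share of variability explained.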
Another method is a confusion matrix, which leads to the ROC curve and shows the discriminating power of the model.
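
A minimal Python sketch of both ideas, assuming p_hat holds the predicted probability of the class coded 1. The function names, the 0.5 cutoff, and the data are illustrative choices, not anything taken from SAS output.

```python
def confusion_matrix(y, p_hat, cutoff=0.5):
    """Return (TP, FP, TN, FN) when p_hat >= cutoff is classified as 1."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y, p_hat):
        predicted = 1 if pi >= cutoff else 0
        if predicted == 1 and yi == 1:
            tp += 1
        elif predicted == 1 and yi == 0:
            fp += 1
        elif predicted == 0 and yi == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def auc(y, p_hat):
    """Area under the ROC curve: the share of (event, non-event) pairs
    in which the event has the higher prediction; ties count half."""
    pos = [p for yi, p in zip(y, p_hat) if yi == 1]
    neg = [p for yi, p in zip(y, p_hat) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [0, 1, 0, 0, 1, 1, 0, 1]                              # hypothetical outcomes
p_hat = [0.10, 0.80, 0.60, 0.20, 0.40, 0.90, 0.30, 0.70]  # hypothetical predictions
print(confusion_matrix(y, p_hat))  # (3, 1, 3, 1)
print(auc(y, p_hat))               # 0.9375
```

The confusion matrix depends on the chosen cutoff, while the area under the ROC curve summarizes performance across all cutoffs.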
Please let me know whether this gives you any relevant insight.
Thanks
Biswajit
Comment by Ralph Winters on August 2, 2010 at 2:49pm
As part of the output you will get a predicted probability of being in the class designated by 0. You also have the original classes of 0 or 1. So take the original class of each observation as the x values (0, 1, 0, 0, 1, etc.), run a linear regression against the predicted values (.03, .22, .98, .21, etc.), and use the R2 of the result.
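
A quick Python sketch of this suggestion. With a single predictor, the R2 of the linear regression equals the squared Pearson correlation, so that is what the hypothetical helper below computes. The first four predicted values echo the ones above; the fifth is made up to complete the example.

```python
def r_squared(observed, predicted):
    """R2 of a simple linear regression between the two series,
    computed as the squared Pearson correlation."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    return sxy ** 2 / (sxx * syy)

observed = [0, 1, 0, 0, 1]                   # original classes
predicted = [0.03, 0.22, 0.98, 0.21, 0.85]   # predicted probabilities (last one invented)
r2 = r_squared(observed, predicted)
print(round(r2, 3))
```

Because the correlation is squared, the result is the same whether the predicted probabilities refer to class 0 or class 1, and whichever series is treated as x.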

-Ralph Winters
Comment by Jai Shanker Singh on August 2, 2010 at 2:02pm
Hi,

I am running a logistic regression in SAS.

Ralph,

Thanks for your comments, but can you please elaborate on what you said?
Comment by Ralph Winters on August 2, 2010 at 11:56am
You did not say what package you are running. For a quick linear-regression type of R2, you can take all of the predicted values and regress them against the observed 0s and 1s.

-Ralph Winters
