# AnalyticBridge

A Data Science Central Community

# Explaining variability in logistic regression

Hi All,

I have built a logistic regression model, but I cannot figure out which goodness-of-fit measure will tell me how much of the variability of the dependent variable is explained by the current model.

I have calculated a few pseudo-R² measures, but as we know, a pseudo-R² is not a good way of measuring this.

In addition, can anyone tell me how to calculate McKelvey & Zavoina's pseudo-R²?
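McKelvey & Zavoina's pseudo-R² treats the 0/1 outcome as a cut of a latent continuous variable y* = x'β + ε. In the logit model the latent error ε follows a standard logistic distribution with variance π²/3, so the measure is R²_MZ = Var(x'β̂) / (Var(x'β̂) + π²/3), where x'β̂ is the fitted linear predictor (the log-odds, not the probabilities). Below is a minimal Python sketch of that formula. It assumes you already have the fitted log-odds; in SAS, the XBETA= keyword of the OUTPUT statement in PROC LOGISTIC should provide them. The function name and sample values are hypothetical, and using the population variance (divide by n) is an assumption.

```python
import math
from statistics import pvariance


def mckelvey_zavoina_r2(linear_predictor):
    """McKelvey & Zavoina pseudo-R^2 for a logit model.

    linear_predictor: fitted log-odds x_i' * beta_hat for each observation
    (not the predicted probabilities).
    """
    # Variance "explained" on the latent scale: variance of the fitted log-odds.
    # Dividing by n (population variance) is an assumption; some software uses n - 1.
    explained = pvariance(linear_predictor)
    # In the logit model the latent error variance is fixed at pi^2 / 3,
    # the variance of the standard logistic distribution.
    residual = math.pi ** 2 / 3
    return explained / (explained + residual)


# Hypothetical fitted log-odds from an already-estimated model.
eta = [-2.1, -0.7, 0.3, 1.4, 2.0, -1.2]
print(round(mckelvey_zavoina_r2(eta), 3))
```

A model whose fitted log-odds are all equal explains nothing on the latent scale and gets 0; the measure approaches 1 only as the spread of the log-odds grows large relative to π²/3.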

Thanks

Jai

Comment by Arun on August 23, 2010 at 10:26am
Hi,

Let me try to explain what you're asking for, and why you can't get it.

In logistic regression, a measure of 'how much variation has been explained' by the independent variables is not applicable! Given the dichotomous (0/1) nature of the dependent variable, any such measure would be misleading.
A pseudo-R², just like AIC or SC, is built from the model's deviance, i.e. how far the fitted model is from the actual data!
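To make this concrete: McFadden's pseudo-R², AIC and SC are all computed from the model's log-likelihood log L (for ungrouped 0/1 data the deviance is −2 log L). A minimal Python sketch, assuming you have the observed 0/1 outcomes, the model's predicted probabilities (strictly between 0 and 1) and the number of estimated parameters k; the function names and data are hypothetical.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p (0 < p < 1)."""
    return sum(yi * math.log(pr) + (1 - yi) * math.log(1 - pr) for yi, pr in zip(y, p))


def deviance_based_summaries(y, p_model, k):
    """k = number of estimated parameters, including the intercept."""
    n = len(y)
    ll_model = log_likelihood(y, p_model)
    # The intercept-only (null) model predicts the observed event rate for everyone.
    # Assumes y contains both 0s and 1s, so the event rate is strictly between 0 and 1.
    event_rate = sum(y) / n
    ll_null = log_likelihood(y, [event_rate] * n)
    minus_2_ll = -2 * ll_model  # equals the deviance for ungrouped 0/1 data
    return {
        "-2 Log L": minus_2_ll,
        "McFadden R2": 1 - ll_model / ll_null,
        "AIC": minus_2_ll + 2 * k,
        "SC": minus_2_ll + k * math.log(n),  # Schwarz criterion (BIC)
    }


# Hypothetical outcomes and predicted probabilities from a model with 2 parameters.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.7, 0.2, 0.8, 0.6, 0.4, 0.9]
for name, value in deviance_based_summaries(y, p, k=2).items():
    print(f"{name}: {value:.3f}")
```

All three summaries are transformations of the same −2 log L: McFadden's measure compares it with the intercept-only model, while AIC and SC add a penalty for the number of parameters.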

Try reading more about why there is no error term in logistic regression, and you will understand why you cannot measure goodness of fit by the variance explained, as you can in linear regression.
Remember, in linear regression the error variance is a constant, while in logistic regression the variance is a function of the probability you are modeling, p(1 − p): a variable variance. Can you see now why it is a problem to talk about variance explained?
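The variance point can be seen directly: for a 0/1 outcome that equals 1 with probability p, the variance is p(1 − p), so it changes with every observation's predicted probability instead of being a single σ² as in linear regression. A tiny Python illustration; the probabilities are arbitrary.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 outcome that equals 1 with probability p."""
    return p * (1 - p)


# Unlike the single error variance of linear regression, this changes with p.
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"p = {p:.2f}  Var(Y) = {bernoulli_variance(p):.4f}")
```

The variance peaks at 0.25 when p = 0.5 and shrinks toward 0 as p approaches 0 or 1.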

Hope this helps.

Thanks,
Arun
Comment by Jai Shanker Singh on August 6, 2010 at 12:20am

The Hosmer and Lemeshow goodness-of-fit statistic is more useful for assessing the adequacy of the logistic regression model than for telling us how much of the variability of the dependent variable is explained by the independent variables, as R² does in linear regression.

What I am looking for is a number that tells us how much of the variability of the dependent variable is explained by the independent variables and how much is not.

Thanks
Comment by Biswajit Pal on August 3, 2010 at 12:35pm
Hi
You can use the Hosmer and Lemeshow goodness-of-fit statistic to assess how well the model fits the data. It tests whether the observed and predicted numbers of events differ significantly across groups of observations formed from their predicted probabilities (usually deciles). In SAS, the LACKFIT option in the MODEL statement of PROC LOGISTIC generates this.
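A minimal pure-Python sketch of the Hosmer-Lemeshow statistic, assuming 0/1 outcomes y and predicted event probabilities p strictly between 0 and 1. It sorts observations by p, splits them into roughly equal-sized groups (10 by default), and sums (O − E)² / (n_g·p̄_g·(1 − p̄_g)) over the groups, where O and E are the observed and expected event counts. The result is compared with a chi-square distribution on (groups − 2) degrees of freedom. SAS's LACKFIT grouping differs in details such as the handling of tied probabilities, so the numbers may not match exactly; the function name and data are hypothetical.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for the Hosmer-Lemeshow test.

    y: 0/1 outcomes; p: predicted event probabilities, strictly between 0 and 1.
    """
    pairs = sorted(zip(p, y))  # order observations by predicted probability
    n = len(pairs)
    stat = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]  # roughly equal-sized group
        if not chunk:
            continue
        used += 1
        n_g = len(chunk)
        observed = sum(outcome for _, outcome in chunk)  # observed events in the group
        expected = sum(prob for prob, _ in chunk)        # expected events = sum of p
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, used - 2


# Hypothetical data: 20 observations split into 5 groups.
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
p = [0.05, 0.1, 0.12, 0.15, 0.2, 0.25, 0.3, 0.32, 0.4, 0.45,
     0.5, 0.55, 0.6, 0.62, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
print(hosmer_lemeshow(y, p, groups=5))
```

For a p-value you could pass the statistic and degrees of freedom to `scipy.stats.chi2.sf`; a large statistic (small p-value) points to lack of fit.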
Another method is to summarize the predictions in a confusion matrix; computing the true-positive and false-positive rates over a range of cutoffs leads to the ROC curve.
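A sketch of the confusion-matrix-to-ROC idea in pure Python, assuming 0/1 outcomes and predicted probabilities of the event coded 1. Each distinct predicted probability is used as a cutoff, the resulting (false-positive rate, true-positive rate) pairs trace the ROC curve, and the area under it is computed with the trapezoidal rule. All names and data are hypothetical.

```python
def confusion_matrix(y, p, cutoff=0.5):
    """(TP, FP, TN, FN) when an observation is classified as 1 if p >= cutoff."""
    tp = fp = tn = fn = 0
    for outcome, prob in zip(y, p):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and outcome == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif outcome == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(y, p):
    """(false-positive rate, true-positive rate) for every distinct cutoff."""
    positives = sum(y)
    negatives = len(y) - positives
    points = [(0.0, 0.0)]  # cutoff above every probability: nothing classified as 1
    for cutoff in sorted(set(p), reverse=True):
        tp, fp, _, _ = confusion_matrix(y, p, cutoff)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical outcomes and predicted event probabilities.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(confusion_matrix(y, p))              # (1, 0, 2, 1)
print(area_under_curve(roc_points(y, p)))  # 0.75
```

The area under the ROC curve should correspond to the c statistic that PROC LOGISTIC reports with its association measures.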
Please let me know whether this gives you any useful insight.
Thanks
Biswajit
Comment by Ralph Winters on August 2, 2010 at 2:49pm
As part of the output you will get, for each observation, a predicted probability of being in the class designated by 0. You also have the original classes of 0 or 1. So take the original class of each observation as the x values (0, 1, 0, 0, 1, etc.), run a linear regression of the predicted values (.03, .22, .98, .21, etc.) on them, and use the R² of the result.
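The suggestion above can be sketched as an ordinary least-squares fit of the predicted probabilities on the observed 0/1 classes, returning its R². With a single predictor this R² equals the squared Pearson correlation between the two columns, so it does not matter which column is treated as x, or whether the probabilities are for class 0 or class 1 (that only flips the sign of the correlation). Pure Python; the function name and data are hypothetical.

```python
def r_squared_simple_regression(x, y):
    """R^2 of an ordinary least-squares fit y = a + b * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: observed classes as x, predicted probabilities as y.
observed = [0, 1, 0, 0, 1, 1, 0, 1]
predicted = [0.15, 0.70, 0.40, 0.10, 0.85, 0.55, 0.30, 0.65]
print(round(r_squared_simple_regression(observed, predicted), 3))
```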

-Ralph Winters
Comment by Jai Shanker Singh on August 2, 2010 at 2:02pm
Hi,

I am running a logistic regression in SAS.

Ralph,