While carrying out logistic regression, I found that the model with the most significant variable removed (based on p-values) gives the highest accuracy, but its residual deviance is no better than that of the null model. So which criterion should be followed: the model with the best accuracy, or the model with the lowest residual deviance? In other words, should we remove variables if the accuracy improves, or remove them if the residual deviance becomes lower? Any reference would be useful.
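
To make the accuracy-vs-deviance question concrete, here is a minimal NumPy sketch (not the poster's code). It fits a logistic regression by iteratively reweighted least squares and reports the null deviance, the residual deviance and the accuracy at an assumed 0.5 cut-off, for a full model and for one with the strongest predictor removed. The data are simulated and hypothetical (x1 has a real effect, x2 has none), and the helper names `fit_logistic_irls`, `binary_deviance` and `summarize` are illustrative, not from any library.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson / IRLS. X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                                  # IRLS weights
        step = np.linalg.solve(X.T @ (X * w[:, None]),     # information matrix X'WX
                               X.T @ (y - p))              # score vector
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def binary_deviance(y, p):
    """Deviance for 0/1 data: -2 log-likelihood (the saturated model's log-likelihood is 0)."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return float(-2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def summarize(X, y, threshold=0.5):
    """Return (null deviance, residual deviance, accuracy) for a model with design matrix X."""
    beta = fit_logistic_irls(X, y)
    p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
    null_dev = binary_deviance(y, np.full(len(y), y.mean()))  # intercept-only model
    resid_dev = binary_deviance(y, p_hat)
    accuracy = float(np.mean((p_hat >= threshold) == (y == 1)))
    return null_dev, resid_dev, accuracy

# Hypothetical simulated data: x1 matters, x2 is pure noise
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x1)))).astype(float)

X_full = np.column_stack([np.ones(n), x1, x2])
X_reduced = np.column_stack([np.ones(n), x2])          # strongest predictor x1 removed
for name, X in [("full", X_full), ("without x1", X_reduced)]:
    null_dev, resid_dev, acc = summarize(X, y)
    print(f"{name}: null dev {null_dev:.1f}, residual dev {resid_dev:.1f}, accuracy {acc:.3f}")
```

Accuracy only checks which side of the threshold each predicted probability falls on, whereas the residual deviance scores the full predicted probability of every observation, so the two can rank models differently. For nested models, the drop in deviance (null minus residual, or reduced minus full) is the quantity a likelihood-ratio test evaluates.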

Replies to This Discussion

Hi Mine the data,

Can you please explain what you mean by 'residual deviance'?

From my understanding, deviance in logistic regression is the -2LogL used in the chi-square (likelihood-ratio) test, right? Or do you mean the AIC/BIC/Score tests?

This is used to test the hypothesis that removing a variable produces a significant change in -2LogL, i.e. in the fit of the model.
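
The -2LogL comparison between a model and the same model with a variable removed can be sketched as a likelihood-ratio test, alongside AIC and BIC. This is a standard-library-only sketch under stated assumptions: `chi2_sf` is a hand-written helper that only handles integer degrees of freedom (in practice a statistics library would normally be used), and the log-likelihood values in the usage lines are made up for illustration.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df >= 1 (hand-rolled helper).
    Q(1, x) = erfc(sqrt(x/2)), Q(2, x) = exp(-x/2), and
    Q(k + 2, x) = Q(k, x) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    q, k = (math.erfc(math.sqrt(x / 2.0)), 1) if df % 2 else (math.exp(-x / 2.0), 2)
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def likelihood_ratio_test(loglik_full, loglik_reduced, df_dropped):
    """G = (-2LogL_reduced) - (-2LogL_full); under H0 (dropped betas are 0),
    G is approximately chi-square with df_dropped degrees of freedom."""
    g = -2.0 * (loglik_reduced - loglik_full)
    return g, chi2_sf(g, df_dropped)

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical log-likelihoods of two nested fitted models (full has one extra coefficient)
ll_full, ll_reduced = -250.0, -253.5
g, p_value = likelihood_ratio_test(ll_full, ll_reduced, df_dropped=1)
print(f"G = {g:.2f}, p = {p_value:.4f}")
print("AIC:", aic(ll_full, 3), aic(ll_reduced, 2))
print("BIC:", bic(ll_full, 3, 400), bic(ll_reduced, 2, 400))
```

Here `df_dropped` is the number of coefficients removed. Lower AIC/BIC is preferred; BIC penalises each extra parameter more heavily than AIC once the sample has 8 or more observations (ln n > 2).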


Residual analysis, however, is used for a different purpose:

1) testing the multivariate normality assumption

2) testing/observing heteroskedasticity of the residuals (i.e. whether the variance differs across ranges of the predicted odds)

3) getting a feel for the noise (i.e. mean, median, standard deviation, range, etc.)
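
One way to connect this list with the 'residual deviance' in the question: for a binary response, the residual deviance is exactly the sum of the squared deviance residuals. The NumPy sketch below uses hypothetical outcomes `y` and fitted probabilities `p_hat` (the names `logistic_residuals` and `describe` are illustrative) to compute Pearson and deviance residuals and the summary statistics from item 3. For 0/1 outcomes the residual can take only two values at any given fitted probability, so plotting residuals against the fitted values is usually more informative than a formal normality test.

```python
import numpy as np

def logistic_residuals(y, p):
    """Pearson and deviance residuals for a binary (0/1) response and fitted probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-15, 1 - 1e-15)
    pearson = (y - p) / np.sqrt(p * (1 - p))
    # Each observation's contribution to the deviance, signed like (y - p)
    dev_contrib = -2.0 * (y * np.log(p) + (1 - y) * np.log(1 - p))
    deviance = np.sign(y - p) * np.sqrt(dev_contrib)
    return pearson, deviance

def describe(r):
    """Rough 'feel for the noise': mean, median, standard deviation and range."""
    return {"mean": float(np.mean(r)), "median": float(np.median(r)),
            "std": float(np.std(r, ddof=1)), "min": float(np.min(r)), "max": float(np.max(r))}

# Hypothetical observed outcomes and fitted probabilities
y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
p_hat = np.array([0.2, 0.4, 0.7, 0.9, 0.6, 0.3, 0.45, 0.1])
pearson, dev_resid = logistic_residuals(y, p_hat)
print(describe(dev_resid))
# The sum of squared deviance residuals is the residual deviance of the model
print(np.sum(dev_resid ** 2))
```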


To assess the quality of your classification, you need other measures:

ROC area under the curve (AUC)

F1 score - harmonic mean of precision and recall (sensitivity)

Gini coefficient

Hosmer-Lemeshow goodness-of-fit (GoF) test

Kolmogorov-Smirnov (KS) D statistic and GoF test
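
The classification measures in this list can all be computed from observed 0/1 outcomes and predicted scores. The NumPy sketch below is illustrative only (the function names are hypothetical, not a library API): AUC is the fraction of positive/negative pairs ranked correctly (ties count one half), Gini is 2 * AUC - 1, KS is the largest gap between the true- and false-positive rates over all thresholds, F1 uses an assumed 0.5 cut-off, and the Hosmer-Lemeshow statistic uses equal-sized groups of sorted probabilities (an assumption; implementations differ in how they form the groups).

```python
import numpy as np

def roc_auc(y, s):
    """AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def gini(y, s):
    return 2.0 * roc_auc(y, s) - 1.0                    # Gini = 2 * AUC - 1

def ks_statistic(y, s):
    """KS D statistic: max gap between TPR and FPR over all score thresholds."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    thresholds = np.unique(s)
    tpr = np.array([np.mean(s[y == 1] >= t) for t in thresholds])
    fpr = np.array([np.mean(s[y == 0] >= t) for t in thresholds])
    return float(np.max(np.abs(tpr - fpr)))

def f1_score(y, s, threshold=0.5):
    """F1 = harmonic mean of precision and recall (sensitivity) at a given threshold."""
    y, pred = np.asarray(y), np.asarray(s) >= threshold
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic over groups of sorted fitted probabilities;
    compare it with a chi-square distribution on (groups - 2) degrees of freedom."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    stat = 0.0
    for idx in np.array_split(np.argsort(p), groups):
        n_g = len(idx)
        if n_g == 0:
            continue
        observed, expected = y[idx].sum(), p[idx].sum()
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1 - p_bar))
    return float(stat)

# Small hypothetical example
y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc(y, s), gini(y, s), ks_statistic(y, s), f1_score(y, s))  # AUC 0.75, Gini 0.5, KS 0.5, F1 about 0.667
```

The pairwise AUC needs memory proportional to (number of positives) x (number of negatives), so for large data sets a rank-based formula is preferable.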



Furthermore, you should test the influence/leverage of outliers on your solution using:

Leverage vs. predicted values

Influence vs. predicted values

Influence on each individual beta (coefficient), for each feature
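
These leverage and influence checks can be sketched for a logistic model through the weighted hat matrix. The NumPy sketch below assumes the standard one-step (Pregibon-style) approximations: the leverage h_i is the diagonal of W^(1/2) X (X'WX)^(-1) X' W^(1/2), the Cook's-distance-style measure uses Pearson residuals, and each DFBETA row approximates beta_hat minus the estimate obtained with observation i deleted. The simulated data and all function names are hypothetical.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson / IRLS fit; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def influence_measures(X, y, beta):
    """Leverage, Cook's-distance-style influence and one-step DFBETAs for a logistic fit."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    xtwx_inv = np.linalg.inv(X.T @ (X * w[:, None]))
    # h_i = w_i * x_i' (X'WX)^{-1} x_i, the diagonal of the weighted hat matrix
    leverage = w * np.einsum("ij,jk,ik->i", X, xtwx_inv, X)
    pearson = (y - p) / np.sqrt(w)
    k = X.shape[1]
    cooks = pearson ** 2 * leverage / (k * (1.0 - leverage) ** 2)
    # Row i approximates beta_hat - beta_hat(without observation i)
    dfbeta = (X * ((y - p) / (1.0 - leverage))[:, None]) @ xtwx_inv
    return p, leverage, cooks, dfbeta

# Hypothetical simulated data with one feature
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.3 + 1.2 * x)))).astype(float)
beta = fit_logistic_irls(X, y)
p_hat, h, cooks, dfbeta = influence_measures(X, y, beta)
print(np.argsort(cooks)[-5:])   # indices of the five most influential observations
```

Plotting `h` and `cooks` against `p_hat`, and each column of `dfbeta` against the corresponding feature, gives the plots listed above. The leverages always sum to the number of coefficients, which is a handy sanity check.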


It could be, however, that I just don't understand what 'residual deviance' is and need enlightenment.

© 2017 Data Science Central LLC