
This discussion has been recovered from our archives. 

I'm new to predictive modelling and I'm currently developing a student churn model for the educational institution where I work. I'm using logistic regression for this problem, so which technique should I use to detect outliers in my training set?


  1. The way we handle outliers in logistic regression is to create dummy variables based on EDA (exploratory data analysis).
  2. Regression analysis, the available "DRS" Software
  3. You raised a good question for discussion. We use a half-normal probability plot of the deviance residuals with a simulated envelope to detect outliers in binary logistic regression. The plot helps to identify observations whose deviance residuals fall outside the envelope. A good reference is the book by R. D. Cook and S. Weisberg, Applied Regression Including Computing and Graphics (1999). For a reference on how to draw a half-normal plot with an envelope, check
  4. We normally screen out the most extreme 2 percent in each tail of every variable (4 percent in total). Records with an extreme value on any variable are removed. You can reduce the cutoff to 1 percent if your sample size is small.
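
As a rough illustration of replies 3 and 4, here is a minimal Python sketch using only the standard library. It assumes each record is a dict of numeric fields and that fitted churn probabilities are already available from some logistic regression fit. The names `percentile`, `screen_extremes`, `deviance_residuals` and the `absences` field are made up for this example, not taken from any package, and the simulated envelope itself is not implemented here, only the residuals it would be drawn around.

```python
import math

def percentile(values, q):
    """Percentile q (0 to 100) of a non-empty list, with linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def screen_extremes(rows, columns, tail_pct=2.0):
    """Keep rows whose value in every listed column lies between the
    tail_pct and (100 - tail_pct) percentiles of that column."""
    bounds = {}
    for c in columns:
        col = [r[c] for r in rows]
        bounds[c] = (percentile(col, tail_pct), percentile(col, 100.0 - tail_pct))
    return [r for r in rows
            if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)]

def deviance_residuals(y, p):
    """Deviance residuals for binary outcomes y (0 or 1) and fitted
    probabilities p, each strictly between 0 and 1."""
    out = []
    for yi, pi in zip(y, p):
        log_lik = math.log(pi) if yi == 1 else math.log(1.0 - pi)
        # The sign follows the raw residual y - p.
        out.append(math.copysign(math.sqrt(-2.0 * log_lik), yi - pi))
    return out

# Hypothetical data: 101 students with 0..100 absences.
students = [{"absences": a} for a in range(101)]
kept = screen_extremes(students, ["absences"], tail_pct=2.0)

# Hypothetical fitted probabilities; the last student churned despite a
# very low predicted probability and gets a large residual.
resid = deviance_residuals([1, 0, 1], [0.5, 0.5, 0.01])
```

For the half-normal plot, the absolute values of these residuals would be sorted and plotted against half-normal quantiles. Note that percentile screening always drops a fixed share of records, whether or not they are genuinely unusual, so it is worth checking what was removed.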
