This discussion has been recovered from our archives. 

I'm new to predictive modelling and I'm currently developing a student churn model for the educational institution where I work. I'm using logistic regression for this problem, so which technique should I use to detect outliers in my training set?

Answers:

  1. The way we handle outliers in logistic regression is to create dummy variables based on EDA (exploratory data analysis).
  2. Regression analysis, using the available "DRS" software.
  3. You raised a good question for discussion. We use a half-normal probability plot of the deviance residuals with a simulated envelope to detect outliers in binary logistic regression. The plot helps to identify observations whose deviance residuals fall outside the envelope. A good reference is the book by R. D. Cook and S. Weisberg, Applied Regression Including Computing and Graphics (1999). For an example of how to draw a half-normal plot with an envelope, see https://cran.r-project.org/web/packages/auditor/vignettes/model_fit...
  4. We normally screen out the most extreme 2 percent in each tail of every variable (4 percent in total). Records with a value in those extremes are removed. You can reduce the cutoff to 1 percent if your sample size is small.
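
The percentile screening in answer 4 could be sketched roughly as below. This is only an illustration under some assumptions: the function names `tail_cutoffs` and `flag_extremes`, the column name `absences`, and the toy data are all made up, and the cutoffs use the "inclusive" interpolation from Python's standard `statistics.quantiles`, which may differ slightly from what other tools compute. The same flag could also serve as a dummy variable in the spirit of answer 1, although that answer does not say how its dummies are built.

```python
from statistics import quantiles


def tail_cutoffs(values, pct=2):
    """Return (low, high) cutoffs that leave about `pct` percent in each tail."""
    if not 1 <= pct < 50:
        raise ValueError("pct must be an integer between 1 and 49")
    # n=100 yields 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[pct - 1], cuts[99 - pct]


def flag_extremes(rows, columns, pct=2):
    """Return one boolean per row: True if any listed column lies in a tail."""
    limits = {c: tail_cutoffs([r[c] for r in rows], pct) for c in columns}
    return [
        any(r[c] < limits[c][0] or r[c] > limits[c][1] for c in columns)
        for r in rows
    ]


# Hypothetical toy data: 99 ordinary values plus one gross outlier.
rows = [{"absences": a} for a in range(1, 100)] + [{"absences": 500}]

flags = flag_extremes(rows, ["absences"], pct=2)

# Option A (answer 4): drop the flagged records before fitting.
kept = [r for r, f in zip(rows, flags) if not f]

# Option B (assumed reading of answer 1): keep every record and add a
# 0/1 dummy variable marking the extreme ones.
dummy = [int(f) for f in flags]
```

With this toy data the two smallest and two largest values are flagged, so 96 of the 100 records are kept. Passing `pct=1` gives the tighter cutoff suggested for small samples.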


© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC