I'm new to predictive modelling and I'm currently developing a model of student churn for the educational institution where I work. I'm using logistic regression for this problem, so which technique should I use to detect outliers in my training set?
- The way we take care of outliers in logistic regression is by creating dummy variables based on EDA (Exploratory Data Analysis).
- Regression analysis, the available "DRS" Software
- You raise a good question for discussion. We use a half-normal probability plot of the deviance residuals with a simulated envelope to detect outliers in binary logistic regression. The plot helps to identify observations whose deviance residuals fall outside the envelope. A good reference is the book by R. D. Cook and S. Weisberg, Applied Regression Including Computing and Graphics (1999). For how to draw a half-normal plot with an envelope, see https://cran.r-project.org/web/packages/auditor/vignettes/model_fit...
- We normally screen out the most extreme 2 percent in each tail of any variable (4 percent in total). Records that have an extreme value on any variable are removed. You can reduce the cutoff to 1 percent per tail if your sample size is small.
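
The percentile screening described in the last reply could be sketched roughly as follows. This is only a minimal illustration in plain Python, not the poster's actual procedure: the function names `percentile_bounds` and `screen_extremes`, the `absences` column, and the choice of linear interpolation between ranks for the cutoffs are all assumptions made for the example.

```python
import math


def percentile_bounds(values, tail_pct=2.0):
    """Return (low, high) cutoffs that leave tail_pct percent in each tail.

    Cutoffs use linear interpolation between the closest ranks
    (an assumed convention; other quantile definitions exist).
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    n = len(ordered)

    def quantile(q):
        pos = q * (n - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return quantile(tail_pct / 100), quantile(1 - tail_pct / 100)


def screen_extremes(rows, columns, tail_pct=2.0):
    """Split rows into (kept, flagged).

    A row is flagged when any listed column falls outside that
    column's own lower or upper cutoff.
    """
    bounds = {c: percentile_bounds([r[c] for r in rows], tail_pct) for c in columns}
    kept, flagged = [], []
    for r in rows:
        extreme = any(not (bounds[c][0] <= r[c] <= bounds[c][1]) for c in columns)
        (flagged if extreme else kept).append(r)
    return kept, flagged


# Hypothetical data: 100 students with 0..99 absences.
rows = [{"id": i, "absences": i} for i in range(100)]
kept, flagged = screen_extremes(rows, ["absences"], tail_pct=2.0)
print(len(kept), len(flagged))  # 96 4
```

Returning the flagged rows separately, rather than silently dropping them, leaves the choice open: they can be removed as in the last reply, or marked with a dummy indicator as the first reply suggests. Passing `tail_pct=1.0` gives the smaller cutoff mentioned for small samples.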