A Data Science Central Community
This is a function of the number of variables in your model. For example, if you have 25 variables in your model, the rule of thumb gives a minimum sample size of 25 * 10 / 0.08 = 3,125. Then you need to scale up to accommodate your 70%/30% validation split.
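The arithmetic above can be sketched in a few lines (the 10-per-variable rule, the 0.08 event rate, and the 70%/30% split are taken from this thread; the function name is my own):

```python
import math

def min_sample_size(n_vars, event_rate, events_per_var=10):
    """Minimum N so the rarer outcome supplies ~10 events per model variable."""
    return math.ceil(n_vars * events_per_var / event_rate)

n = min_sample_size(25, 0.08)   # 25 * 10 / 0.08 = 3125
total = math.ceil(n / 0.70)     # scale up so the 70% training split still holds 3125
print(n, total)                 # 3125 4465
```

The scale-up step simply inflates the minimum so that the training partition alone, after the 70/30 split, still meets it.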
Thanks for your response. I have a few queries based on your reply; I would appreciate it if you could resolve them.
1. When you say 25 variables in the model, do you mean the 25 raw variables in the dataset initially available to me?
2. Could you explain the formula/function you mentioned? Specifically, how do we get the values 10 and 0.08?
In your reply I see the word 'minimum', but I would like to know the 'optimum' sample size instead.
By "variables," I mean the main effects in the model. There is a paper by Peduzzi that discusses this, in which he shows that 10 times the number of parameters, divided by the proportion of the least likely outcome (in your case, the 0.08 churn rate), yields a proper sample size. However, I'm not sure what you mean by "optimum" sample size. It will always depend on the number of variables in the model, so if you end up throwing out variables for whatever reason, it will change.
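To illustrate the last point, that the required sample size moves with the number of main effects kept in the model, here is a small sketch using the same 10-events-per-parameter rule and the 0.08 churn rate from this thread (the function name is my own):

```python
import math

def required_n(n_params, rare_rate, epv=10):
    """Events-per-variable rule: need epv events per parameter,
    so N >= epv * n_params / rare_rate."""
    return math.ceil(epv * n_params / rare_rate)

# Dropping variables shrinks the minimum sample proportionally.
for k in (25, 20, 15, 10):
    print(k, required_n(k, 0.08))
# 25 3125 / 20 2500 / 15 1875 / 10 1250
```

So a model pruned from 25 to 10 main effects needs less than half the data, which is why the "optimum" size cannot be fixed in advance of variable selection.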