A Data Science Central Community
This is a function of the number of variables in your model. For example, if you have 25 variables in your model, as a rule of thumb you will need a minimum sample size of 25 * 10 / 0.08 = 3125. Then you need to scale up to accommodate your 70%/30% validation criteria.
-Ralph Winters
Ralph,
Thanks for your response. I have a few queries based on your reply, and I would appreciate it if you could resolve them.
1. When you say 25 variables in the model, do you mean the 25 raw variables in the dataset available to me initially?
2. Could you explain the formula/function that you have mentioned? Precisely, how do we get the values of 10 and 0.08?
In your reply I see the word 'minimum', but I would like to know the 'optimum' sample size instead.
Regards,
Sharath
By variables, I mean main effects in the model. There is a paper by Peduzzi that discusses this, in which he shows that 10 times the number of parameters, divided by the proportion of the least likely outcome (in your case, 0.08 churn), yields an adequate sample size. However, I'm not sure what you mean by "optimum" sample size. The required size will always depend on the number of variables in the model. If you end up throwing out variables for whatever reason, it will change.
-Ralph Winters