A Data Science Central Community
I am a research scholar at Pondicherry University, and I have a doubt regarding sample size. I chose a sample size of 30 for my paper, but my friends tell me that the sample size should always be more than 50. However, at one of the workshops I attended, the resource person said that 30 samples is enough. So I am quite confused. Could you clear up my doubt?
If you use multivariate statistics for data analysis, say with 15 variables, then the sample size should be greater than 3 times the number of variables, i.e., more than 45. For single-variable empirical data analysis, 30 samples is generally found to be okay.
I am also trying to get a "rule of thumb" for determining optimal sample size and density within sample. I have a feeling there isn't a hard set rule for this but I am interested in hearing additional opinions.
From my point of view, you should ideally see a minimum density of around 10% for your target (predicted) value across your entire sample. I would guess that a density of 30-50% is the ideal situation. After modeling, it seems you should weigh your lift results against the density values to understand the relationship between the two.
Can you please share your best successes with predictive modeling: how big the sample set was, and what density or penetration of positive targets it had? So far I have experienced relatively little success with modeling marketing data, in three different scenarios:
#1, N=~9k, 200 positive events
#2, N=~500, 350 positive events
#3, N=~700, 525 positive events
In my opinion all of these seem like poor samples to run regression models or decision trees on, and they do not lend themselves to a statistically meaningful representation for predictive modeling. I welcome any and all feedback, as these initial sets were pre-determined and I want to avoid designing future analytics cases under such poor conditions.
What minimum requirements would you consider for sample size and number of positive events?
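As a quick check on the three scenarios listed above, the density of positive events can be computed directly. This is only a sketch: the N values are the approximate figures quoted in the post, and the 10% floor is the rough rule of thumb suggested there, not an established threshold. The names `scenarios`, `densities` and `MIN_DENSITY` are made up for illustration.

```python
# Approximate sample sizes and positive-event counts quoted in the post.
scenarios = {
    "#1": (9000, 200),
    "#2": (500, 350),
    "#3": (700, 525),
}

# Rough minimum density suggested in the post (an opinion, not a standard).
MIN_DENSITY = 0.10

# Density = share of positive events in the sample.
densities = {name: positives / n for name, (n, positives) in scenarios.items()}

for name, density in densities.items():
    flag = "below" if density < MIN_DENSITY else "at or above"
    print(f"{name}: density = {density:.1%} ({flag} the 10% rule of thumb)")
```

Under these figures, scenario #1 is the only one that falls below the 10% floor, while #2 and #3 sit well above the 30-50% range mentioned in the post.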
The following formula gives the size of a sample drawn from a finite population, taking into account the proportion to be estimated in the target population:
n = N*p*(1-p) / [ p*(1-p) + l²*(N-1)/z² ]
N = Size of the population
n = Size of the sample
p = Proportion to be estimated
l = Chosen margin of error
z = Critical value of the standard normal distribution for the chosen level of confidence (e.g. 1.96 for 95%)
For example, with:
N = 2000
p = 0.91
l = 0.1
z = 1.96
we have n = 30.99
Thus, by choosing the parameters appropriately, we can arrive at a sample size of about 30.
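The calculation above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the full formula is n = N*p*(1-p) / (p*(1-p) + l²*(N-1)/z²) and that z is the usual 95% critical value of 1.96, which is the value that gives n ≈ 30.99. The function name `sample_size` is made up for illustration.

```python
import math


def sample_size(N, p, l, z):
    """Sample size for estimating a proportion in a finite population.

    N: population size
    p: proportion to be estimated
    l: chosen margin of error
    z: critical value for the chosen confidence level
    """
    variance = p * (1 - p)
    return N * variance / (variance + (l ** 2) * (N - 1) / (z ** 2))


n = sample_size(N=2000, p=0.91, l=0.1, z=1.96)
print(round(n, 2))   # 30.99
print(math.ceil(n))  # 31, rounding up to a whole number of respondents
```

Note how sensitive the result is to the inputs: with the same N and z, a less extreme proportion such as p = 0.5 or a tighter margin such as l = 0.05 gives a noticeably larger n, so a sample of 30 only follows from fairly lenient choices.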