<p><b>sample size</b> - AnalyticBridge (https://www.analyticbridge.datasciencecentral.com/forum/topics/sample-size?feed=yes&xn_auth=no)</p>
<p><b>KOUADIO Michel</b> (https://www.analyticbridge.datasciencecentral.com/profile/KOUADIOMichel), 2013-04-11:</p>
<p>The following formula gives the minimum sample size, taking into account the expected proportion in the target population:</p>
<table border="1" cellspacing="0">
<tbody><tr><td rowspan="2" width="64"><p align="center"><b>n ≥</b></p>
</td>
<td valign="top" width="217"><p align="center"><b>N*p*(1-p)</b></p>
</td>
</tr>
<tr><td valign="top" width="217"><p align="center"><b>p*(1-p) + l²*(N-1)/z²</b></p>
</td>
</tr>
</tbody>
</table>
<p></p>
<p>With:</p>
<p>N = Size of the population</p>
<p>n = Size of the sample</p>
<p>p = Proportion to be estimated</p>
<p>l = Chosen margin of error</p>
<p>z = Standard normal quantile (z-score) for the chosen level of confidence</p>
<p></p>
<p>So for</p>
<p></p>
<p>N = 2000</p>
<p>p = 0.91</p>
<p>l = 0.1</p>
<p>z = 1.96 (95% confidence)</p>
<p></p>
<p>we have <b><i>n = 30.99</i></b></p>
<p></p>
<p><b><i>Thus, by choosing the parameters appropriately, we can arrive at a sample size of about 30.</i></b></p>
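<p>As a quick check, the formula can be evaluated in a few lines of Python. This is only a minimal sketch: the function name <code>sample_size</code> is made up for illustration, and it uses z = 1.96, the standard value for 95% confidence, which reproduces n = 30.99.</p>

```python
import math


def sample_size(N, p, l, z):
    """Minimum sample size n = N*p*(1-p) / (p*(1-p) + l^2*(N-1)/z^2).

    N: population size, p: proportion to be estimated,
    l: margin of error, z: z-score for the confidence level.
    """
    variance = p * (1 - p)
    return N * variance / (variance + l**2 * (N - 1) / z**2)


n = sample_size(N=2000, p=0.91, l=0.1, z=1.96)
print(round(n, 2))   # 30.99
print(math.ceil(n))  # 31, since a sample size must be a whole number
```

<p>Because the formula is a lower bound on n, the result is rounded up rather than down.</p>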
<p></p>
<p><b>Holly V. Campbell</b> (https://www.analyticbridge.datasciencecentral.com/profile/HollyVCampbell), 2013-04-04:</p>
<p>I am also looking for a "rule of thumb" for determining the optimal sample size and the density of the target within the sample. I suspect there is no hard-and-fast rule for this, but I am interested in hearing additional opinions.</p>
<p></p>
<p>From my point of view, the target (predicted) value should ideally make up at least about 10% of the entire sample, and I would guess that a density of 30-50% is ideal. After modeling, it seems you should weigh your lift results against the density values to understand the relationship between the two.</p>
<p></p>
<p>Can you please share your best successes with predictive modeling: how big was the sample set, and what was the density or penetration of positive targets? So far I have had relatively little success modeling marketing data, in three different scenarios:</p>
<p>#1, N=~9k, 200 positive events</p>
<p>#2, N=~500, 350 positive events</p>
<p>#3, N= ~700, 525 positive events</p>
<p></p>
<p>In my opinion all of these are poor samples for regression models or decision trees, and they are not representative enough to support statistically significant predictive modeling. I welcome any and all feedback: these initial sets were predetermined, and I want to avoid designing future analytics cases under such poor conditions.</p>
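<p>For reference, the densities implied by the three scenarios can be worked out as below. This is a rough sketch only: the sample sizes are the approximate figures quoted above (~9k, ~500, ~700), and the helper names <code>density</code> and <code>minority_events</code> are made up for illustration.</p>

```python
def density(n, positives):
    """Share of positive events in a sample of size n."""
    return positives / n


def minority_events(n, positives):
    """Number of cases in the rarer of the two classes."""
    return min(positives, n - positives)


# (approximate sample size, positive events) as quoted in the post
scenarios = {"#1": (9000, 200), "#2": (500, 350), "#3": (700, 525)}

for name, (n, positives) in scenarios.items():
    print(f"{name}: density = {density(n, positives):.1%}, "
          f"minority-class cases = {minority_events(n, positives)}")
# #1: density = 2.2%, minority-class cases = 200
# #2: density = 70.0%, minority-class cases = 150
# #3: density = 75.0%, minority-class cases = 175
```

<p>So scenario #1 falls well below the 10% density mentioned above, while #2 and #3 sit above the 30-50% range, with the non-events as the rarer class.</p>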
<p></p>
<p>What minimum sample size and number of positive events would you require?</p>
<p><b>B K Bhaumik</b> (https://www.analyticbridge.datasciencecentral.com/profile/BKBhaumik), 2013-03-20:</p>
<p>If you use multivariate statistics for data analysis, say with 15 variables, then the sample size should be greater than 3 times the number of variables, i.e., more than 45. For empirical analysis of a single variable, a sample of 30 is generally found to be adequate.</p>
<p><b>Mike Rowe</b> (https://www.analyticbridge.datasciencecentral.com/profile/MikeRowe), 2013-03-19:</p>
Hi,<br />
<br />
There are at least two considerations:<br />
1. The sample size depends heavily on the experimental design: the more independent variables you have, the more your degrees of freedom get partitioned. For an example of this, you might want to look into Analysis of Variance designs and F-tables.<br />
2. The size of the "true" differences you are trying to measure and the quality of your measurements: the smaller the differences and the poorer your measurement quality, the larger your sample size will need to be in order to detect statistically significant differences. In some experiments (say, in particle physics), sample sizes are HUGE (>10^10).
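<br />
<br />
To make the second point concrete, here is a rough sketch using the textbook normal-approximation formula for comparing two group means, n per group = 2 * ((z_alpha/2 + z_power) / d)^2, where d is the true difference divided by the measurement standard deviation. The formula and the function name n_per_group are illustrative assumptions on my part, not a method taken from the discussion above.<br />

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation).

    d is the standardized difference: true difference / standard deviation.
    Noisier measurements raise the standard deviation and so shrink d.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} per group")
# d = 0.8: about 25 per group
# d = 0.5: about 63 per group
# d = 0.2: about 393 per group
```

Halving the difference (or doubling the measurement noise) roughly quadruples the required sample size, since n scales with 1/d^2.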