
This is a simple technique. Let's say you want to estimate a parameter p (a proportion, a mean; it does not matter which).

* Divide your observations into N random buckets
* Compute the estimated value for each bucket
* Rank these estimates, from p_1 (smallest value) to p_N (largest value)
* Let p_k be your confidence interval lower bound for p, with k less than N/2
* Let p_(N-k+1) be your confidence interval upper bound for p


* [p_k, p_(N-k+1)] is a nonparametric confidence interval for p
* The confidence level is 1 - 2k/(N+1)
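The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bucket_interval`, its parameters, and the choice to deal shuffled observations into nearly equal-sized buckets are mine, not part of the post.

```python
import random
import statistics

def bucket_interval(observations, n_buckets, k=1, estimator=statistics.mean, seed=0):
    """Return (p_k, p_(N-k+1)) computed from N random buckets (hypothetical helper)."""
    if k < 1 or 2 * k >= n_buckets:
        raise ValueError("k must satisfy 1 <= k < N/2")
    shuffled = list(observations)
    if len(shuffled) < n_buckets:
        raise ValueError("need at least one observation per bucket")
    # Shuffle, then deal the observations into N buckets of nearly equal size.
    random.Random(seed).shuffle(shuffled)
    buckets = [shuffled[i::n_buckets] for i in range(n_buckets)]
    # One estimate per bucket, ranked from smallest to largest.
    estimates = sorted(estimator(bucket) for bucket in buckets)
    # Ranks in the text are 1-based: p_k is index k-1, p_(N-k+1) is index N-k.
    return estimates[k - 1], estimates[n_buckets - k]

# Usage: 19 buckets and k=1 on some simulated data.
rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(1900)]
lower, upper = bucket_interval(sample, n_buckets=19, k=1)
```

Any estimator that maps a list of observations to a number (for example `statistics.median`) can be passed in place of the mean.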

Has anyone tried this formula, or run a simulation (e.g. with a normal distribution) to double-check that it is correct? Note that by trying multiple values of k (and of N, although this is more computationally intensive), it is possible to interpolate confidence intervals of any level.
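A simulation along these lines could look like the sketch below. It assumes standard normal observations with a known true mean, and it uses N = 19 and k = 1, the case for which the post quotes a 90% level. Because the observations are independent and identically distributed, each bucket is simply drawn directly rather than split from a shuffled sample. The function name `simulate` and all default values are hypothetical. It reports two frequencies: how often the interval contains the true mean, and how often it contains the estimate from one extra, independent bucket. The second quantity is the one that the standard order-statistics result puts at 1 - 2k/(N+1), so comparing it with the first is the point of the check.

```python
import random
import statistics

def simulate(n_buckets=19, k=1, bucket_size=10, trials=2000, true_mean=0.0, seed=1):
    """Return (share of intervals containing the true mean,
               share of intervals containing an extra bucket's estimate)."""
    rng = random.Random(seed)
    hits_true = 0
    hits_extra = 0
    for _ in range(trials):
        # Draw N + 1 bucket means; the last one is held out as the extra estimate.
        means = [
            statistics.fmean([rng.gauss(true_mean, 1.0) for _ in range(bucket_size)])
            for _ in range(n_buckets + 1)
        ]
        extra = means.pop()
        means.sort()
        lower, upper = means[k - 1], means[n_buckets - k]
        hits_true += lower <= true_mean <= upper
        hits_extra += lower <= extra <= upper
    return hits_true / trials, hits_extra / trials

coverage_true_mean, coverage_extra_bucket = simulate()
print(coverage_true_mean, coverage_extra_bucket)
```

Varying `k`, `n_buckets`, and `bucket_size` shows how each frequency responds, which is also a cheap way to explore the interpolation idea.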

Finally, you want to keep N as low as possible (and ideally k=1) while still achieving the desired confidence level. For instance, for a 90% confidence interval, N=19 and k=1 work and are optimal.
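The smallest N for a given level can be worked out directly, assuming the level is 1 - 2k/(N+1), which is the reading consistent with the N=19, k=1 example. Solving for N gives N >= 2k/(1 - level) - 1. The sketch below uses exact fractions to avoid floating-point rounding at the boundary; the function name `smallest_n` and the string-valued `level` argument are my own choices.

```python
import math
from fractions import Fraction

def smallest_n(level, k=1):
    """Smallest N with 1 - 2k/(N+1) >= level; pass level as a string such as "0.90"."""
    alpha = 1 - Fraction(level)
    if not 0 < alpha < 1:
        raise ValueError("level must be strictly between 0 and 1")
    n = math.ceil(2 * k / alpha - 1)
    # The procedure also needs k < N/2, so never return fewer than 2k + 1 buckets.
    return max(n, 2 * k + 1)

print(smallest_n("0.90"))       # 19, matching the example in the post
print(smallest_n("0.95"))       # 39
print(smallest_n("0.90", k=2))  # 39
```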


Replies to This Discussion

Do you happen to have any numeric examples of this technique? Thanks!
If the size of your random buckets is the same as your sample size, then these would be what are called "bootstrap confidence intervals".

