
This is a simple technique. Let's say you want to estimate a parameter p (proportion, mean, it does not matter).

* Divide your observations into N random buckets
* Compute the estimated value for each bucket
* Rank these estimates, from p_1 (smallest value) to p_N (largest value)
* Let p_k be your confidence interval lower bound for p, with k less than N/2
* Let p_(N-k+1) be your confidence interval upper bound for p


* [p_k, p_(N-k+1)] is a nonparametric confidence interval for p
* The confidence level is 1 - 2k/(N+1)
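The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `bucket_interval`, the round-robin split after shuffling, and the default `statistics.mean` estimator are my own assumptions. The level is computed as 1 - 2k/(N+1), which is the form that agrees with the N=19, k=1, 90% example in this post.

```python
import random
import statistics

def bucket_interval(observations, n_buckets, k, estimator=statistics.mean, seed=0):
    """Return (lower, upper, nominal_level) from ranked per-bucket estimates.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if not 1 <= k < n_buckets / 2:
        raise ValueError("k must satisfy 1 <= k < N/2")
    if len(observations) < n_buckets:
        raise ValueError("need at least one observation per bucket")

    # Shuffle a copy, then deal the observations round-robin into N buckets.
    rng = random.Random(seed)
    shuffled = list(observations)
    rng.shuffle(shuffled)
    buckets = [shuffled[i::n_buckets] for i in range(n_buckets)]

    # One estimate per bucket, ranked from p_1 (smallest) to p_N (largest).
    estimates = sorted(estimator(bucket) for bucket in buckets)

    lower = estimates[k - 1]          # p_k (1-indexed)
    upper = estimates[n_buckets - k]  # p_(N-k+1) (1-indexed)
    nominal_level = 1 - 2 * k / (n_buckets + 1)
    return lower, upper, nominal_level
```

For example, `bucket_interval(data, n_buckets=19, k=1)` returns the smallest and largest of the 19 bucket estimates together with a nominal level of 0.9.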

Has anyone tried this formula, or run a simulation (e.g. with a normal distribution) to double-check that it is correct? Note that by trying multiple values of k (and of N, although this is more computer-intensive), it is possible to interpolate confidence intervals of any level.
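Here is one way such a simulation could be set up, as a rough sketch rather than a definitive check. It assumes the parameter is the mean of a normal distribution and that each bucket holds `bucket_size` independent draws (equivalent to randomly splitting one normal sample). The function name `simulate` and all its defaults are hypothetical. It reports two empirical rates next to the nominal level 1 - 2k/(N+1): how often the interval contains the true mean, and how often it contains the estimate from one extra, independent bucket.

```python
import random
import statistics

def simulate(n_buckets=19, k=1, bucket_size=50, trials=2000,
             mu=0.0, sigma=1.0, seed=0):
    """Return (rate_covering_mu, rate_covering_fresh_estimate, nominal_level)."""
    rng = random.Random(seed)

    def bucket_mean():
        # Estimate from one bucket of i.i.d. normal observations.
        return statistics.fmean([rng.gauss(mu, sigma) for _ in range(bucket_size)])

    covers_mu = 0
    covers_fresh = 0
    for _ in range(trials):
        estimates = sorted(bucket_mean() for _ in range(n_buckets))
        lower = estimates[k - 1]          # p_k
        upper = estimates[n_buckets - k]  # p_(N-k+1)
        fresh = bucket_mean()             # estimate from an extra bucket
        covers_mu += lower <= mu <= upper
        covers_fresh += lower <= fresh <= upper

    nominal_level = 1 - 2 * k / (n_buckets + 1)
    return covers_mu / trials, covers_fresh / trials, nominal_level

if __name__ == "__main__":
    print(simulate())
```

By the usual order-statistics argument, 1 - 2k/(N+1) is the chance that a fresh bucket estimate lands between the k-th smallest and k-th largest of N independent estimates, so the second rate should sit near the nominal level. Comparing it with the first rate shows whether coverage of p itself matches that figure.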

Finally, you want to keep N as low as possible (ideally with k=1) while still reaching the desired confidence level. For instance, for a 90% confidence interval, N=19 and k=1 work and are optimal.
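The N=19 figure follows from solving 1 - 2k/(N+1) >= level for N, which gives N >= 2k/(1 - level) - 1. A small sketch of that arithmetic, using exact fractions to avoid rounding surprises (the helper names `nominal_level` and `smallest_n` are mine, not part of any library):

```python
from fractions import Fraction
import math

def nominal_level(n_buckets, k):
    """Nominal level 1 - 2k/(N+1) as an exact fraction."""
    return 1 - Fraction(2 * k, n_buckets + 1)

def smallest_n(target_level, k=1):
    """Smallest N whose nominal level 1 - 2k/(N+1) reaches target_level."""
    target = Fraction(str(target_level))
    if not 0 < target < 1:
        raise ValueError("target_level must be strictly between 0 and 1")
    # 1 - 2k/(N+1) >= target  is equivalent to  N >= 2k/(1 - target) - 1
    return math.ceil(Fraction(2 * k) / (1 - target) - 1)

print(smallest_n(0.90))  # 19
print(smallest_n(0.95))  # 39
```

With k=1 this reproduces N=19 for 90% and gives N=39 for 95%.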


Replies to This Discussion

Do you happen to have any numeric examples of this technique? Thanks!

If each random bucket is resampled with replacement and has the same size as your sample, then these would be what are called "bootstrap confidence intervals".


