A Data Science Central Community
An updated version with source code and detailed explanations can be found here.
If observations from a specific experiment (for instance, scores computed on 10 million credit card transactions) are assigned a random bin ID (labeled 1, ··· ,k), then you can easily build a confidence interval for any proportion or score computed on these k random bins, using the Analyticridge theorem.
The proof of this theorem relies on complicated combinatorial arguments and the use of the Beta function. Note that the ﬁnal result does not depend on the distribution associated with your data - in short, your data does not have to follow a Gaussian (a.k.a normal) or any prespecified statistical distribution, to make the confidence intervals valid. You can ﬁnd more details regarding the proof of the theorem in the book Statistics of Extremes by E.J. Gumbel, pages 58-59 (Dover edition, 2004)
Parameters in the Analyticbridge theorem can be chosen to achieve the desired level of precision - e.g a 95%, 99% or 99.5% confidence interval. The theorem will also tell you what your sample size should be to achieve a pre-specified accuracy level. This theorem is a fundamental result to compute simple, per-segment, data-driven, model-free conﬁdence intervals in many contexts, in particular when generating predictive scores produced via logistic / ridge regression or decision trees / hidden decision trees (e.g. for fraud detection, consumer or credit scoring).
A scoring system designed to detect customers likely to fail on a loan, is based on a rule set. On average, for an individual customer, the probability to fail is 5%. In a data set with 1 million observations (customers) and several metrics such as credit score, amount of debt, salary, etc. if we randomly select 99 bins each containing 1,000 customers, the 98% conﬁdence interval (per bin of 1,000 customers) for the failure rate is (say) [4.41%, 5.53%], based on the Analyticridge theorem, with k = 99 and m = 1 (read the theorem to understand what k and m mean - it's actually very easy to understand the signification of these parameters).
Now, looking at a non-random bin with 1,000 observations, consisting of customers with credit score < 650 and less than 26 years old, we see that the failure rate is 6.73%. We can thus conclude that the rule credit score < 650 and less than 26 years older is actually a good rule to detect failure rate, because 6.73% is well above the upper bound of the [4.41%, 5.53%] confidence interval.
Indeed, we could test hundreds of rules, and easily identify rules with high predictive power, by systematically and automatically looking at how far the observed failure rate (for a given rule) is from a standard conﬁdence interval. This allows us to rule out eﬀect of noise, and process and rank numerous rules (based on their predictive power - that is, how much their failure rate is above the confidence interval upper bound) at once.