A Data Science Central Community

What are the possible distributions of a continuous random variable *X* on [0, 1] if |2*X* - 1| is known to be uniformly distributed on [0, 1]? Will the distribution of INT(2*X*) always be uniform on {0, 1}?

This question arises in a potential proof that the binary digits of Pi (see exercise 7 in this article), which are distributed as INT(2*X*) and are obviously equal to 0 or 1, are uniformly distributed (50% zeros and 50% ones).
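
The link between the digits and INT(2*X*) can be made concrete: the *n*-th binary digit of Pi after the point equals INT(2*X_n*), where *X_n* is the fractional part of 2^(*n*-1) times Pi. The sketch below is a minimal illustration under our own assumptions: Pi is computed with Machin's formula in fixed-point integer arithmetic (our choice, not the author's method), and the helper names `arctan_inv`, `pi_fixed` and `pi_binary_digits` are hypothetical.

```python
# Sketch: binary digits of Pi as INT(2 * X_n), where X_n = frac(2**(n-1) * Pi).
# Pi is computed with Machin's formula in fixed-point integer arithmetic (our choice).


def arctan_inv(x: int, one: int) -> int:
    """Return arctan(1/x) * one (approximately) via the alternating Taylor series; requires x >= 2."""
    total = 0
    power = one // x          # one / x**(2k+1), starting with k = 0
    x_squared = x * x
    k = 0
    while power:
        term = power // (2 * k + 1)
        total += -term if k % 2 else term
        power //= x_squared
        k += 1
    return total


def pi_fixed(bits: int, guard: int = 32) -> int:
    """Return floor(Pi * 2**bits), using `guard` extra bits to absorb truncation error."""
    one = 1 << (bits + guard)
    # Machin's formula: Pi = 16 * arctan(1/5) - 4 * arctan(1/239)
    pi_scaled = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)
    return pi_scaled >> guard


def pi_binary_digits(n_digits: int) -> list[int]:
    """Return the first n_digits binary digits of Pi after the point, each computed as INT(2 * X_n)."""
    scale = n_digits + 1
    p = pi_fixed(scale)                     # Pi * 2**scale, truncated
    mask = (1 << scale) - 1
    digits = []
    for n in range(1, n_digits + 1):
        x_n = (p << (n - 1)) & mask         # frac(2**(n-1) * Pi), scaled by 2**scale
        digits.append((2 * x_n) >> scale)   # INT(2 * X_n): 0 or 1
    return digits
```

For example, `sum(pi_binary_digits(10_000)) / 10_000` gives the share of ones among the first 10,000 binary digits. A share close to 50% is consistent with the uniformity claim but does not prove it.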

**Update**

I spent more time on this problem, and it is not an easy one. Since |2*X* - 1| has a density, so does *X*, but that density can be discontinuous, even nowhere continuous. There are actually infinitely many solutions, as many as there are real numbers in [0, 1]. To understand this, run the following simulation:

- Simulate a sequence of random deviates *u*(*n*), uniformly distributed on [0, 1].
- Generate a sequence of numbers *d*(*n*) taking values in {-1, +1}. They don't need to be uniformly distributed: they can all be -1, all be +1, or any combination of both. For instance, *d*(*n*) can be -1 if the *n*-th binary digit of Pi is zero, and +1 if it is one. You can use any other number instead of Pi, for instance 1/7 (0.001001... in base 2), and then the final result will be different.
- For each *n*, compute *v*(*n*) = *d*(*n*) · *u*(*n*).
- For each *n*, compute *x*(*n*) = (1 + *v*(*n*)) / 2.
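
The four steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the helper names are hypothetical, Python's `random` module supplies the *u*(*n*)'s, and the signs *d*(*n*) come from the binary digits of 1/7 (an illustrative choice, one third of whose digits are ones) rather than Pi, so that the digits can be computed exactly with integer arithmetic.

```python
import random


def binary_digits_of_fraction(num: int, den: int, n_digits: int) -> list[int]:
    """First n_digits binary digits after the point of num/den, assuming 0 <= num < den."""
    digits = []
    for _ in range(n_digits):
        num *= 2
        digits.append(num // den)   # next binary digit: 0 or 1
        num %= den
    return digits


def simulate(signs: list[int], seed: int = 0) -> tuple[list[float], list[float]]:
    """Run the steps above with one sign d(n) in {-1, +1} per n; return (u values, x values)."""
    rng = random.Random(seed)
    us, xs = [], []
    for d in signs:
        u = rng.random()            # u(n): uniform deviate on [0, 1)
        v = d * u                   # v(n) = d(n) * u(n)
        us.append(u)
        xs.append((1 + v) / 2)      # x(n) = (1 + v(n)) / 2
    return us, xs


# Example: d(n) = -1 if the n-th binary digit of 1/7 = 0.001001... is 0, else +1.
n = 100_000
digits = binary_digits_of_fraction(1, 7, n)
signs = [2 * b - 1 for b in digits]
us, xs = simulate(signs)

share_plus = signs.count(1) / n                  # proportion of +1's among the d(n)'s
share_ones = sum(int(2 * x) for x in xs) / n     # empirical P(INT(2X) = 1)

# |2x(n) - 1| equals u(n), so its histogram over ten bins of [0, 1] should be flat.
bins = [0] * 10
for x in xs:
    bins[min(int(abs(2 * x - 1) * 10), 9)] += 1

print(f"share of +1 in d(n): {share_plus:.4f}, share of INT(2X) = 1: {share_ones:.4f}")
print("counts of |2X - 1| per tenth of [0, 1]:", bins)
```

Replacing `binary_digits_of_fraction(1, 7, n)` with another source of digits changes `share_ones`, while the histogram of |2*X* - 1| stays flat, which is the point of the construction.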

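The random variable *X* describing the limiting distribution of the *x*(*n*)'s, as *n* tends to infinity, is a solution to the problem. However, there are as many solutions as there are ways to generate the *d*(*n*)'s, and the distribution of INT(2*X*) will be discrete on {0, 1} but usually not uniform: since INT(2*x*(*n*)) = 1 exactly when *d*(*n*) = +1, the probability that INT(2*X*) = 1 equals the proportion of +1's among the *d*(*n*)'s. If you use the number Pi to compute the *d*(*n*)'s, it will be uniform, provided that 0's and 1's occur equally often among the binary digits of Pi, which is widely believed but has not been proven.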
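
Why the construction works can be checked directly. Write *q* for the proportion of +1's among the *d*(*n*)'s (assumed to have a limit). Because the *d*(*n*)'s do not depend on the *u*(*n*)'s, *X* is a mixture of (1 + *U*)/2 with weight *q* and (1 - *U*)/2 with weight 1 - *q*, where *U* is uniform on [0, 1]. The derivation below is our own sketch, not from the original post. Its second part gives the general form of a solution for any measurable function *p* from [0, 1] to [0, 1]; the mixture is the special case *p* = *q*.

```latex
% Mixture produced by the simulation (q = limiting proportion of +1's among the d(n)'s)
f_X(x) = \begin{cases} 2(1-q), & 0 \le x < \tfrac{1}{2}, \\ 2q, & \tfrac{1}{2} \le x \le 1, \end{cases}
\qquad P\bigl(|2X-1| \le t\bigr) = q\,t + (1-q)\,t = t \quad (0 \le t \le 1),
\qquad P\bigl(\mathrm{INT}(2X) = 1\bigr) = P\bigl(X \ge \tfrac{1}{2}\bigr) = q.

% General solution, for any measurable p : [0, 1] -> [0, 1]
f_X(x) = \begin{cases} 2\bigl(1 - p(1-2x)\bigr), & 0 \le x < \tfrac{1}{2}, \\ 2\,p(2x-1), & \tfrac{1}{2} < x \le 1, \end{cases}
\qquad P\bigl(|2X-1| \le t\bigr) = \int_0^t \bigl(1 - p(s)\bigr)\,ds + \int_0^t p(s)\,ds = t,
\qquad P\bigl(\mathrm{INT}(2X) = 1\bigr) = \int_0^1 p(v)\,dv.
```

So |2*X* - 1| is uniform for every choice of *p*, while INT(2*X*) is uniform on {0, 1} only when the integral of *p* equals 1/2.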

© 2020 TechTarget, Inc.