A Data Science Central Community
Most random number generators use an algorithm a(k+1) = f(a(k)) to produce a sequence of integers a(1), a(2), etc. that behaves like random numbers. The function f is integer-valued and bounded; because of these two conditions, the sequence a(k) eventually becomes periodic for k large enough. This is an undesirable property, and many public random number generators (those built in Excel, Python, and other languages) are poor and not suitable for cryptographic applications, Markov Chains Monte-Carlo associated with hierarchical Bayesian models, or large-scale Monte-Carlo simulations to detect extreme events (example: fraud detection, big data context).
It's easy to build a sequence of non-periodic numbers or digits, by using irrational numbers. A basic example is the following number:
By construction, this decimal number has no period. Unfortunately, if you use the digits (decimals) of this number as a sequence of random digits, it has obvious undesirable properties, even worse than most cases of periodicity. Is it possible to reshuffle these digits to make this sequence look much more random?
Better sequences are associated with natural transcendental numbers:
The latter number has been artificially designed to produce fast convergence (one new digit for each term in the series) by integrating the series g(x) = 1 + x + x^2 + x^3 + ... = 1/(1 - x) and using x = 1/10. Click here for other similar examples.
To test whether a sequence is random enough, you can use a battery of statistical tests on the digits and blocks of 2, 3 or more digits:
As you carry a large number of tests, you will face many false positives (deviation from randomness according to some tests, even though you have almost perfect random numbers). How do you address this issue?
Also, you don't need to know the statistical theory of hypothesis testing to work on this problem. For instance, to test if a(k +1) - a(k) behaves like a difference of two uniform distributions, either compute that theoretical distribution or plot it doing Monte-Carlo simulations, then look at the distribution of a(k+1) - a(k) on hundreds of data buckets (your simulated random numbers), then compute confidence intervals based on these buckets to see how close your observed distribution is to the theoretical distribution. If both distributions are really different, then your random number generator has a serious issue.
This challenge can be broken down in a few sub-problems:
Finally, if we define a decimal number d as a concatenation of positive integers a(1), a(2) and so on, preceded by "0.", where a(k) is a sequence with no upper bound, then d is non-periodic and irrational (Under which conditions is this true? For instance this is not true if a(1) = 3 and a(k+1) = 10*a(k) + 3, resulting in d = 1/3). As an illustration, if a(k) = k^2, then d = 0.149162536496481100121144... Can you find some a(k) that, in addition to non-periodicity, provides good randomness properties?