Subscribe to DSC Newsletter

This elementary number has two interesting features. One that interests us data scientists: the generation of great (non-periodic) random bits, using its decimals. And one even more amazing to all mathematicians, and one of the greatest unresolved mathematical problems of all times: are the digits of SQRT(2) randomly distributed, say in base 2?

Figure 1: successive iterations of algorithm below; d(n) are digits of SQRT(2)/2 in base 2

The fact that such a simple question about such a basic number has yet to be proved or disproved as of today, is just simply amazing. Everyone very strongly believe that the answer is yes, and that indeed, subsequent digits are independent from each other. These decimals - and I mean billions of them computed using some Map-Reduce architecture - have successfully passed all randomness tests so far.

Yet the result below would let you think that these decimals (bits to be precise - here we work in base 2, not in base 10), because they obey such a natural, simple, non-chaotic recurrence relationship, just can't be random. The fact is that they almost certainly look extremely random (see column d(n) in Figure 1 above), and yet the result below is also and paradoxally correct - though please check it out yourself: this is the purpose of this challenge of the week.

Recursive algorithm to compute digits of SQRT(2)/2

Let us define the following recurrence system:

p(0) = 0, p(1)= 1, e(1) = 2

If 4p(n) + 1 < 2e(n) Then

  • p(n+1) = 2p(n) + 1
  • e(n+1) = 4e(n) - 8p(n) - 2
  • d(n+1) = 1

Else

  • p(n+1) = 2p(n)
  • e(n+1) = 4e(n)
  • d(n+1) = 0

Note that d(n+1) = p(n+1) - 2p(n).

The surprising result is that the d(n)'s represent the bits of SQRT(2)/2 when represented in base 2. In other words,

SQRT(2)/2 = SUM{ d(k)/2^k }

where the sum is over all integer k greater than 0. Can you prove it? Or find something wrong in my assertion? 

Unfortunately, because the p(n)'s and e(n)'s grow exponentially fast, this recursion is of no use to produce random bits. It does the job without violations - unlike pretty much all random number generators available in the public domain - but it can't be implemented. Could you find similar recursions that produce (simulated) randomness, and that are computationally manageable?

Attached is a small spreadsheet that shows the calculation, if you want to replicate my results in figure 1. Some hints to optimize this recursion, prove this result, as well as a generalization to SQRT(a)/a in base b, and even to generate the digits of a^{-1/c} in base b, using the same technique, are found in the next sections. In short, it means that many irrational numbers have a highly predictible sequence of digits, despite looking perfectly random. This might be a concern if you design cryptographic applications relying on certain types of high quality simulated random numbers.

Another surprising result, if you look at the column Approx to SQRT(2)/2 in Figure 1 (or better, download my spreadsheet), is the fact that the algorithm converges relatively fast to SQRT(2)/2, but in a very weird way: any time d(n)=0, there's no progress, as if the algorithm had reached the limiting value, or got stalled. Such behavior would cause most numerical analysis software programs to stop, erroneously believing that they reached the solution, once stuck in the first such configuration: in this example, when n=2 (n is the iteration counter). Interestingly, this algorithm will get stuck infinitely many times, for a billion of iterations in a row. This happens any time we hit a point in SQRT(2)/2 expansion where one billion bits are just zero. It happens incredibly rarely (that's the general belief), yet it happens an infinite number of times (that's the general belief too). Yet overal convergence is still very fast. 

Numerical Optimization

One way to get rid of the exponential growth (and thus exponential storage and computing time as the number of iterations - that is, n - is growing) is to store the log of p(n) and e(n), instead of p(n) and e(n). This would eliminate some of the computational/numerical issues associated with exponential growth. Note that p(n) is very well approximated by (SQRT(2)/2) * 2^n. Also note that d(n+1) is entirely determined by the sign of 4p(n) + 1 - 2e(n), that is, asymptotically, by the sign of 2p(n) - e(n).

Maybe transforming p(n) and e(n) is another solution, then obtaining recurrence relations for the transformed variables. For instance, instead of p(n), use q(n) = p(n) - INT(0.707106781 * 2^n). The number q(n) will be much more manageable (smaller) than p(n).

However, no matter how efficiently we transform the two variables p(n) and e(n), it is impossible to keep storage and computing time bounded as the number of iterations (that is, the number n) is growing. At best, we can expect linear growth for storage, or maybe sub-linear (like a log function). Otherwise, our recurrence system (if bounded over positive integers as n grows) would be periodic, defeating the purpose of producing a non-periodic random number simulator. In addition, it could not converge to an irrational number, if bounded.

Generalization and proof sketch

A generalization of my irrational number decimal generator is as follows, and will help you prove the validity of my result:

Here we try to produce the decimals, in base b, of a at power 1/c (that is, the decimals of a^{1/c} in base b), where abc are positive integers. It works best when both b=2 and c=2.

Define 

  • k(n+1) = b*k(n)
  • p(n) is the largest integer such that { k(n) }^c > a*{ p(n) }^c
  • p(n+1) = b*p(n) + r(n), with r(n) a non-negative integer strictly inferior to b (but as large as possible), such that a * { p(n+1) }^c < { k(n+1) } ^c
  • e(n), the positive error term, is defined by { k(n) }^c = a * { p(n) }^c + e(n)
  • d(n+1) = p(n+1) - b*p(n)

Check out our previous weekly challenges

Views: 7982

Reply to This

Replies to This Discussion

The numbers in column 4p(n)+1-2e(n) in Figure 1 don't have many divisors; prime numbers seem over-represented in that column. Quite a few are divisible by 7. Interesting.

Many sequences thought to be random, are not. For instance, the fractional part of log(n), for n = 2, 3, 4 et cetera, is not uniformly distributed on [0, 1], according to a theorem by Kuipers and Niederreiter.

RSS

Follow Us

On Data Science Central

On DataViz

On Hadoop

© 2017   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC   Powered by

Badges  |  Report an Issue  |  Terms of Service