A Data Science Central Community
We are interested here in factoring numbers that are a product of two very large primes. Such numbers are used by encryption algorithms such as RSA: the product serves as the public modulus, while the two prime factors must stay secret, because the private key can be computed from them. Here you will also learn how data science techniques, including visualization, are applied to big data to derive insights. This article is good reading for the data scientist in training, who might not have easy access to interesting data: here the dataset is the set of all real numbers -- not just the integers -- and it is readily available to anyone. Much of the analysis performed here is statistical in nature, and thus of particular interest to data scientists.
Factoring numbers that are a product of two large primes allows you to test the strength (or weakness) of these encryption keys. It is believed that if the prime numbers in question are a few hundred binary digits long, factoring is nearly impossible: factoring just one of these numbers would require years of computing time on distributed systems.
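To make the link between factoring and key strength concrete, here is a minimal sketch of a toy RSA key being broken once its modulus is factored. It is an illustration only, not the method proposed in this article: the helper `factor_semiprime` is a hypothetical name, it uses plain trial division, and the primes 10007 and 10009 are chosen to be tiny so that the loop finishes instantly. Real moduli are far beyond the reach of this approach.

```python
from math import isqrt

def factor_semiprime(n):
    """Return (p, q) with p <= q and p * q == n, using plain trial division.

    Illustrative only: the running time grows with sqrt(n), so this is
    hopeless for moduli of cryptographic size.
    """
    if n % 2 == 0:
        return 2, n // 2
    for p in range(3, isqrt(n) + 1, 2):
        if n % p == 0:
            return p, n // p
    raise ValueError("n has no nontrivial factor")

# Toy key (assumed values, far too small for real use).
p, q = 10007, 10009          # the secret primes
n = p * q                    # public modulus
e = 65537                    # public exponent

# The attacker sees only (n, e) and factors n.
fp, fq = factor_semiprime(n)
phi = (fp - 1) * (fq - 1)
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

# With d recovered, any ciphertext can be decrypted.
message = 123456
cipher = pow(message, e, n)
recovered = pow(cipher, d, n)
print(fp, fq, recovered == message)
```

The point of the sketch is that the secrecy of the private exponent rests entirely on the difficulty of the factoring step.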
While the vast majority of big numbers have some small factors and are thus easier to factor, the integers that we are dealing with here are difficult to handle by design. Factoring a product of two large primes is widely considered to be an intractable computer science problem: no efficient (polynomial-time) classical algorithm is known. Here, we use techniques from big data, statistics, and machine learning (in short, a data science approach) in the hope of discovering new, efficient factoring techniques for these massive numbers. Our innovative approach is quite different from that used in traditional algorithms.
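The claim that most integers have a small factor can be checked empirically. The sketch below is a rough illustration under assumed parameters: the window starting at one million, the window length of 10,000, and the bound of 100 are arbitrary choices, and `smallest_factor_below` is a hypothetical helper name. It measures the share of integers in the window that have a factor no larger than the bound, and contrasts this with a product of two primes that both exceed the bound.

```python
def smallest_factor_below(n, bound):
    """Return the smallest divisor d of n with 2 <= d <= bound, or None."""
    for d in range(2, bound + 1):
        if n % d == 0:
            return d
    return None

# Assumed, arbitrary parameters for the experiment.
start, count, bound = 10**6, 10_000, 100

with_small = sum(
    1
    for n in range(start, start + count)
    if smallest_factor_below(n, bound) is not None
)
fraction = with_small / count
print(f"share of integers with a factor <= {bound}: {fraction:.3f}")

# A product of two primes above the bound offers no such shortcut.
semiprime = 10007 * 10009
print(smallest_factor_below(semiprime, bound))
```

Most integers in the window are caught by the small-factor search, whereas the semiprime is not, which is why such numbers are chosen for encryption.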
We have investigated this problem in the past, but here we propose a different, more general framework, offering multiple opportunities to jump-start new research paths on this topic.
The full article is long and rather technical. It features a brand new framework for factoring massive numbers and has the following content:
1. General Framework
2. Case Study
3. Data, Code, Visualizations
4. Improving the Algorithm
5. Source Code