A Data Science Central Community

This challenge of the week consists of running simulations to replicate the special clustering process described in our Zipf law article, a process that results in the widespread Zipf distributions applicable to many natural and economic phenomena. More specifically, we ask you to perform the following Monte Carlo simulations (special clustering algorithm) to assess whether the assumptions made in that article are correct. Alternatively, a mathematical proof is OK.

*Figure 1: Zipf’s law and the distribution of patents among applicants*

Let's assume that we have *k* = one million atoms, for instance space dust particles. Test the following algorithm:

**Algorithm** (write it in Perl, R, Python, C or Java)

*Step #1*

Each particle is assigned a unique bin ID between 1 and 1,000,000. Each particle represents a cluster with one element (the particle in question), and the bin ID is its cluster ID.

*Step #2*

Iteration: repeat 200,000,000 times:

- Randomly select two integers *i* and *j* between 1 and the current number of clusters.
- If the sizes of cluster *i* and cluster *j* are similar, merge these clusters with probability p > 0.8; otherwise, merge them with probability q < 0.3.
- Update the cluster list.
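
As a starting point, here is a minimal Python sketch of Steps #1 and #2, scaled down so that it runs in seconds. Several choices are assumptions of this sketch and not part of the challenge: the function name `simulate_clusters`, the values p = 0.9 and q = 0.1, the rule that two sizes are "similar" when the larger is at most `ratio` times the smaller, and treating a draw with *i* = *j* as a no-op. Only cluster sizes are tracked, not particle membership.

```python
import random

def simulate_clusters(k=1000, iterations=2000, p=0.9, q=0.1, ratio=2.0, seed=0):
    """Return the list of cluster sizes after the merge process (a sketch)."""
    rng = random.Random(seed)
    # Step 1: every particle starts as its own cluster of size 1.
    sizes = [1] * k
    # Step 2: repeated random pairwise merge attempts.
    for _ in range(iterations):
        n = len(sizes)
        if n < 2:
            break  # nothing left to merge
        i = rng.randrange(n)
        j = rng.randrange(n)
        if i == j:
            continue  # assumption: drawing the same cluster twice does nothing
        a, b = sizes[i], sizes[j]
        # Assumption: "similar" means the size ratio is at most `ratio`.
        similar = max(a, b) <= ratio * min(a, b)
        if rng.random() < (p if similar else q):
            # Merge j into i, then delete slot j in O(1) by moving
            # the last cluster into it and shrinking the list.
            sizes[i] = a + b
            sizes[j] = sizes[-1]
            sizes.pop()
    return sizes
```

Each successful merge reduces the number of clusters by one, so the sketch stops early if a single cluster remains. The iteration count relative to *k*, and the definition of "similar", are the parameters worth experimenting with.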

Once the algorithm stops, the final cluster configuration represents the current solar system, or companies in the US, as described in our original article. Does it really follow a Zipf distribution?
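
One rough way to examine that question is a rank-size plot: sort the cluster sizes in decreasing order and fit a line to log(size) against log(rank). Under Zipf's law the slope should be close to -1. The sketch below computes that slope with ordinary least squares; the function name `zipf_slope` is hypothetical, and a log-log fit is only a coarse diagnostic, not a formal test.

```python
import math

def zipf_slope(sizes):
    """Least-squares slope of log(size) versus log(rank), largest size first."""
    ranked = sorted(sizes, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two clusters to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard simple-regression slope: cov(x, y) / var(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For example, `zipf_slope([1000.0 / r for r in range(1, 101)])` is an exact Zipf sequence and gives a slope of -1 up to floating-point error. A slope far from -1, or a visibly curved rank-size plot, would suggest the simulated clusters do not follow the law.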
