A Data Science Central Community
Here we provide a simple formula easily derived from LinkedIn stats (stats that any of us can collect in a few minutes).
The purpose is to identify, among our LinkedIn connections, the ones with a large network. It's useful if you hire a salesperson (to gauge the size of his or her network), or if you are looking for a job and talking to a recruiter: a recruiter with many connections might be well equipped to help you. In short, it provides some competitive intelligence.
When you look at your LinkedIn connections, it's very easy to find how many connections each of them has. The exception is any count above 500, which LinkedIn displays only as 500+. You don't know if it means 502 or 6,000. However, you can look at shared connections (LinkedIn provides this number for each of your connections), and this is the basis for our estimates.
By the way, computing these estimates could be the subject of an interesting job interview question.
Tom Davenport and I share 118 connections, thus Tom must have about 11,800 LinkedIn connections
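Let's introduce some notation:
x = number of connections of B, the LinkedIn member whose network size you want to estimate
y = your own number of connections
z = number of connections that you and B share
N = size of the pool of LinkedIn members from which both your connections and B's connections are effectively drawn
Assuming that you and B pick your connections independently from this pool, the probability that a given member C of the pool is a shared connection is
P(C is a shared connection) = P(C is connected to you) * P(C is connected to B) = (y/N) * (x/N) = (x*y) / (N*N)
Summing over the N members of the pool, the expected number of shared connections is z = N * (x*y) / (N*N) = (x*y) / N.
Thus x = (z*N) / y, or N = (x*y) / z.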
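The relationship between shared connections and network sizes can be checked with a small simulation. The sketch below is only an illustration under the independence assumption behind the formula: the pool size, the two network sizes, and the function name `simulate_shared` are made-up values and names chosen for the example, not LinkedIn data.

```python
import random


def simulate_shared(n_pool, x, y, trials=200, seed=0):
    """Average number of shared connections when two members draw
    x and y connections independently and uniformly from a pool of n_pool."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    population = range(n_pool)
    total = 0
    for _ in range(trials):
        yours = set(rng.sample(population, y))
        theirs = rng.sample(population, x)
        total += sum(1 for member in theirs if member in yours)
    return total / trials


# Hypothetical sizes, chosen only to keep the simulation fast.
n_pool, x, y = 10_000, 500, 2_000
simulated = simulate_shared(n_pool, x, y)
predicted = x * y / n_pool  # the formula: z = (x*y) / N
print(f"simulated z = {simulated:.1f}, predicted z = {predicted:.1f}")
```

The simulated average should land close to the predicted value. Real networks are not formed independently (colleagues tend to share many connections), which is one reason the pool size N that fits the data can be much smaller than the total LinkedIn membership.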
Step #1: Compute N
In the above table, I sampled a few of my connections that have fewer than 500 connections, to find out what x and z were. My number of connections is y=9,670.
A first approximation (visual analytics without using any tool other than my brain!) yields
N = (x*y) / z = (approx.) 500 * 9,670 / 5 = (approx.) 1 million.
So my N is 1 million. Yours might be different. Note that the number of people on LinkedIn is well above 100 million (I'm one of the 100 featured in the picture when they made the announcement in 2011).
Step #2: Compute x for a specific connection
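The arithmetic for N can be written as a short function. This is a minimal sketch: the function name `estimate_pool_size` is an assumption for illustration, and the inputs are the rounded figures quoted above (x = 500, y = 9,670, z = 5).

```python
def estimate_pool_size(x, y, z):
    """Estimate N = (x*y) / z from one connection whose true
    network size x is known (that is, below the 500+ cutoff)."""
    if z <= 0:
        raise ValueError("need at least one shared connection")
    return x * y / z


n_estimate = estimate_pool_size(x=500, y=9_670, z=5)
print(f"N is roughly {n_estimate:,.0f}")  # 967,000, rounded to 1 million in the text
```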
Using the formula x = (z*N) / y with N = 1 million and y = 10,000 (a rounded value for my 9,670 connections), a LinkedIn member who shares 200 connections with me probably has around 20,000 connections. A member who shares only one connection with me is expected to have about 100 connections.
You can compute confidence intervals for x by first computing confidence intervals for N, by looking at the variations in the above table. You can also increase accuracy by using a variable N that depends on job title or location.
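Here is the same calculation as a small function, applied to the three cases mentioned in this article (1, 118 and 200 shared connections). The function name `estimate_connections` is hypothetical, and the defaults are simply my own rounded values for N and y.

```python
def estimate_connections(z, n=1_000_000, y=10_000):
    """Estimate x = (z*N) / y, the network size of a member
    who shares z connections with you."""
    return z * n / y


for shared in (1, 118, 200):
    print(f"{shared} shared connections: about {estimate_connections(shared):,.0f} connections")
```

With your own account, you would replace the defaults with your own y and the N you obtained in Step #1.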
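One possible way to turn this idea into numbers is sketched below. It is only an illustration: it uses a normal-approximation interval on the per-connection estimates of N (one choice among several), the function names are hypothetical, and the (x, z) pairs are placeholder values, not the figures from my table.

```python
import math
import statistics


def pool_size_interval(samples, y, z_crit=1.96):
    """Approximate 95% interval for N from (x, z) pairs, where x is a
    connection's known network size and z the number of shared connections."""
    estimates = [x * y / z for x, z in samples if z > 0]
    mean = statistics.mean(estimates)
    half_width = z_crit * statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean - half_width, mean + half_width


def connections_interval(z, n_low, n_high, y):
    """Propagate the interval for N to x = (z*N) / y."""
    return z * n_low / y, z * n_high / y


# Placeholder (x, z) pairs for illustration only.
samples = [(300, 3), (450, 5), (200, 2), (400, 4)]
n_low, n_high = pool_size_interval(samples, y=10_000)
x_low, x_high = connections_interval(200, n_low, n_high, y=10_000)
print(f"N between {n_low:,.0f} and {n_high:,.0f}")
print(f"x between {x_low:,.0f} and {x_high:,.0f} for 200 shared connections")
```

This interval reflects only the spread between the sampled connections. To let N depend on job title or location, you could run the same calculation separately on samples from each group.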
Question: would you be interested in purchasing a list with estimated number of connections, for each of your connections?