A Data Science Central Community
Here we provide a simple formula easily derived from LinkedIn stats (stats that any of us can collect in a few minutes).
The purpose is to identify, among our LinkedIn connections, the ones with a large network. It's useful if you hire a sales guy (to gauge the size of his network), or if you are looking for a job and talking to a recruiter: if the recruiter has many connections, he might be well equipped to help you. In short, it provides some competitive intelligence.
When you look at your LinkedIn connections, it's very easy to find how many connections each of them has. Except that if the number is above 500, it just says 500+. You don't know if it means 502 or 6,000. However, you can look at shared connections (LinkedIn provides this number for each of your connections), and this is the basis for computing our estimates.
By the way, computing these estimates could be the subject of an interesting job interview question.
Tom Davenport and I share 118 connections, thus Tom must have about 11,800 LinkedIn connections.
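Let's introduce some notation:
N = size of the pool of LinkedIn members from which your connections and B's connections are drawn
y = your number of connections
x = number of connections of B, the member whose network size you want to estimate
z = number of connections shared by you and B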
Basic formula
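For a member C picked at random among the N members, and assuming that being connected to you and being connected to B are independent events:
P(C is a shared connection) = P(C is connected to you) * P(C is connected to B) = (y/N) * (x/N) = (x*y) / (N*N)
The expected number of shared connections is therefore z = N * (x*y) / (N*N) = (x*y) / N.
Thus x = (z*N) / y, or N = (x*y) / z.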
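The independence assumption behind this formula can be checked with a small simulation. The sketch below is not part of the original analysis: it assumes that both you and B pick connections uniformly at random from a pool of N members (a strong simplification), and the function names and the pool size used in the example are hypothetical.

```python
import random

def expected_shared(x, y, n):
    """Expected number of shared connections under the independence model: x*y/N."""
    return x * y / n

def simulate_shared(x, y, n, trials=200, seed=0):
    """Average overlap between two random connection sets drawn from a pool of n members."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        mine = set(rng.sample(range(n), y))   # your y connections
        theirs = rng.sample(range(n), x)      # B's x connections
        total += sum(1 for member in theirs if member in mine)
    return total / trials

if __name__ == "__main__":
    # Illustrative values only, chosen small enough to run quickly.
    x, y, n = 2_000, 5_000, 100_000
    print("formula:   ", expected_shared(x, y, n))
    print("simulation:", simulate_shared(x, y, n))
```

The simulated average should land close to the value given by the formula. Real networks are clustered by industry and location, so actual overlaps can deviate from this idealized model.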
Step #1: compute N
In the above table, I sampled a few of my connections that have fewer than 500 connections, to find out what x and z were. My number of connections is y = 9,670.
A first approximation (visual analytics without using any tool other than my brain!) yields
N = (x*y) / z = (approx.) 500 * 9,670 / 5 = (approx.) 1 million.
So my N is 1 million. Yours might be different. Note that the number of people on LinkedIn is well above 100 million (I'm one of the 100 featured in the picture when they made the announcement in 2011).
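Step #1 can be sketched in a few lines of Python. This is only an illustration: the (x, z) pairs below are hypothetical placeholders, not the values from the table, and taking the median of the per-connection estimates is a design choice made here, not something the analysis above prescribes. Only y = 9,670 comes from the text.

```python
from statistics import median

def pool_size_estimates(samples, y):
    """Return one estimate N = x*y/z per sampled connection.

    samples: iterable of (x, z) pairs, where x is the connection's network size
    (known exactly because it is below 500) and z is the number of shared connections.
    Pairs with z == 0 are skipped, since they give no finite estimate.
    """
    return [x * y / z for x, z in samples if z > 0]

y = 9_670  # the author's own connection count, taken from the text

# HYPOTHETICAL (x, z) pairs, for illustration only.
samples = [(480, 5), (350, 3), (410, 4), (220, 2), (300, 4)]

estimates = pool_size_estimates(samples, y)
n_hat = median(estimates)  # the median dampens the noise caused by small values of z
print(f"Estimated N: {n_hat:,.0f}")
```

Because z is a small integer for connections with fewer than 500 connections, each individual estimate is noisy. Sampling more connections is the main way to stabilize N.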
Step #2: compute x, for a specific connection
Using the formula x = (z*N) / y, if a LinkedIn member shares 200 connections with me, he probably has around 20,000 connections, using y=10,000 rather than 9,670, as an approximation for my number of connections. If he shares only one connection with me, he's expected to have 100 connections.
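The arithmetic in this step is easy to reproduce. The short sketch below uses N = 1 million and y = 10,000 from the text; the function name is hypothetical.

```python
def estimate_connections(z, n=1_000_000, y=10_000):
    """Estimate x, the network size of a member who shares z connections with you."""
    return z * n / y

# Shared-connection counts mentioned in the article.
for shared in (1, 118, 200):
    print(f"{shared} shared -> about {estimate_connections(shared):,.0f} connections")
```

With these defaults, each shared connection corresponds to about 100 connections in the other member's network, which is why 118 shared connections translate to about 11,800.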
You can compute confidence intervals for x by first computing confidence intervals for N, by looking at the variations in the above table. You can also increase accuracy by using a variable N that depends on job title or location.
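One possible way to turn the variation in N into an interval for x is a percentile bootstrap. This is an assumption made for illustration: the article does not say which method to use, the per-connection N estimates below are hypothetical, and the function names are made up. With only a handful of data points the resulting interval is rough at best.

```python
import random
from statistics import mean

def bootstrap_interval(values, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean of the per-connection N estimates."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(resamples)
    )
    lo_idx = int((1 - level) / 2 * resamples)
    hi_idx = int((1 + level) / 2 * resamples) - 1
    return means[lo_idx], means[hi_idx]

def connections_interval(z, n_interval, y):
    """Map an interval for N to an interval for x using x = z*N/y."""
    n_lo, n_hi = n_interval
    return z * n_lo / y, z * n_hi / y

# HYPOTHETICAL per-connection estimates of N, for illustration only.
n_estimates = [725_000, 930_000, 990_000, 1_060_000, 1_130_000]

n_interval = bootstrap_interval(n_estimates)
x_lo, x_hi = connections_interval(z=118, n_interval=n_interval, y=10_000)
print(f"N between {n_interval[0]:,.0f} and {n_interval[1]:,.0f}")
print(f"x between {x_lo:,.0f} and {x_hi:,.0f}")
```

This sketch only captures the uncertainty in N. The randomness in z itself (the shared-connection count for the member being estimated) adds further uncertainty that is not modeled here.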
Question: would you be interested in purchasing a list with estimated number of connections, for each of your connections?
Comment
@Hussain: Probably. Also note that the confidence interval in my estimate for Tom Davenport's connections count is rather large: from 3,000 to 20,000. Adding more data points should decrease volatility in N, and thus narrow the confidence interval.
Good analysis, Vincent. I'm curious: could including the LinkedIn stats on 2nd-degree connections, and using the "XX Connections link you to XXXXXXXXX+ professionals" figure for N, provide better results?
Thanks
Hussain
© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC