A Data Science Central Community
Facebook has 800MM users. Out of these 800MM "users", how many are duplicate (or triplicate), fake, dummy, inactive, decoy, stolen IDs, non-users (e.g. a book) and other artificial accounts?
How do you go about estimating this proportion? My guess is that less than 50% are unique, real, "non-dead" users (by "non-dead", I mean users with at least one activity over the last 6 months - such as logon, posting a message, inviting a friend, updating profile).
It does not matter how many accounts are bogus. The only thing that matters is the impressions-to-clicks ratio, which is very low on FB, far below Google's. I'm not sure that this is due to the large number of artificial accounts. It has to do with sub-optimal ad targeting. Read Online advertising: a solution to optimize ad relevancy to find out how to optimize ad targeting.
Note: If you remove these artificial users, the value of a FB member increases from $4/year to $8/year.
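The arithmetic behind that note can be sketched as follows. This is only an illustration: it assumes the $4/year figure is revenue per account, and that about 50% of accounts are real (my guess above, not a measured value). The function name is made up for the example.

```python
def value_per_real_user(value_per_account: float, real_fraction: float) -> float:
    """Spread the same total revenue over the real accounts only.

    real_fraction is the assumed share of accounts that are unique,
    real and active (an assumption, not a measured quantity).
    """
    if not 0 < real_fraction <= 1:
        raise ValueError("real_fraction must be in (0, 1]")
    return value_per_account / real_fraction


# With $4/year per account and half of the accounts assumed real:
print(value_per_real_user(4.0, 0.5))  # 8.0
```

If the real share were lower than 50%, the same formula would give a value above $8/year.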
I did a quick test, creating five random names from scratch (vincent75, robert64, amy15, amy6, didierf) and checked their most recent activity on FB. Based on time since last action, 75% of the 4 existing profiles are active. Here are the results:
Very interesting Mirko! This could be a great project for a data science candidate (someone who wants to become a data scientist). Create 100,000 bogus names (with the help of an online dictionary and by adding combinations of digits at the end), see how many exist as FB profiles, and how many are active. Use a web crawler to complete the task. It should not take more than a day of work, including the crawling itself (if it's organized using a rudimentary distributed architecture).
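Here is a minimal sketch of the two non-crawling parts of that project: generating candidate names (a word plus a few digits) and summarizing the lookup results. The crawl itself is deliberately left out. The function names, the result format (days since last activity, or None when no profile exists) and the 180-day threshold (roughly the six months mentioned above) are all assumptions made for illustration.

```python
import random


def make_candidate_names(words, count, max_digits=2, seed=0):
    """Build `count` distinct names of the form word + number, e.g. 'amy15'."""
    unique_words = sorted({w.lower() for w in words})
    limit = len(unique_words) * 10 ** max_digits
    if count > limit:
        raise ValueError(f"only {limit} distinct names are possible")
    rng = random.Random(seed)  # seeded so the sample can be reproduced
    names = set()
    while len(names) < count:
        word = rng.choice(unique_words)
        suffix = rng.randrange(10 ** max_digits)
        names.add(f"{word}{suffix}")
    return sorted(names)


def summarize(results, active_within_days=180):
    """Summarize lookups given as {name: days_since_last_activity or None}.

    None means no profile was found for that name.
    """
    existing = {n: d for n, d in results.items() if d is not None}
    active = [n for n, d in existing.items() if d <= active_within_days]
    share = len(active) / len(existing) if existing else None
    return {
        "checked": len(results),
        "existing": len(existing),
        "active": len(active),
        "active_share": share,
    }


# Example word list; a real run would load a dictionary file instead.
candidates = make_candidate_names(["vincent", "robert", "amy", "didier"], 10)
print(candidates)
```

The output of the crawler would be fed to `summarize`, which reports the share of active profiles among those that exist, the same ratio used in the quick test above.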
@Amy - would it not be a high impression-to-click ratio that is desirable for "free branding"? A low ratio would be indicative of good matching.
I think comparing CPC on Google and FB is incorrect. If anything, the CPC on Google+ (strict) and FB could be compared. Users do not log into FB to search for info on / compare features and prices of / ultimately shop for laptops (or tools, or clothing etc.). That's what Google trained most of us to do. It's matching the intent (demand) with the offer (supply). What is the demand from the users on FB? I would submit that it is more social and less search/purchasing oriented. As such, FB is for now best suited for branding efforts. My 2 cents...
I like Mirko's idea of testing using the random simulated names.
This makes me think about how we can generate a sample (simulated account names) to represent the major Facebook account population groups (age of the account, age of the owner, location, career, etc.). Do people in different groups have different preferences in account names? I suspect that individuals belonging to different groups behave differently.
Would one activity include just logging into Facebook? I had a relative who just logged in to see pictures. She did not like or post anything. I would assume she is not the only person who behaves this way.
Here's another way Facebook generates revenue: when you post a Wall Street Journal article on your FB timeline, any click that is generated results in a commission paid by the Wall Street Journal to Facebook.
I checked a link to a WSJ article that I posted on my Facebook account, and magically, the following tags were added in the query string: fb_rev=wsj_share_FB and fb_source=timeline. The full link, on my FB page, is:
This brings up an interesting issue: link fraud, by posting the same URL in various places but substituting fake tags for the real ones to claim the revenue. You need to be an approved WSJ publisher or sub-publisher or sub-sub-publisher to get the fraudulent credits, but you get the idea about how this fraud scheme would work.
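To see how such tags travel with a link, here is a small sketch that pulls them out of a query string. The URL below is a made-up placeholder on example.com, not the actual WSJ link; only the two tag names and their values come from what I observed. The function name is also just for illustration.

```python
from urllib.parse import urlparse, parse_qs


def tracking_tags(url, keys=("fb_rev", "fb_source")):
    """Return the tracking tags found in the URL's query string."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {k: query[k][0] for k in keys if k in query}


# Placeholder URL (hypothetical), carrying the two tags mentioned above.
example = "https://example.com/article?fb_rev=wsj_share_FB&fb_source=timeline"
print(tracking_tags(example))
# {'fb_rev': 'wsj_share_FB', 'fb_source': 'timeline'}
```

Since the tags are plain text in the URL, nothing in the link itself prevents someone from editing them, which is what makes the substitution scenario conceivable.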
@Vincent: I do not think that posting the URL outside FB would work. For one thing, browsers (and the HTTP standard, as far as I know) report the referring page: http://en.wikipedia.org/wiki/HTTP_referer
Which doesn't mean one could not fool the browsers themselves to think they are on a given page when they're not, and report that as the referral page.
But the benefits would be limited or short lived (and likely, given the monthly or longer reimbursement cycle, uninteresting financially): the tags would get "credited", and a spike in activity, or large sums to be paid, always attract attention.
Or so one would hope...
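As a rough illustration of the referrer check discussed above, here is how a server could compare the Referer header of a click against an expected domain. This is a sketch under assumptions: the function names are hypothetical, and I have no knowledge of how WSJ or FB actually validate clicks. The header is supplied by the client, so it can be missing or forged, and a check like this is at best one signal among others.

```python
from urllib.parse import urlparse


def referer_host(headers):
    """Return the host name from the Referer header, or None if absent."""
    for name, value in headers.items():
        if name.lower() == "referer":  # header names are case-insensitive
            return urlparse(value).hostname
    return None


def came_from(headers, expected_domain):
    """True if the Referer host is expected_domain or one of its subdomains."""
    host = referer_host(headers)
    if host is None:
        return False
    return host == expected_domain or host.endswith("." + expected_domain)


print(came_from({"Referer": "https://www.facebook.com/somepage"}, "facebook.com"))  # True
print(came_from({"Referer": "https://example.org/blog"}, "facebook.com"))  # False
```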
This type of fraud could be motivated by different reasons, not necessarily with direct financial incentives. For instance, one might generate fake traffic (fake monitoring tags and/or fake referral)
I would estimate the number of unique, non-dead users to be smaller - perhaps 20-35% of all FB accounts.