A Data Science Central Community
After posting my article about 6000 companies hiring data scientists, I looked at the top ones in the list and compared them with the invitations for screening interviews that I receive almost daily.
There are some interesting facts. A few of the top companies never contact me (even though I have the perfect experience and live a few miles away; I've heard they prefer to relocate people rather than hire locals, as they believe that relocated people make for more committed employees). Others, like GE, IBM, British Petroleum, Starbucks, Capital One and Deloitte, do contact me, which you might find a bit surprising if you look at my resume (accessible from the resume section below). This may also have to do with corporate culture, with some companies being averse to hiring disruptive people.
Evidently there are some biases: I blog a lot and occasionally criticize some companies or publish proprietary material (from my own data science research lab), and some companies just don't like that. I also look more like a start-up guy, so I receive far more interest from start-ups. Still, a few (a minority) of the biggest companies have never shown interest in me (despite my numerous LinkedIn connections with them). Maybe they have never heard of me (I don't think so), or they believe I can't bring value to them, or that my experience is not a good fit, or that I am too expensive.
Solution to data scientists being too expensive
Regarding "too expensive": with my income now above $400k/year (since I changed from "employee" to full-time "co-founder/owner"), it's true that I'm too expensive, but they don't know that. Actually, in my last "employee" positions back in 2012, I was making $150k-$170k, which is not high by Bay Area standards. I worked mostly for California employers (California has a 10% income tax on such salaries) although I was physically in Washington state (Washington has no income tax). So I could ask for a salary $15,000 below what local candidates would ask. I mention this because for California, New Jersey or New York employers, this is one way to reduce costs: hire telecommuting employees located in states with no income tax (plenty of great talent can be found in Austin, Texas, as well as near Seattle, Washington, both places with no income tax). Real estate in these locations is also well below California or East Coast prices. A guy like me probably has a $5,000 monthly mortgage in the Bay Area; mine is $2,500, with a beautiful house, though I was lucky: I sold my house in the Bay Area and moved to Seattle at the right time (2006). All in all, I could ask for a salary $50,000 below what a Bay Area applicant would ask, and still have a better quality of life with no financial stress.
This was true until 2012. Now this window of opportunity (for employers) is closed, since not only has my income increased by a factor of 3, but my job security is also much better. (Note: I don't recommend that you switch from employee to founder; most founders fail. You should test the waters first and make sure that you don't have financial stress; that you love "doing business" more than you love doing statistics; that you are good at managing finances, finding the right partners, outsourcing, delegating, business hacking, "lean start-ups", marketing, product, vendor selection, market understanding, and a bunch of other things; and, most importantly, that you are passion-driven and focused on generating both value and profits.)
But I'm sure there are plenty of other people who made the same move as I did (probably located in Austin, Texas, or Seattle now). Identifying them might be worthwhile, as they need lower salaries. People who benefited from the spectacular stock market recovery might also be happy with lower salaries, a weapon that they can use to compete with other data scientists. Of course, if you need a top gun, it might be worth paying him $500k/year, but then you might need to provide him with the right job title and responsibilities (possibly external consultant or chief scientist, if you don't already have one); otherwise it could create jealousy and bad team spirit.
Now don't get me wrong: I'm not looking for a job and probably never will again, so not contacting me is a smart decision, but employers don't know that either. Also, I don't fit well as an employee, but again these companies would not know that until I had worked two years for them. Also, some of the companies interested in discussing a job opportunity with me might just want to spend a day with me, at no cost, and steal as much IP (intellectual property, ideas) as they can from me, with no intention of hiring me in the first place.
Some companies, including one that I regularly criticize by publicly posting solutions to improve their revenue using better data science, keep contacting me regularly even though I'm not a good fit (for instance, they are consistently looking for developers with some statistical knowledge, but I am not a developer). I'm sure that the company I'm talking about has boosted its revenue by many millions of dollars thanks to reading and applying my solutions, so at least they benefit from me (as do all their competitors who also read my postings).
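The arithmetic behind these figures can be sketched as follows. This is only a rough illustration, assuming the discount is simply the state income tax saved plus the yearly mortgage difference; the function name and this breakdown are assumptions made for the example, and it ignores federal taxes and the fact that a mortgage is paid from after-tax income.

```python
def yearly_salary_discount(salary, state_tax_pct, mortgage_local, mortgage_remote):
    """Rough yearly amount a remote candidate could knock off a salary.

    Hypothetical helper: the breakdown (state tax saved + mortgage
    difference) is an assumption, not the author's stated method.
    """
    tax_saving = salary * state_tax_pct // 100              # state income tax not paid
    housing_saving = (mortgage_local - mortgage_remote) * 12  # monthly gap over a year
    return tax_saving, housing_saving, tax_saving + housing_saving


# Figures quoted in the text: $150k salary, 10% California income tax,
# $5,000 vs. $2,500 monthly mortgage.
tax, housing, total = yearly_salary_discount(150_000, 10, 5_000, 2_500)
print(f"tax saving: ${tax:,}, housing saving: ${housing:,}, total: ${total:,}")
```

With these inputs the tax saving matches the $15,000 mentioned above, and the combined amount lands in the neighborhood of the $50,000 figure.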
Hiring process needs to be improved
When hiring a data scientist, sometimes the hiring company does not have data scientists on staff to interview you. They use statisticians (sometimes called director of analytics or market research analyst) to assess your data science technical knowledge. A data scientist might not know all the modern flavors of logistic regression (for instance) and might be perceived as incompetent. Yet the data scientist knows lots of things (APIs, computational complexity, ROI optimization, KPI design and data collection, relevancy engines) that the business statistician might not know or does not value. The data scientist applicant ends up being labeled as incompetent and is not hired. This raises an interesting question:
Who should interview data scientist applicants? Should it be developers, or statisticians, or business people? And how can a data scientist (interviewee) convince a statistician (interviewer) that his knowledge is critical? Answer: By discussing success stories, where your mix of business acumen / engineering / big data (non traditional) statistics helped a project succeed.
Sometimes statisticians or business analysts are afraid of data scientists and, out of fear, will write a bad review after the interview with a recommendation not to hire. It would make sense to hire an external (third-party, neutral) consultant, himself or herself a data scientist, to interview the candidate.
Mismatch between resumes and job ads
The general problem of the shortage of analytic talent, and of employers and employees alike wasting tons of time in fruitless job interviews, is well illustrated when you compare the resumes and job ads below. One of our readers (look at the comments below) mentioned that the skill R (one of the two most popular programming languages used by data scientists, the other one being Python) is never picked up by the automated search tools that recruiters use to parse resumes, because it's just one letter. So it does not matter whether you have R in your resume if the hiring company uses poor automated filtering tools to narrow down candidates with desired skills, such as (especially and ironically) R. This is another reason why companies should not rely exclusively on these tools when looking for someone with R (the programming language) in their resume. They should also manually check profiles on LinkedIn and on our network (on AnalyticBridge.com and on DataScienceCentral.com), as some people choose not to be on LinkedIn. Or they can find a skill frequently associated with R (maybe Python, SAS, Perl, Matlab) and use it as a proxy to find R programmers.
Interestingly, a search for R on LinkedIn returned the following profiles: International Financier, Account Executive R+L Carriers, Recovery at Citizen's Acting Together Can Help, R M at DHFL, Critical Care Transport R.N, R.R.T at LifeLine Ambulance Service, and so on. None have the R skill you are looking for; indeed, none of them are analytic people, not even remotely. Eventually, search technology will address this issue. It's a very easy fix for search technology engineers: use a white list of keywords, and do not remove 1-letter or 2-letter words found in your white list (including R for R programming, IT for Information Technology, and so on) when processing a user query.
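The white-list fix can be sketched in a few lines. This is a minimal illustration, not how any actual recruiting or search tool works: the function name, the minimum token length of 3, and the contents of the white list are all assumptions made for the example.

```python
import re

# Hypothetical white list of short keywords that must survive filtering.
SHORT_KEYWORD_WHITELIST = {"r", "c", "it", "qa"}
# Assumed cutoff: tokens shorter than this are normally discarded.
MIN_TOKEN_LENGTH = 3


def tokenize_query(query, whitelist=SHORT_KEYWORD_WHITELIST,
                   min_length=MIN_TOKEN_LENGTH):
    """Split a query into lowercase tokens, dropping short tokens
    unless they appear in the white list."""
    tokens = re.findall(r"[a-z0-9+#]+", query.lower())
    return [t for t in tokens if len(t) >= min_length or t in whitelist]


query = "R and SAS programmer in IT"
print(tokenize_query(query, whitelist=set()))  # naive filter: "R" and "IT" are lost
print(tokenize_query(query))                   # white list keeps "r" and "it"
```

A real system would also have to decide how to tell the skill IT apart from the pronoun "it", for instance by looking at case or context; this sketch ignores that.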
The following sample resume extracts are from actual data science practitioners who agreed to be featured in my book. In order to allow these professionals to delete or update their resume, I have made the resumes accessible on the web, at http://bit.ly/1j4PNuP. You can find many more resumes and profiles by doing a search on LinkedIn with the keyword data science or related keywords, or by browsing Data Science Central member profiles.
Included here in the list are people from different locales and backgrounds in an attempt to cover various aspects of data science. The emphasis is on providing a well-balanced mix of professional analytic people — both junior and senior, people with big company or startup experience or both, top stars and people with average resumes (sometimes the most faithful employees), and corporate or consultant or academia-related people. These resumes have been shortened and reformatted. I also added mine, to provide an example with patents and classical big data science such as credit card fraud detection or digital analytics.
Comparing these resumes to the job ads below, it seems that Human Resources departments are sometimes looking for a unicorn: a professional with a skill mix that does not exist. Sometimes they hesitate between hiring a data engineer, a business analyst, or a data scientist. I encourage employers to seek out and hire people with strong potential and train them, rather than looking for the rare and expensive unicorn who often turns out not to be the best fit (and may only be happy running their own business). It's sometimes easier to hire a software engineer or business guy (MBA) and have him learn statistics than the other way around, especially at the beginning of a big project. In the long term, not hiring data scientists means losing to competitors that are better equipped to extract value out of data.
You should check these resumes to see career progression (lateral or vertical), and the degrees, ongoing training and certifications these people have gained.
Consider the sample (yet actual and recent) job ads found at http://bit.ly/1hVAmr7. The skills most frequently listed are: Python, Linux, UNIX, MySQL, Map-Reduce, Hadoop, Matlab, SAS, Java, R, SPSS, Hive, Pig, Scala, Ruby, Cassandra, SQL Server, and NoSQL. Many times, several of these skills are listed (5-6), while applicants only have a few (2-3) in their resume.