(See these articles on Forbes.com for definitions of a data scientist from leading experts in the field:
- Tableau Software's Pat Hanrahan on "What is a Data Scientist?"
- LinkedIn's Monica Rogati on "What is a Data Scientist?"
- LinkedIn's Daniel Tunkelang on "What is a Data Scientist?"
- EMC Greenplum's Steven Hillion on "What Is a Data Scientist?"
- Amazon's John Rauser on "What Is a Data Scientist?")
This problem statement addresses the challenge of “growing your own” data scientist. We strongly believe that people who understand how to make the best use of data are very important. We also believe these people will have to be “created,” rather than hired. Often, the solution will not be to create just one person who can be the data scientist, but rather to open up communication so that a team can do the job without relying on a virtuoso.
Context and Background
Many sources of data are becoming available in almost every dimension of life and business. Vendors are stepping up to the plate by providing tools to understand big data and any other kind of data. Companies like Splunk and 1010data offer Agile Big Data technology that is simple enough for normal humans to use but powerful enough to handle massive volumes of data. Revolution Analytics aims to make advanced statistics easy to use by enhancing the R suite of statistical software. Visualization technologies like QlikView, Tableau, and TIBCO Spotfire are bringing new analytical power to the edges of the organization.
These technologies are growing in power and sophistication, and are leading us toward a world of “user-driven innovation,” where the analysis of complex data is no longer a months-long project for IT, but a quick set of clicks by an inquisitive business user, who can then immediately take action. In this world of user-driven innovation, how can we bring data analysis skills into the business world, so people can analyze the data themselves?
When the knowledge of the business domain and the knowledge of how to analyze data using advanced techniques are present in one mind, a data scientist is born.
There are three ways to grow a data scientist in most business environments:
- 1. Provide the business staff with tools so they can analyze data and answer questions on their own.
- 2. Communicate the questions that need to be answered to the analytics and IT experts who can then use the advanced technology to answer them.
- 3. Improve communication so that business staff along with the analytics and IT experts can work as a team.
All three approaches are needed. Some technology is empowering and can allow strategy 1 to work. But many valuable ways of analyzing data are too hard for even super users to use, requiring strategy 2. Ideally, both strategies are in place at the same time, which usually leads to the team mentioned in strategy 3.
As the payoff from data becomes clearer, the task of growing a data scientist will become urgent. It is unlikely that enough trained data scientists will come out of universities.
Our goal is to develop a reference model to create a data scientist population, and a maturity model to evaluate an organization’s effectiveness at creating and maintaining data scientists.
What is a data scientist?
What are the tools that a data scientist needs to use?
How can we create a process for evaluating new sources of data?
How can we create a process for harvesting questions?
How would a data scientist work with emerging software to support user-driven innovation?