A Data Science Central Community
Do the maverick mavens need managers?
The McKinsey & Company report “Big data: The next frontier for innovation, competition, and productivity” (May 2011) has been widely publicized and circulated on the internet.
The report projected that, based on the trends seen in 2011, demand for deep analytical positions in a big data world in the United States could exceed supply by 140,000 to 190,000 positions. While this figure has been widely quoted in various reports, what intrigued me was the additional information tucked in there, which has often been ignored. The report further projected a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of big data analysis effectively. Together these add up to a number that will make anyone sit up and take notice.
This brings up the moot question: does an ensemble of Data Scientists need managers, and in the proportion being mentioned? Who could those managers be, what skills and expertise would they need, and where would one find them?
Let us take a minute to discuss how a Data Science project is different from a typical application development project.
Data Science projects are different!
A Data Science project is more akin to a typical R&D project, which is largely a process of discovery. The discovery, as an output or outcome, is expected to be more abstract and less definite.
For example, I was engaged in a Data Science project in the area of weather forecasting. This is an area where predictive analytics is deployed, and deploying the appropriate algorithms is the key to yielding results with the right accuracy. The skill lies in identifying, selecting and deploying the right approach to deliver the highest possible accuracy of prediction. The right approach is a combination of choosing the appropriate method of enriching the input data, running a set of the most applicable algorithms against one another, tuning the parameters of those algorithms, ensuring that no overfitting occurs and using the right metric to test the performance of the algorithms.
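To make the idea of competing algorithms concrete, here is a minimal sketch in Python. It is not the approach used in the weather project described above: the synthetic temperature series, the two simple forecasters (persistence and moving average), the split points and all function names are assumptions chosen purely for illustration. The candidates are scored on a validation segment with mean absolute error, the moving-average window stands in for a tuned parameter, and the winner is scored again on an untouched test segment as a basic check against overfitting.

```python
import math
import random
import statistics


def make_series(n=200, seed=0):
    """Synthetic daily temperatures: a seasonal cycle plus noise (made-up data)."""
    rng = random.Random(seed)
    return [15 + 10 * math.sin(2 * math.pi * t / 50) + rng.gauss(0, 1.5)
            for t in range(n)]


def persistence(history):
    """Forecast the next value as the most recent observed value."""
    return history[-1]


def moving_average(window):
    """Return a forecaster that averages the last `window` observed values."""
    def forecast(history):
        return statistics.fmean(history[-window:])
    return forecast


def mean_absolute_error(series, forecaster, start, end):
    """One-step-ahead MAE over series[start:end], using only past values."""
    errors = [abs(series[t] - forecaster(series[:t])) for t in range(start, end)]
    return statistics.fmean(errors)


def compete(series, candidates, valid_start, valid_end):
    """Pick the candidate with the lowest validation MAE, then score the
    winner on the remaining, untouched test segment."""
    scores = {name: mean_absolute_error(series, f, valid_start, valid_end)
              for name, f in candidates.items()}
    best = min(scores, key=scores.get)
    test_mae = mean_absolute_error(series, candidates[best], valid_end, len(series))
    return best, scores, test_mae


series = make_series()
candidates = {"persistence": persistence}
# The window size plays the role of a tunable parameter (hypothetical grid).
candidates.update({f"moving_average_{w}": moving_average(w) for w in (2, 3, 5, 10)})

best, scores, test_mae = compete(series, candidates, valid_start=100, valid_end=150)
print(f"best={best} validation_mae={scores[best]:.3f} test_mae={test_mae:.3f}")
```

If the test error of the winner is much worse than its validation error, that is a hint that the selection has overfitted the validation segment. A real project would use richer models and a proper time-series cross-validation scheme, but the shape of the competition is the same.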
The structure of a Data Science team is necessarily flat, and there is a reason for this. The roles are distinct and the level of peer collaboration is intense. Designated Data Scientists take up defined roles as per the “triangle of intelligence”, namely aggregated content (the raw data), algorithm (the thinking) and reference structures (the domain knowledge base, ontology).
An application development project, more often than not, has a defined output based on the requirements gathered and the business objectives to be achieved. Execution of such projects requires a hierarchical team structure, with different types of work packets being defined and allocated across functions (requirement gathering, design, development, testing, release management, change management, through to warranty support). The output here is often defined, stable and possibly repetitive. So while a manager is needed in a hierarchical setup, as in the latter case, does one need a manager in a flat structure, as in the former?
Going back to the moot question: the industry would require “1.5 million additional managers and analysts…”. Who would constitute these managers?
Let us analyze the work elements that constitute a Data Science project and the team members for a typical project. They would be:
• Data Access (immediate and unfettered access to data) – Project sponsor, mostly from the business side
• Data Investigation (defining the business problem and objectives) – Business Analyst
• Data Preparation (parsing, cleaning, de/re-normalizing, linking, indexing and interpreting of data) – Data Quality Engineer
• Data Interpretation (appreciation of the business process and the need to work with a subject matter expert for contextual understanding) – Subject Matter Expert
• Data Engineering (connecting the dots between data sources, construction of quality algorithms, coding) – Programmer
• Data Analysis (ability to do statistical sense checking and confirm the statistical validity of results) – Data Scientist
• Data Presentation (communicating the nuanced message as discovered and helping bridge the gap between results derived and actions taken) – Sr Data Scientist / Manager
So a typical project would have a core team of Data Scientists with a surrounding team as described above. This could well account for McKinsey’s estimate.
True-blue Data Scientists are outliers and they stick to their ilk
Top-notch Data Scientists are hard to come by. These mavens are, most of the time, nonconformists and individualists, and they are uncommonly precise. Can we say they are mavericks? However, having observed them closely, I can safely conclude that they like to be around others of their ilk, and that this is more out of necessity.
The subject is vast, deep and, to many, arcane. It cuts across domains, statistics and computational science. The inherent complexity brings with it the need to use the collective mind of the “commune” of Data Scientists. Such a commune could best address business problems by devising a repertoire of solution approaches through intense and necessary collaboration.
In such a situation, where does a manager fit in? How will the manager provide value? As we have seen, these projects are different from normal application development projects and are best run in a flat, nonhierarchical structure.
This field is evolving, and the ecosystem is becoming more enabling as Data Scientists are increasingly called upon to solve complex business problems across industries.
If a manager is to play a valuable role, some areas that need to be considered are:
1. Given the “Triangle of Intelligence”, the manager should truly understand the domain and the needs of the business, be the glue between the business users and the Data Scientists, and give the team the wherewithal to access the input data, the quality of which is critical to the success of the project.
2. Have the ability to toggle between business jargon and the language of Data Scientists. Understand the complexity of implementing a solution and optimizing its performance, and hence be able to estimate the time required to implement the solution.
3. Have the experience to disaggregate the complex activities in Data Science projects and help integrate the multiple models (e.g., a combination of algorithms).
4. Be capable of providing the right inputs for the innovation that is frequently required in this field.
Then the question is: would Data Science managers grow from within the “commune” of Data Scientists, or could they migrate from the business domain? We will get the answer as we see more mainstream activity in this area in the coming months.
Managing is about coping with complexity, but leadership is about coping with change. Given that Data Scientists are already groomed to manage complexity (and hence could manage themselves), do we then need Data Science leaders? Maybe the need of the hour is to lead rather than manage.
Somjit Amrit, Chief Business Officer, Technosoft Corporation