A Data Science Central Community
What is the best way to rank statisticians? When you do find the best of a batch, are they, on average, out of the ordinary?
Kaggle, which conducts pattern-finding competitions among data scientists, has started ranking its top performers. A somewhat less than scientific analysis — talking with them — reveals that their statistically minded outlook does tend to set them apart from the rest of us.
Kaggle’s premise as a company is that watching others succeed at statistical challenges, such as predicting traffic patterns or correctly guessing the social distribution of a Chinese Web site, inspires people to perform better. So making a list of overall champions might seem predictable. The odd part was figuring out a fair distribution.
“We tried a few different algorithms to figure out who they were,” said Jeremy Howard, Kaggle’s president and chief scientist. Kaggle is based on the idea of competitions, he said. “We’ve tried comparing it with a number of sports. The logarithmic functions of Grand Prix. Baseball and basketball have teams, but this is mostly individuals trying for the same outcome. We tried using the same ranking Xbox uses to decide if you will play a really good player. That is quite a complex system.”
The closest parallel Kaggle found is the ranking system used in golf, where there is some team play (some statisticians work together), different kinds of competitions, and a payoff for doing well on a consistent basis. In general, Mr. Howard said, it makes little difference for a top performer if the problem is public health or essays in Arabic. “The argument that great data science is just about letting the data talk holds true.”
The plan is to rank everyone participating in Kaggle contests, based on a rolling average of performance over the preceding 12 months. The top 10 contestants show superficially less diversity than the data sources they work on. They are all male, involved in the sciences, and almost 30 percent Russian by birth.
There is some range among occupations, and a uniformly high level of brainpower. The current leader, Alexander D’Yakonov, is a professor of computational mathematics and cybernetics at Moscow State University. Sergey Yurgenson, currently ranked in second place, is a physics Ph.D. who designs photon microscopes at the neurobiology department at the Harvard Medical School. Vivek Sharma, a software consultant in financial services based in New Delhi, India, has a master’s degree in computer science.
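For readers curious what a rolling 12-month average might look like in practice, here is a minimal sketch in Python. It is not Kaggle's actual formula, which the article does not describe. The function name `rolling_ranking`, the contestant labels, the 365-day window, and the assumption that each contest result is a score between 0 and 1 (higher is better) are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import date


def rolling_ranking(results, as_of, window_days=365):
    """Rank contestants by their mean score over the window ending at as_of.

    results: iterable of (name, finish_date, score) tuples.
    Scores are assumed to be normalized so that higher is better.
    """
    scores_by_name = defaultdict(list)
    for name, finished, score in results:
        age_in_days = (as_of - finished).days
        # Keep only results that fall inside the rolling window.
        if 0 <= age_in_days < window_days:
            scores_by_name[name].append(score)

    averages = {
        name: sum(scores) / len(scores)
        for name, scores in scores_by_name.items()
    }
    # Highest average first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical contest results, not real Kaggle data.
results = [
    ("contestant_a", date(2012, 1, 15), 0.90),
    ("contestant_a", date(2011, 9, 1), 0.70),
    ("contestant_b", date(2012, 3, 1), 0.85),
    ("contestant_b", date(2010, 12, 1), 1.00),  # older than 12 months, ignored
    ("contestant_c", date(2011, 6, 10), 0.60),
]

ranking = rolling_ranking(results, as_of=date(2012, 4, 8))
for position, (name, average) in enumerate(ranking, start=1):
    print(position, name, round(average, 3))
```

In this sketch a strong result from more than a year ago stops counting, so a contestant has to keep performing to hold a position. A plain mean is the simplest possible choice; a real system could weight contests differently.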
Read the full story and similar articles at http://bits.blogs.nytimes.com/2012/04/08/data-scientists-get-ranked/