
Are competitions the future of research?

Originally posted at: http://kaggle.com/blog/2010/05/13/are-competitions-the-future-of-re...

For the past two and a half weeks, I have been hosting a bioinformatics competition related to my research. The competition requires contestants to find markers in the HIV sequence that predict a change in the severity of infection (as measured by viral load). This is a step toward better understanding HIV.

The Predict HIV Progression competition has already attracted 85 submissions from 23 teams. After a quick look at the teams, it seems that we have a fairly even split between bioinformatics, machine learning and HIV researchers. Most pleasing is the degree of collaboration between competitors. So far, there have been 24 contributions to the competition forum, ranging from discussions of complex techniques to a competitor who has posted a software package to help newcomers get started.

Even at this early stage, the results have been impressive. The leading submission has already achieved 70.8 per cent accuracy. This is slightly better than the best methods in the current literature, which score 70 per cent on this dataset. (Note that the public leaderboard shows the best entry scoring 66.3 per cent. This score is calculated on just 30 per cent of the test dataset, which prevents competitors from tuning – or overfitting – their models to fit the answers.)
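
For readers unfamiliar with the mechanics, here is a minimal sketch in Python of how a public/private leaderboard split like the one described above might work. The 30 per cent figure matches the competition; the data and scores below are entirely synthetic and for illustration only:

import numpy as np

# Synthetic stand-ins for the hidden test labels and a team's submission.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)   # hidden test labels
y_pred = rng.integers(0, 2, size=1000)   # a team's predictions

# Roughly 30 per cent of the test rows drive the public leaderboard;
# the remaining 70 per cent are held back for the final (private) score.
public_mask = rng.random(1000) < 0.30

public_score = (y_pred[public_mask] == y_true[public_mask]).mean()
private_score = (y_pred[~public_mask] == y_true[~public_mask]).mean()
print(f"public: {public_score:.3f}, private: {private_score:.3f}")

Because competitors only ever see the public score, tuning a model to climb the public leaderboard gives no guarantee of a good private score.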

A few colleagues in my research department, along with some Slashdot readers, have asked whether this is the future of research. I think the answer is yes in certain circumstances: where there is a clear and quantifiable objective, a competition like this one can propel research forward.


Comment by Anthony Goldbloom on June 11, 2010 at 6:01pm
Thanks for your insights. The question may have been cast too widely – rather, are data prediction competitions the future of bioinformatics research?
Comment by Chris Augeri on June 11, 2010 at 12:48pm
There's inherent motivation in a "contest" of any sort, assuming the prize, be it real ($) or virtual (fame), is of interest to the participants. A contest also provides a natural evaluation measure for consumers (which browser renders faster) and for sponsors (who gets the next round of funding). This approach has been used for many years in various research domains – some notable examples are listed below. It is also used in various government agencies, such as the DARPA / IARPA "bake-off" initiatives, or to a lesser extent in SBIR Phase IIs, to decide which performers to fund in subsequent years.

The tricky part is creating measures that are "clear and quantifiable" as well as meaningful. This is particularly challenging in subjective domains, such as "search result quality", "text summary quality", or "meaningful actionable intelligence". One approach is to use human ensembles, which is really what a crowd-sourced recommender such as Digg is all about; another is to develop a reasonable metric where none might exist, such as ROUGE for text summarization (see the sketch below). Bottom line: if response quality/quantity can be measured in a useful form, a useful [research question] contest can be created.
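
To make the metric point concrete, here is a minimal sketch of ROUGE-1 recall, the simplest member of the ROUGE family mentioned above. The tokenization (lowercased whitespace split) is a simplifying assumption; real ROUGE implementations add stemming and several other variants:

from collections import Counter

def rouge_1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams found in the candidate."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each reference word is matched at most as often
    # as it appears in the candidate.
    overlap = sum(min(n, cand_counts[w]) for w, n in ref_counts.items())
    return overlap / sum(ref_counts.values()) if ref_counts else 0.0

print(rouge_1_recall("the cat sat on the mat",
                     "the cat lay on the mat"))   # 5 of 6 reference words -> 0.833

A crude metric like this turns a subjective question ("is this summary good?") into something a leaderboard can rank, which is exactly what a contest needs.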

One thought, from a pure R&D perspective: ensuring research papers result from the contests... yes, prizes are nice, but what did we learn? Yehuda Koren's post-Netflix Prize publications are a good example.

Some contest examples
* IEEE: VAST Challenges
* Netflix Prize
* DARPA Grand Challenge
* SIGKDD: KDD Cup
* NIST: TREC, DUC, MUC, ACE
* XML: compliance @ NIST
* Data compression: speed and size, or a combined compression-efficiency measure
* and many more...
Comment by Anthony Goldbloom on May 14, 2010 at 2:24am
Tomas, great question. I suggest you ask it on the Kaggle post so William, the competition host, can answer. It could kick off a good discussion.
Comment by Tomas Keller (formerly Ohlson) on May 14, 2010 at 2:11am
Good to hear that you are receiving so many submissions. Are you planning to build a meta predictor / ensemble method from the best predictors (sketched below)? The meta predictors were usually the best methods when I was working on protein structure prediction.


Tomas
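
To illustrate Tomas's suggestion, here is a minimal sketch of the simplest kind of meta predictor: averaging base models' predicted probabilities. The numbers are made up, and a full stacking approach would instead fit a meta-model on held-out base-model outputs rather than using a fixed average:

import numpy as np

# Hypothetical predicted probabilities from three base models,
# one row per model, one column per test case.
base_probs = np.array([
    [0.9, 0.2, 0.7],   # model A
    [0.8, 0.4, 0.6],   # model B
    [0.7, 0.3, 0.9],   # model C
])

# Simplest ensemble: average the base models' probabilities,
# then threshold at 0.5 for a binary call.
ensemble_prob = base_probs.mean(axis=0)
ensemble_pred = (ensemble_prob >= 0.5).astype(int)
print(ensemble_prob, ensemble_pred)   # [0.8 0.3 0.733...] [1 0 1]

Even this naive average often beats each base model on its own, because the models' errors are partly uncorrelated.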
