The data mining competition Topical Classification of Biomedical Research Papers, a special event of the Joint Rough Sets Symposium (JRS 2012), has just started. The task is to design the most accurate algorithm for multi-label classification of scientific publications in biomedicine. There are 20,000 samples available for analysis, each comprising 25,640 attributes. Cash prizes worth $1,500 will be awarded to the most successful teams. The contest is organized by a research team from the University of Warsaw, Poland, and sponsored by Southwest Jiaotong University, China. It is hosted on the TunedIT Challenges platform.
The development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. The rapidly increasing size of metadata and text repositories, such as MEDLINE or PubMed Central, underscores the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search through biomedical documents for information about diagnostics, drug dosage, or possible complications resulting from specific treatments. In their queries, they use highly specialized terminology that can be properly interpreted only with the help of a domain ontology, such as Medical Subject Headings (MeSH). To facilitate the search process, documents in a database should be indexed with concepts from the ontology. Additionally, search results could be grouped into clusters of documents that correspond to meaningful topics matching different information needs. Such clusters need not be disjoint, since one document may contain information related to several topics. In the JRS12 Competition, we address both problems: we are interested in identifying efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology that were automatically assigned by our tagging algorithm.
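To make the multi-label setting concrete, here is a minimal Python sketch in which each document is a sparse vector of MeSH concept weights and each topic is decided independently, so a document may receive several topics or none. It is an illustration only, not the organizers' baseline or data format: the concept weights, the topic names ("diabetes", "cardiology"), the centroid approach, and the 0.5 threshold are all assumptions made up for this example.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(docs, label_sets):
    """Average the vectors of all documents that carry a given topic."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc, labels in zip(docs, label_sets):
        for topic in labels:
            counts[topic] += 1
            for concept, weight in doc.items():
                sums[topic][concept] += weight
    return {
        topic: {c: s / counts[topic] for c, s in vec.items()}
        for topic, vec in sums.items()
    }


def predict(doc, centroids, threshold=0.5):
    """Decide each topic on its own; the result may hold 0, 1, or many topics."""
    return {t for t, c in centroids.items() if cosine(doc, c) >= threshold}


# Hypothetical toy data: concept -> association strength, and topic sets.
train_docs = [
    {"Insulin": 0.9, "Glucose": 0.7},
    {"Insulin": 0.8, "Hypertension": 0.6},
    {"Hypertension": 0.9, "Stroke": 0.5},
]
train_labels = [{"diabetes"}, {"diabetes", "cardiology"}, {"cardiology"}]

centroids = train_centroids(train_docs, train_labels)
predicted = predict({"Insulin": 0.7, "Hypertension": 0.7}, centroids)
print(sorted(predicted))
```

With 25,640 attributes per sample, a sparse representation like the dictionaries above is one plausible way to keep memory use manageable; a real entry would likely rely on a dedicated library and a tuned model rather than this toy centroid rule.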
This challenge should appeal to all data mining practitioners because of its close ties to well-established subjects: generalized decision rule induction, feature extraction, soft and rough computing, semantic text mining, and scalable classification methods. Apart from cash prizes for the top teams, authors of selected solutions will be invited to prepare papers for presentation at the JRS 2012 special session devoted to the competition and for inclusion in the conference proceedings.
Competition web page: http://tunedit.org/challenge/JRS12Contest