According to the World Health Organization, HIV has caused 25 million deaths worldwide since it was first recognized in 1981. In recent years, the infection has been managed with a combination of therapies. However, the virus is likely to evolve resistance to these drugs, so it is crucial that we gain a better understanding of the virus itself.
An important step in understanding the virus is getting a handle on its genetic blueprint. William Dampier of Drexel University is hosting a competition that aims to do this by having contestants find markers in the HIV sequence that predict a change in the severity of infection (as measured by viral load and CD4 count).
Models can be trained on the records of 1,000 patients. To predict whether a patient's viral load will improve, competitors are given the nucleotide sequences of the patient's Reverse Transcriptase (RT) and Protease (PR), along with the patient's viral load and CD4 count at the beginning of therapy. The site includes a brief discussion of the science behind these variables, but no knowledge of biology is needed to succeed in this competition. Predictions will be tested on a dataset of 692 patients.
There is US$500 up for grabs, and the winner(s) will also have the opportunity to co-author a paper with the competition host.
The competition can be found at http://kaggle.com/hivprogression