
The 2010 Competition is being called the "Grand Champion" (see ) because it looks at many different frequencies of data instead of just one. It features hourly, daily, monthly, quarterly, and annual problems. Results will come out in a few months.

The hourly data was passenger traffic on the Paris Metro. There are 3 confounding issues with the data. There were only 20 hours of data per day rather than a complete 24, because the station is shut for four hours at night. More difficult is the fact that the weekend has a different pattern of ridership during the day than the weekdays do.

So you need to model not only the days of the week (6 variables) and the hours of the day (19 variables), but also the interaction between the two (6*19 = 114 variables), as the hourly pattern differs depending on the day. Note that not all interactions would be significant, as some may be at the average.
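To make the variable counts concrete, here is a minimal sketch of how such dummy variables could be built for one observation. It assumes plain 0/1 dummy coding with the first day and the first operating hour as reference levels; the function name `design_row` and the index conventions are made up for illustration and are not from the competition or any particular package.

```python
from itertools import product

DAYS_PER_WEEK = 7
HOURS_PER_DAY = 20  # the station is shut for four hours each night


def design_row(day, hour):
    """Build the dummy regressors for one hourly observation.

    day:  0..6  (assumed index; day 0 is the reference level)
    hour: 0..19 (assumed index of the operating hour; hour 0 is the reference)
    """
    # 6 day-of-week dummies (reference day dropped)
    day_dummies = [1 if day == d else 0 for d in range(1, DAYS_PER_WEEK)]
    # 19 hour-of-day dummies (reference hour dropped)
    hour_dummies = [1 if hour == h else 0 for h in range(1, HOURS_PER_DAY)]
    # 6 * 19 = 114 day-by-hour interaction dummies
    interactions = [d * h for d, h in product(day_dummies, hour_dummies)]
    return day_dummies + hour_dummies + interactions


row = design_row(day=5, hour=8)
print(len(row))  # 6 + 19 + 114 = 139
```

In practice many of the 114 interaction columns would be dropped after testing, since only the day and hour combinations that depart from the average pattern need their own term.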

You can't forget about adjusting for outliers during all of this so you have an accurate estimate of the model!


Replies to This Discussion

The competition started in January, and the first stage, paper submission, has passed by now.

