A Data Science Central Community
Or blending data science with the art of search engine optimization (SEO). Here we propose a statistical methodology to increase the organic traffic that a website receives from Google for specific keywords, leveraging SEO principles to make SEO a real science, not just an art.
Traditionally, SEO (when implemented by statisticians) is just about A/B, multivariate, or Taguchi testing, and other similar schemes, sometimes involving fractional factorial designs. Here's my proposal for a high-level, generic SEO engine to find out what drives page rank (that is, whether the page in question is listed in position #1, #2, etc.) on Google search result pages for a specific search keyword:
1. Gather page rank data for 1,000 high-value keywords (from 3 or 4 different keyword categories) across multiple web pages and websites.
2. For each web page and keyword combination, gather the following statistics (broken down per day, over the last 4 weeks), using a web crawler:
3. Build a predictive model (e.g. regression) based on the data/metrics gathered in step 2.
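Step 3 could be sketched as an ordinary least-squares regression. The feature names below (inbound links, keyword occurrences) are hypothetical placeholders, since the source does not list the crawler metrics, and the data is synthetic:

```python
import numpy as np

# Hypothetical per-page/keyword metrics (assumed features; the actual
# crawler metrics are not specified in the article). Synthetic data only.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 1000, n),  # e.g. number of inbound links
    rng.uniform(0, 50, n),    # e.g. keyword occurrences on the page
    np.ones(n),               # intercept term
])
# Synthetic "position on the result page": lower is better, driven by
# the features plus noise.
y = 20 - 0.01 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0, 1, n)

# Fit position ~ features by ordinary least squares.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
```

In practice one would replace the synthetic arrays with the per-day metrics collected in step 2, and likely use a rank-aware model rather than plain least squares.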
This is a good project for someone who wants to become a data scientist. The same methodology can be used to predict generic Google page rank or web domain rank. If the page has been updated, it is better to compute the metrics on Google's cached version of the page. All the metrics mentioned above can be computed automatically with a web crawler, using multiple IP addresses from multiple locations (in case Google serves different content based on location), and multiple daily downloads for each page/keyword.