Twelve reasons why analytics fails to deliver its promises


  • The model is more accurate than the data: you try to kill a fly with a nuclear weapon.
  • You spend one month designing a perfect solution when a 95% accurate solution could be designed in one day. You focus on 1% of the business revenue; you lack vision and the big picture.
  • Poor communication. Not listening to the client, not requesting the proper data. Providing too much data to decision makers rather than 4 bullets with actionable information. Failure to leverage external data sources. Not using the right metrics. Problems with gathering requirements. Poor graphics, or graphics that are too complicated.
  • Failure to remove or aggregate conclusions that lack statistical significance. Or repeating the same statistical test many times without correcting for multiple comparisons, which erodes the stated confidence levels (see the multiple-testing sketch after this list).
  • Sloppy modeling or design of experiments, or sampling issues. One large bucket of data has good statistical significance, but all the data in that bucket comes from one old client with inaccurate statistics. Or you join sales and revenue from two databases, but the join is messy, or the sales and revenue data do not overlap because of different latencies (a join sanity check is sketched after this list).
  • Lack of maintenance. The data flow is highly dynamic and patterns change over time, but the model was tested a year ago on a data set that has significantly evolved. The model is never revisited, or parameters / blacklists are not updated at the right frequency.
  • Changes in definition (e.g. including international users in the definition of a user, or removing filtered users) resulting in metrics that lack consistency, making vertical comparisons (trending for the same client) or horizontal comparisons (comparing multiple clients at the same time) impossible.
  • Blending data from multiple sources without proper standardization: using (non-normalized) conversion rates instead of (normalized) odds of conversion (see the rates-versus-odds sketch after this list).
  • Poor cross-validation. Cross-validation should not be about randomly splitting the training set into a number of subsets, but rather comparing before (training) with after (test). Or comparing 50 training clients with 50 different test clients, rather than 5,000 training observations from 100 clients with another 5,000 test observations from the same 100 clients. Eliminate features that are statistically significant but not robust when comparing two time periods (see the client-level cross-validation sketch after this list).
  • Improper use of statistical packages. Don't feed decision tree software a raw metric such as an IP address: it just does not make sense. Instead, provide a smartly binned metric such as IP type (corporate proxy, bot, anonymous proxy, edu proxy, static IP, IP from an ISP, etc.); a binning sketch follows this list.
  • Wrong assumptions. Working with dependent independent variables without handling the problem (a collinearity check is sketched after this list). Violations of the Gaussian model, or multimodality ignored. An external factor, not your independent variables, explains the variation in the response. When doing A/B testing, ignoring important changes made to the website during the testing period.
  • Lack of good sense. Analytics is a science AND an art, and the best solutions require sophisticated craftsmanship (the stuff you will never learn at school), yet can usually be implemented pretty fast: elegant, efficient simplicity versus inefficient, complicated solutions.
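
A few illustrative sketches of the points above follow; the data, column names, and thresholds in them are assumptions for illustration, not part of the original argument.

On repeated testing: a minimal sketch, assuming 100 A/A t-tests on identical distributions, showing how uncorrected repetition produces false positives and how a simple Bonferroni correction restores the nominal confidence level.

    # Repeated A/A tests: every "significant" result is a false positive.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_tests, alpha = 100, 0.05
    p_values = np.array([
        stats.ttest_ind(rng.normal(size=200), rng.normal(size=200)).pvalue
        for _ in range(n_tests)
    ])

    print("uncorrected false positives:", np.sum(p_values < alpha))            # roughly 5 expected
    print("Bonferroni false positives: ", np.sum(p_values < alpha / n_tests))  # roughly 0 expected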
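
On messy joins and latency: a hypothetical sketch of sanity-checking a sales/revenue join before trusting it; the client_id and date columns and the toy values are assumptions.

    import pandas as pd

    sales = pd.DataFrame({
        "client_id": [1, 2, 3],
        "date": pd.to_datetime(["2009-11-01", "2009-11-02", "2009-11-03"]),
        "sales": [10, 20, 30],
    })
    revenue = pd.DataFrame({
        "client_id": [2, 3, 4],
        "date": pd.to_datetime(["2009-11-02", "2009-11-03", "2009-11-04"]),
        "revenue": [200, 300, 400],
    })

    # indicator=True flags rows present in only one source (messy join, latency gap)
    merged = sales.merge(revenue, on=["client_id", "date"], how="outer", indicator=True)
    print(merged["_merge"].value_counts())

    # a latency gap also shows up as non-overlapping reporting windows
    print("sales window:  ", sales["date"].min(), "to", sales["date"].max())
    print("revenue window:", revenue["date"].min(), "to", revenue["date"].max())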
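
On rates versus odds: a minimal sketch of converting conversion rates to odds (or log-odds) before blending sources; the two sources and their rates are made up for illustration.

    import math

    def odds(rate):
        """Odds of conversion for a conversion rate strictly between 0 and 1."""
        return rate / (1.0 - rate)

    sources = {"source_A": 0.02, "source_B": 0.25}   # illustrative conversion rates
    for name, rate in sources.items():
        print(name, "rate:", rate, "odds:", round(odds(rate), 3),
              "log-odds:", round(math.log(odds(rate)), 3))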
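
On cross-validation: a minimal sketch of holding out entire clients (rather than random rows) using scikit-learn's GroupKFold; the synthetic data, the logistic regression model, and the 100-client setup are assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GroupKFold, cross_val_score

    rng = np.random.default_rng(0)
    n_rows, n_clients = 5000, 100
    X = rng.normal(size=(n_rows, 5))
    y = rng.integers(0, 2, size=n_rows)
    clients = rng.integers(0, n_clients, size=n_rows)   # client id per observation

    # Each fold holds out whole clients, so no test client appears in training.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=GroupKFold(n_splits=5), groups=clients)
    print("per-fold accuracy:", np.round(scores, 3))

The same idea applies to time: train on an earlier period and test on a later one, instead of shuffling rows across periods.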
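
On binning raw metrics: a hypothetical sketch that maps raw IP addresses to a coarse IP-type category before handing them to a tree model; the category rules are placeholders for real bot / proxy / ISP lookups.

    import ipaddress

    def ip_type(raw):
        """Map a raw IP string to a coarse, model-friendly category."""
        try:
            ip = ipaddress.ip_address(raw)
        except ValueError:
            return "invalid"
        if ip.is_private:
            return "private/corporate"   # e.g. a corporate proxy behind NAT
        if ip.is_loopback or ip.is_reserved:
            return "reserved"
        return "public"                  # refine with bot / anonymous proxy / ISP lookups

    for raw in ["10.0.0.5", "127.0.0.1", "8.8.8.8", "not-an-ip"]:
        print(raw, "->", ip_type(raw))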
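
On dependent independent variables: a minimal sketch that flags collinear predictors with variance inflation factors; the near-duplicate predictor x2 is constructed on purpose for illustration.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=500)
    x2 = 0.95 * x1 + rng.normal(scale=0.1, size=500)   # nearly a copy of x1
    x3 = rng.normal(size=500)                          # genuinely independent
    X = sm.add_constant(np.column_stack([x1, x2, x3]))

    # A VIF well above roughly 10 signals correlated predictors to handle.
    for i, name in enumerate(["const", "x1", "x2", "x3"]):
        print(name, round(variance_inflation_factor(X, i), 1))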

Comment by Jozo Kovac on February 8, 2010 at 1:50pm
My favorite: "spent one month designing a perfect solution when a 95% accurate solution can be designed in one day. lack vision"
Comment by DataLLigence on December 29, 2009 at 9:03pm
hi vincent, i'm gonna put this up on my blog, along with a few additions! thanks.
