Over-fitting. If you perform a regression with 200 predictors that exhibit strong cross-correlations, use meta-regression coefficients: that is, coefficients of the form f[Corr(Var, Response), a, b, c], where a, b, c are three meta-parameters (e.g. priors in a Bayesian framework). This reduces the number of free parameters from 200 to 3 and eliminates most of the over-fitting.
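A minimal sketch of the idea, with hypothetical data sizes and a hypothetical quadratic form f(corr) = a + b*corr + c*corr^2 for the meta-regression (the original does not specify f): because each of the 200 coefficients is a fixed function of its predictor's correlation with the response, the model is linear in just the three meta-parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 strongly cross-correlated predictors, 500 observations.
n, p = 500, 200
base = rng.normal(size=(n, 5))
X = base @ rng.normal(size=(5, p)) + 0.5 * rng.normal(size=(n, p))
y = X[:, :10].sum(axis=1) + rng.normal(size=n)

# Correlation of each predictor with the response.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])

# Meta-regression: coefficient_j = a + b*corr_j + c*corr_j**2, so the
# prediction X @ coeffs collapses to a regression on 3 derived features.
Z = np.column_stack([X.sum(axis=1), X @ corr, X @ corr**2])
abc, *_ = np.linalg.lstsq(Z, y, rcond=None)

# 3 fitted meta-parameters imply all 200 coefficients.
coeffs = abc[0] + abc[1] * corr + abc[2] * corr**2
print(len(abc), len(coeffs))  # 3 200
```

Only a, b, c are estimated from the data, so the effective model complexity is 3 parameters, not 200.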
Perform the right type of cross-validation. If your training set has 400,000 observations spread across 50 clients, but your test set (used for cross-validation) has 200,000 observations covering only 3 clients or only 5 days of historical data, your cross-validation methodology is deeply flawed: the test set does not represent the population you trained on. Better: split your cross-validation data into 5 subsets to compute confidence intervals, and use smart sampling so that each subset spans many clients and time periods.
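One way to sketch this, assuming the clustering unit is the client: split at the client level rather than the observation level, so no client's data leaks across folds. The `group_kfold` helper below is illustrative, not a specific library API.

```python
import numpy as np

def group_kfold(groups, k=5, seed=0):
    """Split observation indices into k folds such that no group
    (e.g. client) appears in more than one fold."""
    rng = np.random.default_rng(seed)
    uniq = np.array(sorted(set(groups)))
    rng.shuffle(uniq)
    group_folds = [uniq[i::k] for i in range(k)]      # assign clients to folds
    groups = np.asarray(groups)
    return [np.flatnonzero(np.isin(groups, g)) for g in group_folds]

# 50 hypothetical clients with an uneven number of observations each.
rng = np.random.default_rng(1)
clients = rng.integers(0, 50, size=4000)
folds = group_kfold(clients, k=5)

# Every observation lands in exactly one fold, grouped by client.
print(sum(len(f) for f in folds))  # 4000
```

Evaluating the model on each of the 5 held-out folds in turn yields 5 error estimates, from which you can compute a confidence interval instead of a single point estimate.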
Messy data. Make sure you have eliminated outliers and cleaned your data set. Use alternate (external) data sets to cross-check and reconcile your data.
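As one common outlier-screening technique (my choice of method, not prescribed by the text), the interquartile-range rule flags values far outside the bulk of the distribution:

```python
import numpy as np

def iqr_mask(x, factor=1.5):
    """True for values inside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - factor * iqr) & (x <= q3 + factor * iqr)

data = np.array([9.8, 10.1, 9.9, 10.3, 10.0, 250.0, 9.7, -180.0])
clean = data[iqr_mask(data)]
print(clean)  # [ 9.8 10.1  9.9 10.3 10.  9.7] -- the two gross outliers dropped
```

Flagged points should be inspected (and reconciled against external sources where possible) rather than silently deleted; some "outliers" are the most valuable observations.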
Data maintenance. When did you last update this lookup table? Five years ago? Time to do maintenance checks!
Use robust, data-driven procedures. Steer clear of normal distributions and simplistic models such as naive Bayes.
Poor design of experiment. Usually a sampling issue: the data collected does not represent the population you want to draw conclusions about.
Confusing causes and consequences, or ignoring hidden variables that actually explain unexpected correlations. For example, my age is correlated with oil prices, but it does not cause oil price increases; the real driver is inflation, which is correlated with both age and oil prices.
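The hidden-variable point can be demonstrated with a partial correlation (my sketch, using simulated data): two series driven by a common "inflation" trend are strongly correlated, but the correlation collapses once the trend is regressed out of both.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical hidden driver: inflation pushes up both series.
inflation = np.cumsum(rng.normal(0.1, 1.0, n))
age_index = inflation + rng.normal(0, 2.0, n)   # stand-in for the "age" trend
oil_price = inflation + rng.normal(0, 2.0, n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def residual(a, z):
    """Remove the component of a explained by z (simple regression)."""
    slope = np.cov(a, z)[0, 1] / np.var(z)
    return a - slope * z

raw = corr(age_index, oil_price)
partial = corr(residual(age_index, inflation), residual(oil_price, inflation))

# Strong raw correlation, near-zero once inflation is controlled for.
print(round(raw, 2), round(partial, 2))
```

The raw correlation is spurious: neither series causes the other, and conditioning on the confounder exposes that.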