A Data Science Central Community
This technique does not use the original data that produced the model; it relies only on the predicted and observed values. It was initially designed for time series, to improve daily weather forecasts or daily stock trading signals.
The enhanced model in the chart below is an example of improvement (higher ROI) obtained in the context of trading strategies:
Here's how it works:
A good time series model produces estimates whose errors (observed value minus forecast) essentially form a white noise process, with little or no autocorrelation. If strong autocorrelation or other dependence patterns are found in the time series of residual errors, then the predictive model can be enhanced.
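As a minimal sketch, assuming numeric forecasts (say, temperatures or daily returns) rather than categories, the check below computes sample autocorrelations of the residuals and flags lags that fall outside the approximate 95% white-noise band of ±1.96/√n, a common rule of thumb. The function names are hypothetical.

```python
import math

def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a given lag (lag >= 1)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # constant series: no dependence to exploit
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

def residual_autocorrelations(observed, predicted, max_lag=5):
    """Autocorrelations of the residuals (observed - predicted) for lags 1..max_lag."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return [autocorrelation(residuals, k) for k in range(1, max_lag + 1)]

def has_exploitable_structure(observed, predicted, max_lag=5):
    """Return the lags whose residual autocorrelation lies outside +/- 1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(observed))
    acf = residual_autocorrelations(observed, predicted, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]
```

For example, residuals that simply alternate between +1 and -1 are far from white noise: every lag from 1 to 5 is flagged, which signals room for an enhanced model.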
For simplicity, let's use daily weather forecasts. Assume that the forecast (for a specific location) can take any of the following values: Sunny (S), Cloudy (C), Rain (R), Other (O). Let's define a path as any sequence of consecutive daily forecasts.
The length of a path is the number of days. For instance, S->R->R->R is a path of length 4. If you check all occurrences of S->R->R->R and find that, on average, the last prediction in that path is more often wrong than right, and that C (Cloudy) would be a better predictor than R (Rain) for the most recent day, then the enhanced model simply consists of replacing the last R prediction from the base model with a C prediction in all S->R->R->R paths. In short, Enhanced(S->R->R->R) = S->R->R->C.
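This single-path check can be sketched as follows, assuming the forecasts and observed values are stored as two aligned lists of labels such as "S", "C", "R", "O" (the function names are hypothetical). It collects the observed value on the last day of every occurrence of the path, and proposes a replacement only if the original prediction is wrong more often than right and another value is observed more often.

```python
from collections import Counter

def last_day_outcomes(forecasts, observed, path):
    """Tally the observed values on the last day of every occurrence of `path`
    in the forecast sequence (a path is a run of consecutive daily forecasts)."""
    k = len(path)
    tally = Counter()
    for t in range(k - 1, len(forecasts)):
        if tuple(forecasts[t - k + 1 : t + 1]) == tuple(path):
            tally[observed[t]] += 1
    return tally

def enhanced_last_value(forecasts, observed, path):
    """Keep the original last prediction unless it is wrong more often than
    right and some other value is observed more often than it."""
    tally = last_day_outcomes(forecasts, observed, path)
    n = sum(tally.values())
    original = path[-1]
    if n == 0:
        return original  # path never occurs: nothing to learn
    right = tally[original]
    best, best_count = tally.most_common(1)[0]
    if right < n - right and best_count > right:
        return best
    return original
```

For example, with forecasts list("SRRRSRRRSRRR") and observations list("SRRCSRRCSRRR"), the path S->R->R->R occurs three times with last-day observations C, C and R, so enhanced_last_value returns "C".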
Apply the same strategy to all paths of length 4 that have enough occurrences, and you get a better predictive model.
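Applied to every path of length 4, the procedure could look like the sketch below, again assuming aligned label lists. The parameter `min_count` is a hypothetical threshold standing in for "enough occurrences". Paths are always read from the base forecasts, not from already enhanced ones, so each replacement depends only on the base model's output.

```python
from collections import Counter, defaultdict

def build_enhancements(forecasts, observed, k=4, min_count=10):
    """Learn replacement rules: forecast path of length k -> new last-day value."""
    # Tally the observed value on the last day of every occurrence of each path.
    outcomes = defaultdict(Counter)
    for t in range(k - 1, len(forecasts)):
        path = tuple(forecasts[t - k + 1 : t + 1])
        outcomes[path][observed[t]] += 1
    rules = {}
    for path, tally in outcomes.items():
        n = sum(tally.values())
        if n < min_count:
            continue  # too few occurrences to trust; skipping limits overfitting
        right = tally[path[-1]]
        best, best_count = tally.most_common(1)[0]
        # Replace only if the base prediction is wrong more often than right
        # and another value is observed more often than the base prediction.
        if right < n - right and best_count > right:
            rules[path] = best
    return rules

def enhance(forecasts, rules, k=4):
    """Apply the rules to the base forecasts; paths are read from the base forecasts."""
    enhanced = list(forecasts)
    for t in range(k - 1, len(forecasts)):
        path = tuple(forecasts[t - k + 1 : t + 1])
        if path in rules:
            enhanced[t] = rules[path]
    return enhanced
```

With forecasts list("SRRR") * 5, observations list("SRRC") * 5 and min_count=5, only S->R->R->R occurs often enough (5 times, always followed by C), so the learned rules are {("S", "R", "R", "R"): "C"} and the enhanced forecasts match the observations.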
Binning the predicted values significantly increases the number of repeating sequences and reduces the total number of unique sequences. Extreme binning, resulting in binary forecasts, substantially reduces the number of distinct paths: you will have no more than 30 distinct paths of length 4 or less (2 + 4 + 8 + 16 = 30). If you have 600 days' worth of forecasts and only 30 possible paths, path redundancy will be huge: each path has on average 20 clones.
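The path counts can be checked with a short sketch. Here binary binning is assumed to mean thresholding a numeric forecast (for instance, a predicted daily return) at zero; the threshold and the function names are illustrative assumptions.

```python
def binary_bin(values, threshold=0.0):
    """Extreme binning: map each numeric forecast to 1 (above threshold) or 0."""
    return [1 if v > threshold else 0 for v in values]

def max_paths(alphabet_size, max_len):
    """Number of possible distinct paths of length 1..max_len."""
    return sum(alphabet_size ** k for k in range(1, max_len + 1))

def distinct_paths(seq, k):
    """Set of distinct paths of length k that actually occur in seq."""
    return {tuple(seq[t - k + 1 : t + 1]) for t in range(k - 1, len(seq))}
```

Binary forecasts give max_paths(2, 4) = 30 possible paths, whereas the four weather labels S, C, R, O allow max_paths(4, 4) = 340 paths of length 4 or less, so far fewer clones per path for the same number of days.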
Overfitting, in this context, has nothing to do with the underlying data: it comes from enhancing paths whose number of occurrences is too small to be statistically significant.
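One possible way to decide whether a path has enough occurrences, not described in the original, is an exact one-sided sign test. Among the occurrences where the last day's observation matches either the original prediction or the proposed replacement, count how often the replacement wins, and compute how likely that many wins would be if both were equally good. The threshold alpha = 0.05 is an assumption; since many paths are tested at once, a stricter threshold (for example, a Bonferroni correction) may be prudent.

```python
from math import comb

def sign_test_p_value(wins, losses):
    """One-sided exact sign test: probability of at least `wins` successes
    in wins + losses fair coin flips."""
    n = wins + losses
    return sum(comb(n, i) for i in range(wins, n + 1)) / 2 ** n

def is_significant(replacement_count, original_count, alpha=0.05):
    """True if the replacement beats the original more often than chance allows."""
    return sign_test_p_value(replacement_count, original_count) < alpha
```

For example, is_significant(5, 0) is True (p = 1/32), while is_significant(2, 1) is False (p = 0.5): a path seen only three times is not enough evidence to justify a replacement.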
Related keywords: runs, time series, Markov chains, residual error, autocorrelations