
I have come across quite a number of papers/articles that list the top 10 modeling mistakes. Would love to know what are the biggest mistakes according to each one of you. Please share.




Replies to This Discussion

Guess nobody will contribute unless I start :)

Below is my list:

1. Not understanding the business problem and/or vague modeling objective.

2. Target variable definition (very tricky in churn modeling)

3. Lack of relevant data

4. Improper model validation

5. Using just one technique (assuming that some techniques are better than others)

6. Applying a single approach to outliers (treating or discarding every outlier that is detected)

7. Building a model for understanding drivers and using it for prediction, and vice versa.

8. Variable selection – (i) based on assumptions (lack of a proper, detailed EDA); (ii) based on maths alone

9. There are no best models, only useful ones. 

10. This applies to forecasting models only – using too little history to predict over longer periods (e.g. using 1 year of history to forecast sales for the next 2 years)
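Mistake 4 (improper model validation) is easy to demonstrate. Here is a minimal sketch, using invented toy data: a "model" that simply memorizes its training set scores perfectly on the data it was fit on, and only a held-out sample exposes that it has learned nothing. All names and numbers are illustrative, not from any of the papers discussed.

```python
import random

random.seed(42)

# Noisy 1-D classification task: label = 1 if x > 0.5, flipped 20% of the time
def make_point():
    x = random.random()
    y = int(x > 0.5)
    if random.random() < 0.2:
        y = 1 - y
    return x, y

points = [make_point() for _ in range(300)]
train, holdout = points[:200], points[200:]

# "Model" that memorizes the training set: perfect on train, useless beyond it
memory = {x: y for x, y in train}

def predict(x):
    return memory.get(x, 0)  # unseen inputs fall back to class 0

def accuracy(dataset):
    return sum(predict(x) == y for x, y in dataset) / len(dataset)

train_acc = accuracy(train)
holdout_acc = accuracy(holdout)
print(f"train accuracy:   {train_acc:.2f}")    # looks perfect
print(f"holdout accuracy: {holdout_acc:.2f}")  # reveals the overfit
```

Validating only on the training data would report the first number and hide the second, which is exactly the trap the list item warns about.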

I want to add a few more critical mistakes, which rank high in my opinion.

1) No action plan for the modeling results – the analysis is driven only by a business problem or some objectives, but no action is planned on the results.

2) No success criteria, or only vaguely defined ones, for the modeling results.

3) Assuming the model has either very short or very long stability for predicting future scenarios.

In my view this is due to a wrong perception of what a model is. Take the definition, study its feasibility together with the required assumptions, and define the problem first. I include a small listing of possible situations, which can be added to, deleted from, and modified in order to understand the whole logic behind models.



            Whatever the form of research, the whole process starts with grasping the problem in a manageable way, and this is best achieved by using a model. A lot of philosophical consideration took place before the current scenario evolved. "Model" is a term that defies definition. However, an idea of models and knowledge about them are important in more than one sense: while a model provides a convenient platform for research, it also provides knowledge about the possible types of studies.


A model is, in essence, a representation. A research problem may be visualised as a system involving some components with specific properties and capabilities. Explaining such a system in simple terms may be very difficult, so a collection of ideas relating to "Model" will be highly useful. This is attempted here by listing all possible types of models and their requirements.


Models can be classified in different ways.


Models by Type:

            There are three types of models

  1. Iconic models – the system is explained in terms of a physical representation, using a scaled-up or scaled-down principle.
  2. Analog models – a complicated system is explained in terms of a simplified system that has behaviour equivalent to that of the basic system.
  3. Symbolic models – the system is explained using symbols to represent the components. There are two possibilities here:
     i. Verbal models – involving a simple verbal explanation using a story or construct.
     ii. Abstract models – involving symbols and equations, as in mathematics.


Models by Purpose:

            Again, models can be classified into three based on their purpose. They are

  1. Descriptive Models – the representation is restricted only to description.
  2. Predictive Models – have the capability to predict the behaviour of a concept in terms of other cognizable concepts.
  3. Normative Models – which are generated to optimize some objective within allowed roles for the related concepts.


Models by Nature:

            Models can be classified into two depending upon the nature of the associated components.

  1. Deterministic Models – involving components whose behaviour is completely known.
  2. Stochastic Models – involving components, some of them with a behaviour that could be accounted for using rules of probability.
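The deterministic/stochastic distinction can be sketched with a hypothetical savings-balance model (the 5% growth rate and the noise level are invented for illustration): the deterministic version gives the same answer every run, while the stochastic version draws each period's rate at random and must be reasoned about with probability.

```python
import random

random.seed(7)

# Deterministic model: the balance grows at a fixed, fully known 5% per period
def deterministic_balance(start, periods, rate=0.05):
    for _ in range(periods):
        start *= 1 + rate
    return start

# Stochastic model: each period's rate is a random draw around 5%
def stochastic_balance(start, periods, rate=0.05, noise=0.02):
    for _ in range(periods):
        start *= 1 + rate + random.uniform(-noise, noise)
    return start

print(deterministic_balance(100, 10))  # identical on every run
print(stochastic_balance(100, 10))     # varies from run to run
```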


Models by Time Factor:

            Models are also generally classified into two according to whether or not the time factor is used as a component.

  1. Static Models – consider situations without variation in the time factor.
  2. Dynamic Models – are those with 'time' being considered either explicitly or implicitly in their construction.


Models by Method of Solution:

            Models can also be classified according to the nature of their solution. There are models whose solution can be obtained using analytical methods; there are also models which defy such solutions.

  1. Analytical Models – which admit analytical solutions.
  2. Simulation Models – whose solutions can be obtained using simulation techniques.
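The analytical/simulation distinction can be illustrated with a quantity that happens to admit both treatments: the expected maximum of two independent Uniform(0,1) draws, which is exactly 2/3 analytically, and which a Monte Carlo simulation can also estimate. (The example is mine, not from the text; simulation earns its keep on models where no such closed form exists.)

```python
import random

random.seed(1)

# Analytical solution: E[max(U1, U2)] for independent Uniform(0,1) is exactly 2/3
analytical = 2 / 3

# Simulation solution: estimate the same quantity by repeated sampling
n = 100_000
simulated = sum(max(random.random(), random.random()) for _ in range(n)) / n

print(f"analytical: {analytical:.4f}")
print(f"simulated:  {simulated:.4f}")
```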


            These classifications are to be taken not as separate categories but as possible ways in which a model may have to be comprehended. For instance, there may be an abstract model which is dynamic, generated for predictive purposes, containing stochastic components, and admitting analytical solutions. The point to be noted is that the generic meaning of "Model" may not be enough to contain all the information associated with the problem of study.

The TOP 10 MODELING / DATA MINING MISTAKES were first presented by Dr. John Elder IV many years ago, and then printed as Chapter 22, 23, or 24 in our book: HANDBOOK OF STATISTICAL ANALYSIS & DATA MINING APPLICATIONS, Nisbet, Elder, and Miner, 2009, Elsevier. It is available from either Elsevier or Amazon (Amazon was out of stock last week, but back in stock as of yesterday at a new low price). The printed copy of the book also includes a DVD with a free 90-day trial of data mining software, in case you are interested. SAS has reprinted this 10 MISTAKES chapter and gives it away at their exhibits at data mining / predictive analytics conferences; they will most likely have it available at the upcoming March 14-15, 2011 Predictive Analytics World meeting in San Francisco.


John Elder presents these 10 mistakes in his DATA MINING / TEXT MINING WORKSHOPS, which are held several times a year, so I suspect this is where the "citations" you are seeing have come from.


The list you give in your "first reply" to start the discussion is essentially John's list, with a few variations in wording in some of the 10 items.


Hope this helps .....

Top modeling mistakes:

  1. Lack of communication with the model customer
  2. Lack of understanding the problem set and problem objective
  3. Lack of peer review
  4. Lack of accountability

I have attached an article (DM News in Canada) that I wrote, which looks at the top ten tips for creating successful predictive analytics solutions. This is a 3-part series, with the first article touching on the first three tips.

My addition to an already impressive list


Failure to define the baseline, or AS-IS situation, against which the model recommendations are going to be compared. This means defining the baseline from both a STATS and a business point of view.
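To make this concrete, here is a hypothetical sketch of the STATS side of a baseline comparison for a mail campaign; every number is invented for illustration. The AS-IS situation (a blanket mailing) supplies the response rate against which the model-targeted mailing is judged.

```python
# Hypothetical campaign numbers (illustrative only): responses with and
# without the model, versus the AS-IS baseline of mailing everyone.
baseline_mailed, baseline_responses = 10_000, 200  # AS-IS: blanket mailing
model_mailed, model_responses = 3_000, 150         # model-targeted mailing

baseline_rate = baseline_responses / baseline_mailed  # the yardstick
model_rate = model_responses / model_mailed
lift = model_rate / baseline_rate

print(f"baseline response rate: {baseline_rate:.1%}")
print(f"model response rate:    {model_rate:.1%}")
print(f"lift over AS-IS:        {lift:.1f}x")
```

Without the first two lines – the explicitly recorded AS-IS numbers – the lift figure cannot be computed, which is exactly the failure described above.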



