
Does anyone know what overfitting means?
I read about CPAR (Classification based on Predictive Association Rules) compared to the C4.5 algorithm, and it says: "the C4.5 has two major deficiencies, (1) it generates a very large number of association rules, which leads to high processing overhead, and (2) its confidence-based rule evaluation measure may lead to overfitting"


Replies to This Discussion


This is the link I referred to when I had the same question.

Hope it helps!





I'm pretty much a novice when it comes to these issues, but my understanding is that overfitting is a critical problem to understand.


Overfitting occurs when your model fits the noise in your data as well as the legitimate patterns. Basically, the model includes too much detail and accounts for random associations that are only there because of quirks in the particular sample. When modelling a problem, my understanding is that you want to capture the stable patterns that will occur in every sample while leaving out the random noise. If you overfit, you are modelling that noise, and the model's predictive power drops on data outside the sample.


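It is especially a problem with data sets that have many variables, because the more candidate associations you can test, the more likely it is that some will look strong purely by chance. Having more records generally helps rather than hurts, since chance associations tend to wash out as the sample grows.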
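To make the point about spurious associations concrete, here is a minimal sketch in Python. It assumes "large" means many candidate variables, and everything in it (the function names, the 30 rows, the 500 features, the seed) is made up for illustration. The target and all the features are pure random noise, so any correlation it finds is spurious by construction.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_chance_correlation_so_far(n_rows, n_features, rng):
    """Track the strongest |correlation| found while searching unrelated features.

    The target and every feature are independent random noise, so there is
    no real association to find. Entry i of the returned list is the best
    |r| seen after searching i + 1 features.
    """
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    history = []
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
        history.append(best)
    return history

# Hypothetical sizes chosen only for illustration.
history = best_chance_correlation_so_far(n_rows=30, n_features=500, rng=random.Random(42))
for searched in (5, 50, 500):
    print(f"best |r| after searching {searched} features: {history[searched - 1]:.2f}")
```

The best correlation can only go up as more features are searched, which is why a model that is free to pick from many variables can latch onto associations that are not really there.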


As such, it is crucial to check for overfitting, which you can diagnose by always holding out a testing set that is separate from the training set. If the model performs extremely well on the training set but poorly on the testing set, you know you've overfitted.
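Here is a rough sketch of that training/testing check in Python, using only the standard library. The data, the 20% label noise, and both "models" are assumptions invented for the example: a nearest-neighbour model that memorises every training point (noise included), and a single fitted threshold that can only capture the broad pattern.

```python
import random

def make_data(n, rng, noise=0.2):
    """The true pattern is label = 1 when x > 0.5; each label is then
    flipped with probability `noise` to simulate random noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def accuracy(predict, data):
    """Fraction of points whose label the model predicts correctly."""
    return sum(predict(x) == label for x, label in data) / len(data)

def fit_memoriser(train):
    """1-nearest-neighbour: reproduces the training labels, noise and all."""
    def predict(x):
        return min(train, key=lambda point: abs(point[0] - x))[1]
    return predict

def fit_threshold(train):
    """Pick the single cut-off t that maximises training accuracy of 'x > t'."""
    candidates = [x for x, _ in train]
    best_t = max(candidates, key=lambda t: accuracy(lambda x: int(x > t), train))
    return lambda x: int(x > best_t)

rng = random.Random(0)
train = make_data(200, rng)   # data the models are fitted on
test = make_data(2000, rng)   # fresh data the models never saw

results = {}
for name, fit in (("memoriser", fit_memoriser), ("threshold", fit_threshold)):
    model = fit(train)
    results[name] = (accuracy(model, train), accuracy(model, test))

for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```

A large gap between training and testing accuracy is the warning sign: the memorising model scores perfectly on the data it has seen but does noticeably worse on fresh data, while the simple threshold scores about the same on both.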


© 2021 TechTarget, Inc.