A Data Science Central Community
This is the link I referred to when I had the same question.
Hope it helps!
I'm pretty novice when it comes to these issues, but my understanding is that overfitting is a pretty critical problem to understand.
Overfitting occurs when your model fits the noise in your data as well as the legitimate patterns. Basically, the model includes too much detail and accounts for random associations that are only there because of the particular sample you happened to draw. When modelling a problem, my understanding is that you're looking to capture the stable patterns that will occur in all samples, while leaving out the random noise. If you overfit, you are modelling that noise, and the model's predictive power outside the sample drops.
As I understand it, it is especially a problem with large data sets that have many variables, because the more variables you can compare, the easier it is to find spurious associations that arise purely by chance.
As such, it is crucial to check for overfitting, which you can diagnose by always splitting your data into a training set and a testing set. If the model performs extremely well on the training set but poorly on the testing set, you know you've overfitted the model.
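In case it helps to see the training/testing check in action, here is a rough sketch in plain Python. Everything in it is made up for illustration: the data (y = 2x plus random noise), the 140/60 split, and the helper names `make_data`, `fit_line`, `fit_memorizer`, `mse` and `run_demo`. It compares a simple straight-line fit with a "memorizer" that just returns the y value of the closest training point, so it reproduces the training noise exactly.

```python
import random


def make_data(n, rng):
    """Made-up data for illustration: y = 2x plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return data


def fit_line(train):
    """Ordinary least-squares straight line: models only the stable pattern."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(train):
    """1-nearest-neighbour lookup: memorizes every training point, noise included."""
    def predict(x):
        _, nearest_y = min(train, key=lambda p: abs(p[0] - x))
        return nearest_y
    return predict


def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def run_demo(seed=0):
    rng = random.Random(seed)
    data = make_data(200, rng)
    # The points are generated independently, so a plain slice is a fair split.
    train, test = data[:140], data[140:]
    results = {}
    for name, fit in (("line", fit_line), ("memorizer", fit_memorizer)):
        model = fit(train)
        results[name] = (mse(model, train), mse(model, test))
    return results


if __name__ == "__main__":
    for name, (train_err, test_err) in run_demo().items():
        print(f"{name:10s} train MSE = {train_err:.3f}   test MSE = {test_err:.3f}")
```

The memorizer gets a training error of exactly zero, because each training point is its own nearest neighbour, but its error on the testing set is well above zero. That gap between the two numbers is the overfitting signal described above. The straight line should show training and testing errors that are much closer to each other.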