Consider a data set of people, each tagged with one of two labels, let's say blue and red, and let's assume that we have a bunch of variables describing each person.

Moreover, let's assume that one of the variables is the social security number (SSN), or any other ID that is unique to each person.

Let me make a few observations:

- If I use the SSN to separate the people in the red set from the people in the blue set, I can achieve 100% accuracy, because every SSN value belongs to exactly one person and the classifier never finds any overlap between different people.
- If I use the SSN as a predictor on a new data set the classifier has never seen before, the results will be catastrophic!
- The entropy of such a variable is extremely high, because it is almost uniformly distributed: practically every value appears exactly once!

The key point is: the SSN variable could have a great **I** value, but it is dramatically useless for the classification job.
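To make the SSN argument concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the IDs are random unique integers standing in for SSNs, the red/blue labels are assigned at random (so they are independent of the ID by construction), the "classifier" is just a lookup table that memorises ID to label, and I take the information value to be the empirical mutual information between the ID and the label.

```python
import math
import random
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

random.seed(0)
n = 1000

# Hypothetical SSN-like IDs: unique integers, one per person (assumption).
ids = random.sample(range(10**9), 2 * n)
# Labels drawn at random, independent of the ID (assumption).
labels = [random.choice(["red", "blue"]) for _ in range(2 * n)]

train_ids, test_ids = ids[:n], ids[n:]
train_labels, test_labels = labels[:n], labels[n:]

# A "classifier" that simply memorises ID -> label on the training set.
lookup = dict(zip(train_ids, train_labels))
# For an ID it has never seen, it can only fall back to the majority label.
majority = Counter(train_labels).most_common(1)[0][0]

def predict(person_id):
    return lookup.get(person_id, majority)

train_acc = sum(predict(i) == y for i, y in zip(train_ids, train_labels)) / n
test_acc = sum(predict(i) == y for i, y in zip(test_ids, test_labels)) / n

print("entropy of the ID (bits):    ", entropy(train_ids))
print("max possible entropy (bits): ", math.log2(n))
print("I(ID; label) on train (bits):", mutual_information(train_ids, train_labels))
print("entropy of the label (bits): ", entropy(train_labels))
print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)
```

On the training set the ID reaches the maximum possible entropy, its mutual information with the label equals the whole entropy of the label, and the lookup table is right every time. On the unseen people none of the IDs match, so the prediction falls back to the majority label and the accuracy drops to roughly chance level.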

Have you had enough of the theory? I know... I did my best to simplify it (maybe too much...).

I ran some tests on the same data set used in this paper from the University of California, Berkeley:

