
Outlier analysis: Chebyshev criterion vs. an approach based on Mutual Information

As often happens, I was doing many things at the same time: during a break from working on a new post about applications of mutual information in data mining, I read the interesting paper on outlier detection suggested by Sandro Saitta on his blog (dataminingblog).

...Usually such behavior does not lead to good results, but this time I think the change of perspective has been positive!

click here to read the entire post

Approach based on Mutual Information
Before explaining my approach, I have to say that I have not had time to check the literature to see whether this method has already been implemented (please drop a comment if you find a reference! ... I don't want to take undue credit).
The aim of the method is to iteratively remove the sorted Z-scores as long as the mutual information between the Z-scores and the candidate outliers, I(Z|outlier), keeps increasing.
At each step the candidate outlier is the Z-score having the highest absolute value.
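
To make the loop concrete, here is a minimal Python sketch of one possible reading of the description above. It is not the implementation from the full post, and several details are assumptions: the Z-scores are computed once and not recomputed after each removal, the mutual information is estimated between equal-width bins of the Z-scores and a binary "candidate outlier" label, and the loop stops at the first step where the mutual information no longer increases. The names `mutual_information`, `mi_outliers`, `n_bins` and `max_fraction` are hypothetical.

```python
import numpy as np

def mutual_information(z_bins, labels, n_bins):
    """Mutual information (in nats) between binned Z-scores and 0/1 labels."""
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (z_bins, labels), 1)      # joint histogram of (bin, label)
    joint /= joint.sum()
    p_z = joint.sum(axis=1, keepdims=True)     # marginal over bins
    p_l = joint.sum(axis=0, keepdims=True)     # marginal over labels
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (p_z @ p_l)[mask])).sum())

def mi_outliers(x, n_bins=10, max_fraction=0.5):
    """Flag candidates by decreasing |Z| while the mutual information grows."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    # Assumption: equal-width bins over the observed Z-score range.
    edges = np.histogram_bin_edges(z, bins=n_bins)
    z_bins = np.digitize(z, edges[1:-1])       # bin indices 0 .. n_bins - 1
    order = np.argsort(-np.abs(z))             # highest |Z| first
    labels = np.zeros(len(z), dtype=int)
    best, flagged = 0.0, []
    for idx in order[: int(len(z) * max_fraction)]:
        labels[idx] = 1                        # tentatively mark the candidate
        mi = mutual_information(z_bins, labels, n_bins)
        if mi <= best:                         # no increase: undo and stop
            labels[idx] = 0
            break
        best = mi
        flagged.append(int(idx))
    return flagged

data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 50]   # made-up sample
print(mi_outliers(data))
```

The result depends on the binning, so `n_bins` acts as a tuning choice here even though no Z-score threshold is fixed in advance.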

Basically, unlike the Chebyshev method, there is no pre-fixed threshold.
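
For contrast, a threshold-based rule in the spirit of the Chebyshev criterion can be sketched in a few lines of Python. This is only an illustration, not the procedure from the paper or the full post: it relies on Chebyshev's inequality, which bounds the fraction of values with |Z| >= k by 1/k^2 for any distribution, and flags every point beyond a threshold `k` chosen in advance. The function name `chebyshev_outliers` and the default `k=3.0` are hypothetical.

```python
import numpy as np

def chebyshev_outliers(x, k=3.0):
    """Return indices whose |Z-score| is at least the pre-fixed threshold k."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    # By Chebyshev's inequality, at most a fraction 1 / k**2 of the points
    # can satisfy |z| >= k, whatever the underlying distribution.
    return [int(i) for i in np.flatnonzero(np.abs(z) >= k)]

data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 50]   # made-up sample
print(chebyshev_outliers(data, k=3.0))
print(chebyshev_outliers(data, k=2.5))
```

On this made-up sample the last value has a Z-score just under 3, so it is flagged with `k=2.5` but not with `k=3.0`, which shows how much the outcome hinges on the chosen threshold.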


Some comparative results:
