Tutorial: Neutralizing Outliers in Any Dimension

This article focuses on computing the centroid (or center) of n points in a d-dimensional space when outliers are present. The centroid is defined here as the point that minimizes the sum of the "distances" to the n points.
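To make the definition concrete, here is a minimal sketch of a Monte Carlo search for such a center. It is not the article's source code. The function names (`total_distance`, `monte_carlo_center`), the `power` parameter, the step-shrinking rule, and the toy data are all assumptions chosen for illustration. It also assumes plain Euclidean distance, whereas the article puts "distances" in quotes and may use other functions.

```python
import math
import random

def total_distance(center, points, power=1.0):
    """Sum of Euclidean distances (raised to `power`) from `center` to each point."""
    return sum(math.dist(center, pt) ** power for pt in points)

def monte_carlo_center(points, n_iter=5000, power=1.0, seed=0):
    """Random search: propose a perturbed center, keep it only if the cost drops."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Start from the arithmetic mean, which is sensitive to outliers.
    best = [sum(pt[k] for pt in points) / len(points) for k in range(dim)]
    best_cost = total_distance(best, points, power)
    # Initial step size: the largest coordinate range in the data (an assumption).
    step = max(
        max(pt[k] for pt in points) - min(pt[k] for pt in points)
        for k in range(dim)
    )
    for _ in range(n_iter):
        candidate = [c + rng.gauss(0.0, step) for c in best]
        cost = total_distance(candidate, points, power)
        if cost < best_cost:
            best, best_cost = candidate, cost
        else:
            # Shrink the step slowly after each rejected proposal.
            step = max(step * 0.995, 1e-9)
    return best, best_cost

# Toy data: a 2-D Gaussian cloud around the origin plus ten identical outliers.
data_rng = random.Random(42)
cloud = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outliers = [(50.0, 50.0)] * 10
points = cloud + outliers

mean = [sum(pt[k] for pt in points) / len(points) for k in range(2)]
center, cost = monte_carlo_center(points)
print("arithmetic mean:   ", mean)
print("Monte Carlo center:", center)
```

With `power=1.0` the objective is the one that defines the geometric median, which is far less affected by the ten outliers than the arithmetic mean. Setting `power=2.0` makes the minimizer coincide with the mean, so that parameter is one way to see how the choice of "distance" controls sensitivity to outliers.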

This long article has several sections.

Content

1. A related physics problem

2. Algorithm to find the centroid

  • Source code to generate points and compute centroid, using Monte Carlo
  • Generating point clouds with simulation

3. Examples and results

4. Convergence of the algorithm

5. Interesting contour maps

The article covers a lot of material, and you will likely find something of interest, for instance:

  • Several outlier detection techniques
  • How to display contour maps and images corresponding to an intensity function or heatmap in R, using just a few lines of easy-to-understand code
  • How to produce data sets that simulate clustering structures or other patterns
  • Distribution of arrival times for successive records in a time series
  • Convergence of Monte Carlo optimization

The picture below is from the article. 

Click here to read the article. 
