A Data Science Central Community

One of the standard approaches in machine learning is called “instant based learning (IBL)”. K-Nearest neighbors algorithm is perhaps the most known amongst these appraoches. Special cases of BSPs called K-D trees are very often used in real world engineering of K-NN.

Let me get to details:

IBLs require that you identify one or more nearest neighbors to a given point or a record to make a prediction. For example, if I want to figure out whether to classify a transaction as good or fraud, I search for the nearest transaction whose status is known. If it is fraud, I label the original transaction also as fraud or vice versa.

Typically, you compute distances using known attributes. So, for the transaction data, I might look at attributes like demography of the customer who made the transactions, past history of the transactions of the customers, transaction details (amount, ip etc.). All these might add up to several dozens of attributes. If I have around 1,000,000 known transactions, for predicting the status of each new point, I need to measure its distances with 1,000,000 points (each with several dozens of dimension) to find its nearest neighbor.

Clearly, a major disadvantage is the time of computation .

So, people use a variety of techniques to speed up. Some focus on creating a subset of the data.

An interesting technique is to represent the data in the form of a specific type of binary space partitioning tree called K-D tree. Checkout Wikipedia to learn more about it.

KD tree enables me to divide the data in a structured way and quickly remove most points with which I need not compute the distance to find the nearest neighbor as they cannot be nearest neighbors. This reduces the time of computation drastically. But, in very large dimensional problems, BSPs fail.

Of course, decision trees are geometrically space partitioning algorithms too. CART in particular is a binary space partioning tool. However, in regular BSPs we use some geometric criteria (like distance) for partitioning. But, in decision trees, some measures of impurity are minimized while partitioning. As I assume you are not interested in this part, I shall not delve into those methods.

Tags:

© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

**Technical**

- Free Books and Resources for DSC Members
- Learn Machine Learning Coding Basics in a weekend
- New Machine Learning Cheat Sheet | Old one
- Advanced Machine Learning with Basic Excel
- 12 Algorithms Every Data Scientist Should Know
- Hitchhiker's Guide to Data Science, Machine Learning, R, Python
- Visualizations: Comparing Tableau, SPSS, R, Excel, Matlab, JS, Pyth...
- How to Automatically Determine the Number of Clusters in your Data
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- Fast Combinatorial Feature Selection with New Definition of Predict...
- 10 types of regressions. Which one to use?
- 40 Techniques Used by Data Scientists
- 15 Deep Learning Tutorials
- R: a survival guide to data science with R

**Non Technical**

- Advanced Analytic Platforms - Incumbents Fall - Challengers Rise
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- How to Become a Data Scientist - On your own
- 16 analytic disciplines compared to data science
- Six categories of Data Scientists
- 21 data science systems used by Amazon to operate its business
- 24 Uses of Statistical Modeling
- 33 unusual problems that can be solved with data science
- 22 Differences Between Junior and Senior Data Scientists
- Why You Should be a Data Science Generalist - and How to Become One
- Becoming a Billionaire Data Scientist vs Struggling to Get a $100k Job
- Why do people with no experience want to become data scientists?

**Articles from top bloggers**

- Kirk Borne | Stephanie Glen | Vincent Granville
- Ajit Jaokar | Ronald van Loon | Bernard Marr
- Steve Miller | Bill Schmarzo | Bill Vorhies

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives**: 2008-2014 | 2015-2016 | 2017-2019 | Book 1 | Book 2 | More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions