A Data Science Central Community

One of the standard approaches in machine learning is called “instant based learning (IBL)”. K-Nearest neighbors algorithm is perhaps the most known amongst these appraoches. Special cases of BSPs called K-D trees are very often used in real world engineering of K-NN.

Let me get to details:

IBLs require that you identify one or more nearest neighbors to a given point or a record to make a prediction. For example, if I want to figure out whether to classify a transaction as good or fraud, I search for the nearest transaction whose status is known. If it is fraud, I label the original transaction also as fraud or vice versa.

Typically, you compute distances using known attributes. So, for the transaction data, I might look at attributes like demography of the customer who made the transactions, past history of the transactions of the customers, transaction details (amount, ip etc.). All these might add up to several dozens of attributes. If I have around 1,000,000 known transactions, for predicting the status of each new point, I need to measure its distances with 1,000,000 points (each with several dozens of dimension) to find its nearest neighbor.

Clearly, a major disadvantage is the time of computation .

So, people use a variety of techniques to speed up. Some focus on creating a subset of the data.

An interesting technique is to represent the data in the form of a specific type of binary space partitioning tree called K-D tree. Checkout Wikipedia to learn more about it.

KD tree enables me to divide the data in a structured way and quickly remove most points with which I need not compute the distance to find the nearest neighbor as they cannot be nearest neighbors. This reduces the time of computation drastically. But, in very large dimensional problems, BSPs fail.

Of course, decision trees are geometrically space partitioning algorithms too. CART in particular is a binary space partioning tool. However, in regular BSPs we use some geometric criteria (like distance) for partitioning. But, in decision trees, some measures of impurity are minimized while partitioning. As I assume you are not interested in this part, I shall not delve into those methods.

Tags: algorithms, analytics, analyytics, bigdata, bsp, career, data, education, guidance, insofe, More…learning, machine, trees

© 2021 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions