A Data Science Central Community
One of the standard approaches in machine learning is called “instant based learning (IBL)”. K-Nearest neighbors algorithm is perhaps the most known amongst these appraoches. Special cases of BSPs called K-D trees are very often used in real world engineering of K-NN.
Let me get to details:
IBLs require that you identify one or more nearest neighbors to a given point or a record to make a prediction. For example, if I want to figure out whether to classify a transaction as good or fraud, I search for the nearest transaction whose status is known. If it is fraud, I label the original transaction also as fraud or vice versa.
Typically, you compute distances using known attributes. So, for the transaction data, I might look at attributes like demography of the customer who made the transactions, past history of the transactions of the customers, transaction details (amount, ip etc.). All these might add up to several dozens of attributes. If I have around 1,000,000 known transactions, for predicting the status of each new point, I need to measure its distances with 1,000,000 points (each with several dozens of dimension) to find its nearest neighbor.
Clearly, a major disadvantage is the time of computation .
So, people use a variety of techniques to speed up. Some focus on creating a subset of the data.
An interesting technique is to represent the data in the form of a specific type of binary space partitioning tree called K-D tree. Checkout Wikipedia to learn more about it.
KD tree enables me to divide the data in a structured way and quickly remove most points with which I need not compute the distance to find the nearest neighbor as they cannot be nearest neighbors. This reduces the time of computation drastically. But, in very large dimensional problems, BSPs fail.
Of course, decision trees are geometrically space partitioning algorithms too. CART in particular is a binary space partioning tool. However, in regular BSPs we use some geometric criteria (like distance) for partitioning. But, in decision trees, some measures of impurity are minimized while partitioning. As I assume you are not interested in this part, I shall not delve into those methods.