A Data Science Central Community
k-nearest neighbours (k-NN) is a supervised learning method that has been used in many applications in data mining, statistical pattern recognition, image processing, and other fields. Successful applications include handwriting recognition and assigning customers to their nearest segment.
Features
- All instances correspond to points in an n-dimensional Euclidean space
- Classification is delayed until a new instance arrives
- Classification is done by comparing the feature vectors of the points
- The target function may be discrete or real-valued
The k-nearest neighbours algorithm (k-NN) classifies objects based on the closest training examples in the feature space. It is a supervised learning algorithm in which a new query instance is classified by the majority category among its K nearest neighbours. The purpose of the algorithm is to classify a new object based on its attributes and the training samples. The classifier fits no explicit model; it is purely memory-based. Given a query point, we find the K training points closest to it and classify the query by majority vote among those K objects, with any ties broken at random. In other words, k-NN uses the neighbourhood's classification as the prediction for the new query instance. k is a positive integer, typically small. If k = 1, the object is simply assigned to the class of its nearest neighbour.
The k-nearest neighbour algorithm is very simple. It uses the minimum distance from the query instance to the training samples to determine the K nearest neighbours, then takes a simple majority vote among those K neighbours as the prediction for the query instance.
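The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function and variable names (`euclidean`, `knn_classify`, the toy data) are chosen for this example.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, points, labels, k=3):
    # Sort training points by distance to the query, take the k closest,
    # and return the majority label among them.
    order = sorted(range(len(points)), key=lambda i: euclidean(query, points[i]))
    top_k = [labels[i] for i in order[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy example: two well-separated clusters in 2-D.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((2, 2), X, y, k=3))  # A
print(knn_classify((8, 7), X, y, k=3))  # B
```

In practice a library implementation (e.g. scikit-learn's `KNeighborsClassifier`) would be preferred, since it handles tie-breaking, efficient neighbour search, and distance weighting.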
An arbitrary instance x is represented by the feature vector (a_1(x), a_2(x), ..., a_n(x))
- a_i(x) denotes the i-th feature of x
The Euclidean distance between two instances is
- d(x_i, x_j) = sqrt( sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2 )
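As a quick numeric check of the distance formula, here is a small worked example (the vectors are made up for illustration):

```python
import math

# Two instances with three features each.
x_i = [1.0, 2.0, 3.0]
x_j = [4.0, 6.0, 3.0]

# d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)
d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
print(d)  # sqrt(9 + 16 + 0) = 5.0
```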
The same method can be used for regression by simply assigning the object the average of the values of its k nearest neighbours. It can be useful to weight the contributions of the neighbours, so that nearer neighbours contribute more to the average than more distant ones. (A common weighting scheme gives each neighbour a weight of 1/d, where d is the distance to the neighbour; this scheme is a generalization of linear interpolation.)
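A distance-weighted regression variant along these lines might look as follows. This is a sketch under the 1/d weighting scheme mentioned above; the small epsilon guarding against division by zero, and all names, are assumptions of this example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_regress(query, X, y, k=3, eps=1e-9):
    # Take the k nearest neighbours and average their target values,
    # weighting each by 1/d so closer points count more.
    neighbours = sorted(zip(X, y), key=lambda p: euclidean(query, p[0]))[:k]
    weights = [1.0 / (euclidean(query, x) + eps) for x, _ in neighbours]
    return sum(w * v for w, (_, v) in zip(weights, neighbours)) / sum(weights)

# Toy 1-D data: y = x^2 sampled at a few points.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 4.0, 9.0]
print(knn_regress((1.5,), X, y, k=2))  # 2.5 (equidistant neighbours 1.0 and 4.0)
```

With the query exactly midway between two neighbours, both weights are equal and the result reduces to a plain average, matching the unweighted case.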
The neighbors are taken from a set of objects for which the correct classification (or, in the case of regression, the value of the property) is known. This can be thought of as the training set for the
algorithm, though no explicit training step is required. The k-nearest
neighbor algorithm is sensitive to the local structure of the data.
Reference:
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
k-NN can also be performed in SAS using PROC DISCRIM.
Comment
Hi, what do you think about combining K-means and k-NN?
© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC