# AnalyticBridge

A Data Science Central Community

k-nearest neighbours is part of supervised learning that has been used in many applications in the field of data mining, statistical pattern recognition, image processing and many others. Some successful
applications are including recognition of handwriting, classifying the nearest
segment for customers etc.

ü  Features

o   All instances correspond to points in an n-dimensional Euclidean space

o   Classification is delayed till a new instance arrives

o   Classification done by comparing feature vectors of the different points

o   Target function may be discrete or real-valued

The k-nearest neighbours algorithm (k-NN) is a method for classifying objects based on closest training examples in the feature space. K-nearest
neighbor is a supervised learning algorithm where the result of new instance
query is classified based on majority of K-nearest neighbor category. The purpose
of this algorithm is to classify a new object based on attributes and training
samples. The classifiers do not use any model to fit and only based on memory.
Given a query point, we find K number of objects or (training points) closest
to the query point. The classification is using majority vote among the
classification of the K objects. Any ties can be broken at random. K Nearest
neighbor algorithm used neighborhood classification as the prediction value of
the new query instance. k is a positive integer, typically small. If k = 1,
then the object is simply assigned to the class of its nearest neighbor.

K nearest neighbor algorithm is very simple. It works based on minimum distance from the query instance to the training samples to determine the K-nearest neighbors. After we gather K nearest neighbors, we take
simple majority of these K-nearest neighbors to be the prediction of the query
instance.

ü  An arbitrary instance is represented by           (a1(x), a2(x), a3(x),.., an(x))

o   ai(x) denotes features

ü  Euclidean distance between two instances

o   d(xi, xj)=sqrt (sum for r=1 to n (ar(xi) - ar(xj))2)

The similar method can be used for regression, by simply assigning the property value for the object to be the average of the values of its k nearest neighbors. It can be useful to weight the contributions of
the neighbors, so that the nearer neighbors contribute more to the average than
the more distant ones. (A common weighting scheme is to give each neighbor a
weight of 1/d, where d is the distance to the neighbor. This
scheme is a generalization of linear interpolation.)

The neighbors are taken from a set of objects for which the correct classification (or, in the case of regression, the value of the property) is known. This can be thought of as the training set for the
algorithm, though no explicit training step is required. The k-nearest
neighbor algorithm is sensitive to the local structure of the data.

Reference:

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

KNN can be done using PROC DISCRIM in SAS also:

http://www.sas-programming.com/2010/05/k-nearest-neighbor-in-sas.html
http://www.sascommunity.org/wiki/%3D%3D_How_to_implement_K-Nearest-...

Views: 3171

Comment Comment by JouNa on March 3, 2014 at 1:27pm Comment by Minethedata on January 3, 2011 at 4:56am