Going Over Using K-Nearest Neighbors

A quick overview of a really cool supervised learning algorithm


K-Nearest Neighbors, or KNN, is a supervised learning algorithm that can be applied to both classification and regression problems. KNN is a distance-based method, which means it assumes that the closer two points are, the more similar they are. Euclidean distance, Manhattan distance, and Minkowski distance (which generalizes the other two) are all examples of distance metrics. Each feature in KNN serves as a dimension. We can conveniently visualize this in a dataset of two columns by treating the values in one column as X coordinates and the values in the other as Y coordinates. Because this is a supervised learning algorithm, you’ll need the labels for each point in the dataset. Otherwise, the algorithm has nothing to predict from.

In contrast to most other classifiers, KNN does virtually nothing during the fit phase and does nearly all of its work during the predict phase. During fit, KNN simply stores the training data and labels; no distances are measured at this stage.
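To make the distance metrics mentioned above a little more concrete, here is a minimal sketch in plain Python. The function names are my own and purely illustrative. It relies on the fact that Minkowski distance with p=1 is Manhattan distance and with p=2 is Euclidean distance.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two points with the same number of features."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    # Sum the p-th powers of the per-feature gaps, then take the p-th root.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean_distance(a, b):
    # Straight-line distance: Minkowski with p = 2.
    return minkowski_distance(a, b, p=2)


def manhattan_distance(a, b):
    # "City block" distance: Minkowski with p = 1.
    return minkowski_distance(a, b, p=1)


# Two points in a two-feature dataset (one X coordinate, one Y coordinate).
point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

print(euclidean_distance(point_a, point_b))  # 5.0
print(manhattan_distance(point_a, point_b))  # 7.0
```

The same two points are 5 units apart under one metric and 7 under the other, which is why the choice of metric can change which neighbors count as "nearest."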

The predict stage is where all the fun starts. Given a point for which you want a class prediction, KNN measures the distance between it and every point in the training set. The algorithm then locates the K nearest points, or neighbors, and looks at their labels. Each of the K nearest points can be thought of as a vote for the predicted class, and each neighbor votes for its own class. The majority class wins: the algorithm assigns the point whichever class has the largest tally among the K nearest neighbors.
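The fit and predict steps described above can be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions (the class name `SimpleKNN`, Euclidean distance, and the tiny made-up dataset are all hypothetical), not how scikit-learn implements it.

```python
import math
from collections import Counter


class SimpleKNN:
    """Toy KNN classifier: stores the data on fit, does the real work on predict."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # Fit phase: just remember the training data. No distances yet.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # Predict phase: distance from the query to every training point.
        distances = [
            (math.dist(query, point), label)
            for point, label in zip(self.points, self.labels)
        ]
        # Keep the k closest points and let each one vote for its own label.
        nearest = sorted(distances, key=lambda pair: pair[0])[: self.k]
        votes = Counter(label for _, label in nearest)
        # Majority class wins; ties are resolved arbitrarily in this sketch.
        return votes.most_common(1)[0][0]


# Made-up two-feature dataset with two well-separated classes.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

model = SimpleKNN(k=3).fit(train_points, train_labels)
print(model.predict_one((2, 2)))  # red
print(model.predict_one((6, 5)))  # blue
```

Notice that `fit` is nearly free while `predict_one` scans the whole training set, which is the trade-off KNN makes.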

KNN can be used with many distance metrics, and choosing a suitable one is critical; the right choice depends on the nature of the problem and the data. Whether you’re using the model for classification or regression determines how you measure its results. For regression, KNN averages the target values of the K nearest neighbors; for classification, it handles both binary and multiclass problems. KNN classification performance is evaluated in the same way as any other classification algorithm: you need the model’s predictions and the true labels for each of the values you predicted. Then you can compute evaluation metrics such as precision, recall, accuracy, and F1-score. KNN can be imported from scikit-learn using the following code:

from sklearn.neighbors import KNeighborsClassifier
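Once imported, a typical workflow might look like the sketch below. The dataset (scikit-learn's built-in iris data), the 75/25 split, `n_neighbors=5`, and macro averaging are all choices I made for illustration, so treat them as assumptions and not recommendations. The last few lines use `KNeighborsRegressor` on made-up numbers to show the averaging behavior for regression.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Load a small labeled dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# metric="minkowski" with p=2 is Euclidean distance; p=1 would be Manhattan.
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X_train, y_train)      # stores the training data
y_pred = knn.predict(X_test)   # finds neighbors and takes the majority vote

# Compare predictions with the true labels, as with any classifier.
# Iris has three classes, so the per-class scores are macro-averaged.
print("accuracy: ", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="macro"))
print("recall:   ", recall_score(y_test, y_pred, average="macro"))
print("f1-score: ", f1_score(y_test, y_pred, average="macro"))

# Regression: the prediction is the mean target of the K nearest neighbors.
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit([[0], [1], [2], [3]], [0.0, 1.0, 2.0, 3.0])
print(reg.predict([[1.5]]))  # mean of the targets at x=1 and x=2
```

In practice you would try several values of `n_neighbors` and keep the one that scores best on held-out data.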

Thank you very much for reading!

