KNN Algorithm – K-Nearest Neighbors Classifiers and Model Example

The K-Nearest Neighbors (K-NN) algorithm is a popular Machine Learning algorithm used mostly for solving classification problems.

In this article, you'll learn how the K-NN algorithm works with practical examples.

We'll use diagrams, as well as sample data, to show how you can classify data using the K-NN algorithm. We'll also discuss the advantages and disadvantages of using the algorithm.

How Does the K-Nearest Neighbors Algorithm Work?

The K-NN algorithm compares a new data entry to the values in a given data set (with different classes or categories).

Based on how close or similar it is to its K nearest neighbors, the algorithm assigns the new data entry to a class or category in the data set (the training data).

Let's break that down into steps:

Step #1 - Assign a value to K.

Step #2 - Calculate the distance between the new data entry and all other existing data entries (you'll learn how to do this shortly). Arrange them in ascending order.

Step #3 - Find the K nearest neighbors to the new entry based on the calculated distances.

Step #4 - Assign the new data entry to the majority class in the nearest neighbors.

Don't worry if the steps above seem confusing at the moment. The examples in the sections that follow will help you understand better.
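If you already read Python, here's a minimal sketch of the whole procedure in plain code. The function name, the tuple-based data layout, and the sample values are illustrative assumptions, not a standard API:

```python
from collections import Counter
import math

def knn_classify(new_point, training_data, k=3):
    """training_data is a list of ((x, y), label) tuples."""
    # Step 2: calculate the distance from the new entry to every
    # existing entry, then arrange the results in ascending order.
    distances = sorted(
        (math.dist(new_point, point), label)
        for point, label in training_data
    )
    # Step 3: keep only the K nearest neighbors.
    nearest = distances[:k]
    # Step 4: assign the new entry to the majority class.
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Example usage:
# data = [((1, 1), "red"), ((2, 2), "red"), ((5, 5), "blue")]
# knn_classify((1.5, 1.5), data)  # -> "red"
```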

K-Nearest Neighbors Classifiers and Model Example With Diagrams

With the aid of diagrams, this section will help you understand the steps listed in the previous section.

Consider the diagram below:

[Diagram 1: a data set with two classes — red and blue points]

The graph above represents a data set consisting of two classes — red and blue.

[Diagram 2: a new data entry (green point) added to the data set]

A new data entry has been introduced to the data set. This is represented by the green point in the graph above.

We'll then assign a value to K which denotes the number of neighbors to consider before classifying the new data entry. Let's assume the value of K is 3.

[Diagram 3: the 3 nearest neighbors to the green point]

Since the value of K is 3, the algorithm will only consider the 3 nearest neighbors to the green point (new entry). This is represented in the graph above.

Out of the 3 nearest neighbors in the diagram above, the majority class is red, so the new entry will be assigned to that class.

[Diagram 4: the new entry classified as red]

The new data entry has been classified as red.

K-Nearest Neighbors Classifiers and Model Example With Data Set

In the last section, we saw an example of the K-NN algorithm using diagrams. But we didn't discuss how to calculate the distance between the new entry and the other values in the data set.

In this section, we'll dive a bit deeper. Along with the steps followed in the last section, you'll learn how to calculate the distance between a new entry and other existing values using the Euclidean distance formula.

Note that you can also calculate the distance using the Manhattan and Minkowski distance formulas.
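For reference, here's what the three distance measures look like in Python for two-dimensional points. The function names are my own, and r = 3 below is an arbitrary choice:

```python
def euclidean(p, q):
    # Straight-line distance between p and q.
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def manhattan(p, q):
    # Sum of the absolute differences along each axis.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def minkowski(p, q, r=3):
    # Generalizes both: r = 1 gives Manhattan, r = 2 gives Euclidean.
    return (abs(p[0] - q[0]) ** r + abs(p[1] - q[1]) ** r) ** (1 / r)
```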

Let's get started!

Brightness | Saturation | Class
-----------|------------|------
40         | 20         | Red
50         | 50         | Blue
60         | 90         | Blue
10         | 25         | Red
70         | 70         | Blue
60         | 10         | Red
25         | 80         | Blue

The table above represents our data set. We have two columns — Brightness and Saturation. Each row in the table has a class of either Red or Blue.

Before we introduce a new data entry, let's assume the value of K is 5.

How to Calculate Euclidean Distance in the K-Nearest Neighbors Algorithm

Here's the new data entry:

Brightness | Saturation | Class
-----------|------------|------
20         | 35         | ?

We have a new entry but it doesn't have a class yet. To know its class, we have to calculate the distance from the new entry to other entries in the data set using the Euclidean distance formula.

Here's the formula: d = √((X₂ - X₁)² + (Y₂ - Y₁)²)

Where:

  • X₂ = New entry's brightness (20).
  • X₁ = Existing entry's brightness.
  • Y₂ = New entry's saturation (35).
  • Y₁ = Existing entry's saturation.

Let's do the calculation together. I'll calculate the first three.

Distance #1

For the first row, d1:

Brightness | Saturation | Class
-----------|------------|------
40         | 20         | Red

d1 = √((20 - 40)² + (35 - 20)²)
   = √(400 + 225)
   = √625
   = 25
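
You can verify this result in a Python shell. The tuple layout below is just for illustration:

```python
import math

new_entry = (20, 35)   # (brightness, saturation)
first_row = (40, 20)

d1 = math.sqrt((new_entry[0] - first_row[0]) ** 2
               + (new_entry[1] - first_row[1]) ** 2)
print(d1)  # 25.0
```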

We now know the distance from the new data entry to the first entry in the table. Let's update the table.

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
40         | 20         | Red   | 25
50         | 50         | Blue  | ?
60         | 90         | Blue  | ?
10         | 25         | Red   | ?
70         | 70         | Blue  | ?
60         | 10         | Red   | ?
25         | 80         | Blue  | ?

Distance #2

For the second row, d2:

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
50         | 50         | Blue  | ?

d2 = √((20 - 50)² + (35 - 50)²)
   = √(900 + 225)
   = √1125
   ≈ 33.54

Here's the table with the updated distance:

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
60         | 90         | Blue  | ?
10         | 25         | Red   | ?
70         | 70         | Blue  | ?
60         | 10         | Red   | ?
25         | 80         | Blue  | ?

Distance #3

For the third row, d3:

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
60         | 90         | Blue  | ?

d3 = √((20 - 60)² + (35 - 90)²)
   = √(1600 + 3025)
   = √4625
   ≈ 68.01

Updated table:

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
60         | 90         | Blue  | 68.01
10         | 25         | Red   | ?
70         | 70         | Blue  | ?
60         | 10         | Red   | ?
25         | 80         | Blue  | ?

At this point, you should understand how the calculation works. Attempt to calculate the distance for the last four rows.

Here's what the table will look like after all the distances have been calculated:

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
60         | 90         | Blue  | 68.01
10         | 25         | Red   | 14.14
70         | 70         | Blue  | 61.03
60         | 10         | Red   | 47.17
25         | 80         | Blue  | 45.28
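
If you'd like to check your answers, a short loop reproduces the whole Distance column. The list-of-tuples layout is an assumption for illustration:

```python
import math

rows = [(40, 20, "Red"), (50, 50, "Blue"), (60, 90, "Blue"),
        (10, 25, "Red"), (70, 70, "Blue"), (60, 10, "Red"),
        (25, 80, "Blue")]
new_entry = (20, 35)

for brightness, saturation, label in rows:
    distance = math.dist(new_entry, (brightness, saturation))
    print(f"{brightness:>3} {saturation:>3} {label:<4} {distance:.2f}")
```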

Let's rearrange the distances in ascending order:

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
10         | 25         | Red   | 14.14
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
25         | 80         | Blue  | 45.28
60         | 10         | Red   | 47.17
70         | 70         | Blue  | 61.03
60         | 90         | Blue  | 68.01

Since we chose 5 as the value of K, we'll only consider the first five rows. That is:

Brightness | Saturation | Class | Distance
-----------|------------|-------|---------
10         | 25         | Red   | 14.14
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
25         | 80         | Blue  | 45.28
60         | 10         | Red   | 47.17

As you can see above, the majority class within the 5 nearest neighbors to the new entry is Red. Therefore, we'll classify the new entry as Red.

Here's the updated table:

Brightness | Saturation | Class
-----------|------------|------
40         | 20         | Red
50         | 50         | Blue
60         | 90         | Blue
10         | 25         | Red
70         | 70         | Blue
60         | 10         | Red
25         | 80         | Blue
20         | 35         | Red
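
If you'd rather not compute distances by hand, scikit-learn's KNeighborsClassifier reaches the same answer. This is a sketch assuming scikit-learn is installed; the variable names are mine:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[40, 20], [50, 50], [60, 90], [10, 25],
     [70, 70], [60, 10], [25, 80]]
y = ["Red", "Blue", "Blue", "Red", "Blue", "Red", "Blue"]

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)                    # "training" just stores the data
print(model.predict([[20, 35]]))   # ['Red']
```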

How to Choose the Value of K in the K-NN Algorithm

There is no single rule for choosing the value of K, but here are some common conventions to keep in mind:

  • Choosing a very low value (like 1 or 2) makes the classification sensitive to noise and outliers, and will most likely lead to inaccurate predictions.
  • A commonly used default value of K is 5.
  • Always use an odd number as the value of K to avoid ties in the majority vote.
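
In practice, a common way to settle on K is cross-validation: score several odd values and keep the one that performs best. Here's a sketch using scikit-learn's built-in iris data set; the candidate range of 1 to 15 is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each odd K.
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 16, 2)
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```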

Advantages of K-NN Algorithm

  • It is simple to implement.
  • No training is required before classification.

Disadvantages of K-NN Algorithm

  • Can be cost-intensive when working with a large data set.
  • A lot of memory is required for processing large data sets.
  • Choosing the right value of K can be tricky.

Summary

In this article, we talked about the K-Nearest Neighbors algorithm. It is often used for classification problems.

We saw an example using diagrams to explain how the algorithm works.

We also saw an example using sample data to see the steps involved in classifying a new data entry.

Lastly, we discussed the advantages and disadvantages of the algorithm, and how you can choose the value of K.

Happy coding!
