Implementation Of KNN(using Scikit learn)



The KNN classifier is one of the strongest yet most easily implementable supervised machine learning algorithms. It can be used for both classification and regression problems. Implementing KNN from scratch is a bit tricky; however, libraries like sklearn in Python allow a programmer to build a KNN model easily without needing deep mathematical groundwork.
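To illustrate the point about classification and regression, here is a minimal toy sketch (the one-feature data below is made up for illustration): the same nearest-neighbours idea takes a majority vote over class labels in one case and averages the target values in the other.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# made-up one-feature data: two clusters, around 1 and around 11
X = [[0], [1], [2], [10], [11], [12]]
y_class = [0, 0, 0, 1, 1, 1]               # discrete class labels
y_value = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]   # continuous targets

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_value)

label = clf.predict([[1.5]])[0]  # majority vote of the 3 nearest points -> 0
value = reg.predict([[1.5]])[0]  # mean of their targets -> ~0.1
```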


We are going to classify the iris data into its different species by observing four features: sepal length, sepal width, petal length, and petal width. We have 150 observations (rows) altogether, and we will build a KNN classification model on the basis of these observations. Link to download iris dataset- iris.csv
 
Let’s see step by step how to implement KNN using scikit-learn (sklearn).

Step-1: First of all, we load/import our training dataset, either from the computer's hard disk or from a URL.

import pandas as pd

# loading the data file into the program; give the location of your csv file
dataset = pd.read_csv("E:/input/iris.csv")
print(dataset.head())  # prints the first five rows of the data


Step-2: Now, we split the data into attributes/features and their corresponding labels.

X = dataset.iloc[:, :-1].values  # separate array X holding the four feature columns
y = dataset.iloc[:, 4].values    # separate array y holding the corresponding labels


Step-3: In this step, we divide our entire dataset into two subsets: one is used for training our model and the other for testing it. We divide our data 80:20, i.e. 80% of the data (chosen at random, since train_test_split shuffles by default) becomes training data and the remaining 20% is our test data. We split both the attributes and the labels. We do this kind of division in order to measure the accuracy of our model. This process of splitting the supplied dataset into training and testing subsets to estimate the model's accuracy and performance is called hold-out validation, a simple form of cross-validation.

 

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
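As a quick sanity check (with toy data standing in for the iris rows), an 80:20 split of 150 samples gives 120 training and 30 test samples:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(150)]      # 150 toy samples, like the 150 iris rows
y = [i % 3 for i in range(150)]    # three toy class labels

# random_state is added here only so the example is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

print(len(X_train), len(X_test))   # 120 and 30
```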


Step-4: In this step, we perform normalization/standardization. This is the process of re-scaling our data so that features with larger numeric ranges do not dominate the distance calculation and distort the model's accuracy. We have used the z-score standardization technique here. For more on normalization, click here.

 

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
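To see what StandardScaler actually does, here is a toy check (the numbers are made up): after z-score scaling, each feature column has mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# made-up 4x2 matrix standing in for the feature columns
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])

scaler = StandardScaler().fit(X_train)   # learns each column's mean and std
X_scaled = scaler.transform(X_train)     # applies z = (x - mean) / std

print(X_scaled.mean(axis=0))  # each column's mean is now ~0
print(X_scaled.std(axis=0))   # each column's std is now ~1
```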


Step-5: Now it's time to define our KNN model. We create the classifier, fit it on the training subset, and then supply the attributes of the test subset for prediction.

 

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=9)  # defining the KNN classifier for k=9
classifier.fit(X_train, y_train)  # learning step, i.e. supplying training data to the model
y_pred = classifier.predict(X_test)  # stores the prediction results in y_pred


Step-6: Since the test data we've supplied to the model is a held-out portion of the original data, we have the actual labels for it. In this step we compute some classification metrics like precision, recall and f1-score, along with the confusion matrix.

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
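The choice of k=9 above is a tuning decision, not a fixed rule. One common way to pick k is to compare accuracy on the held-out test set across several values. A sketch of that idea, using sklearn's bundled copy of the iris data rather than the csv file above, with a random_state added only for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)   # sklearn's bundled copy of the iris data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

scores = {}
for k in range(1, 16, 2):           # odd k values avoid ties in the vote
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, model.predict(X_test))

best_k = max(scores, key=scores.get)  # k with the highest test accuracy
```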


Step-7: Supply actual test data to the model. Note that since the model was trained on standardized data, any new input must be on the same scale.

# testing the model by supplying random data
x_random = [[-1.56697667, 1.22358774, 1.56980273, 1.33046652],
            [-2.21742620, 3.08669365, 1.29593102, 1.07025858]]
y_random = classifier.predict(x_random)
print(y_random)
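If you have raw measurements in centimetres instead of already-scaled values, pass them through the same fitted scaler before predicting. A minimal self-contained sketch of this, using sklearn's bundled copy of the iris data (where the classes are coded 0/1/2) rather than the csv file above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # classes: 0=setosa, 1=versicolor, 2=virginica

scaler = StandardScaler().fit(X)
clf = KNeighborsClassifier(n_neighbors=9).fit(scaler.transform(X), y)

# raw measurements in cm must pass through the SAME fitted scaler
raw = np.array([[5.1, 3.5, 1.4, 0.2]])   # a typical setosa-like row
pred = clf.predict(scaler.transform(raw))
print(pred)                              # class 0, i.e. setosa
```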

 
Let’s see the output of the above program.

   sepal.length  sepal.width  petal.length  petal.width variety
0           5.1          3.5           1.4          0.2  Setosa
1           4.9          3.0           1.4          0.2  Setosa
2           4.7          3.2           1.3          0.2  Setosa
3           4.6          3.1           1.5          0.2  Setosa
4           5.0          3.6           1.4          0.2  Setosa
[[11  0  0]
 [ 0  9  0]
 [ 0  0 10]]


Classification metrics for test data:
              precision    recall  f1-score   support

      Setosa       1.00      1.00      1.00        11
  Versicolor       1.00      1.00      1.00         9
   Virginica       1.00      1.00      1.00        10

   micro avg       1.00      1.00      1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

For actual test data:

[‘Setosa’ ‘Setosa’]

For Implementation of KNN from scratch using python, click here.
