Implementation of KNN (using scikit-learn, NumPy and pandas)
The KNN classifier is one of the strongest yet most easily implemented supervised machine learning algorithms, and it can be used for both classification and regression problems. Implementing KNN from scratch is a bit tricky; however, Python libraries such as scikit-learn let a programmer build a KNN model easily, without going deep into the mathematics.
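As a preview, the whole workflow the steps below walk through fits in a few lines. This is a sketch, not the article's exact code: it uses the iris data bundled with scikit-learn instead of a CSV, and the `random_state` is an assumption added for repeatability; the 80:20 split and k=9 match the steps that follow.

```python
# Minimal end-to-end KNN on iris with scikit-learn (sketch).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                       # 150 samples, 4 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)              # 80:20 holdout split

scaler = StandardScaler().fit(X_train)                  # z-score parameters from training data
clf = KNeighborsClassifier(n_neighbors=9)               # k = 9, as in Step-5 below
clf.fit(scaler.transform(X_train), y_train)

print(clf.score(scaler.transform(X_test), y_test))      # accuracy on the held-out 20%
```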
Step-1: First of all, we load/import our training dataset, either from the computer's hard disk or from a URL.
import pandas as pd

dataset = pd.read_csv("E:/input/iris.csv")
print(dataset.head()) # prints the first five rows of the data
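If the CSV file is not available on disk, the same iris data ships with scikit-learn and can be loaded directly (a hedged alternative; its column names differ slightly from the CSV version, and the label column is called `target`):

```python
# Alternative: load the iris data bundled with scikit-learn instead of a CSV.
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)   # returns the data as pandas objects
dataset = iris.frame              # DataFrame: four feature columns plus 'target'
print(dataset.head())             # first five rows, as with read_csv
```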
Step-2: Now we split the data column-wise into attributes/features and their corresponding labels.
X = dataset.iloc[:, :-1].values # separate array X holds the attribute columns.
y = dataset.iloc[:, 4].values # separate array y holds the corresponding labels.
Step-3: In this step, we divide the entire dataset into two subsets: one is used for training the model and the other for testing it. We split the data 80:20, i.e. 80% of the data becomes the training set and the remaining 20% the test set; both attributes and labels are divided. This lets us measure the accuracy of the model on data it has never seen. Holding out part of the supplied dataset for testing in order to estimate the model's accuracy and performance is a simple form of validation (cross-validation, strictly speaking, repeats this split over several folds).
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
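As a sanity check, the 80:20 split can be verified on the 150-row iris data (a sketch; `load_iris` and the fixed `random_state` are assumptions added so the example is self-contained and repeatable):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                 # 150 samples, 4 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)        # seed fixed for repeatability

print(X_train.shape, X_test.shape)                # (120, 4) (30, 4)
```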
Step-4: In this step, we perform normalization/standardization. It is a process of re-scaling the data so that differences in feature scale do not distort the model's distance calculations. We use the z-score normalization technique here. For more on normalization, click here.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train) # fit the scaler on the training data, then scale it
X_test = scaler.transform(X_test) # scale the test data with the training statistics
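The effect of z-score scaling can be checked on a toy array: after `fit_transform`, every column has mean 0 and standard deviation 1 (a sketch; the toy numbers are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])                 # toy training data

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)          # fit on training data, then transform

# Each column of the scaled data has mean ~0 and standard deviation ~1.
print(np.allclose(X_scaled.mean(axis=0), 0.0))    # True
print(np.allclose(X_scaled.std(axis=0), 1.0))     # True
```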
Step-5: Now it's time to define our KNN model. We create the model, fit it on the training subset, and then supply the attributes of the test subset for prediction.
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=9) # defining the KNN classifier for k=9
classifier.fit(X_train, y_train) # learning step: supplying training data to the model
y_pred = classifier.predict(X_test) # stores the prediction results in y_pred
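The choice k=9 above is one option; a quick way to compare candidate values of k is to score each on the held-out test set. This is a sketch on the bundled iris data; the range of odd k values and the `random_state` are assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Test-set accuracy for several odd values of k (odd avoids tied votes).
for k in (1, 3, 5, 7, 9):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, clf.score(X_test, y_test))
```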
Step-6: Since the test subset we supplied to the model was held out from the original labelled dataset, we have the actual labels for it. In this step we compute some classification metrics such as precision, recall and f1-score.
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred)) # rows: actual class, columns: predicted class
print(classification_report(y_test, y_pred)) # precision, recall, f1-score per class
Step-7: Supply genuinely new data to the model.
# testing the model by supplying random (already standardized) data
x_random = [[-1.56697667, 1.22358774, -1.56980273, -1.33046652],
            [-2.21742620, 3.08669365, -1.29593102, -1.07025858]]
print(classifier.predict(x_random)) # predicted labels for the two new samples
Output of dataset.head() (Step-1):
sepal.length sepal.width petal.length petal.width variety
0 5.1 3.5 1.4 0.2 Setosa
1 4.9 3.0 1.4 0.2 Setosa
2 4.7 3.2 1.3 0.2 Setosa
3 4.6 3.1 1.5 0.2 Setosa
4 5.0 3.6 1.4 0.2 Setosa
Confusion matrix for the test data:
[[11 0 0]
[ 0 9 0]
[ 0 0 10]]
Classification metrics for the test data:
precision recall f1-score support
Setosa 1.00 1.00 1.00 11
Versicolor 1.00 1.00 1.00 9
Virginica 1.00 1.00 1.00 10
micro avg 1.00 1.00 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
For the random data supplied in Step-7:
For Implementation of KNN from scratch using python, click here.