# How to Implement the Linear Regression Algorithm from Scratch in Python (Using NumPy Only)

Linear regression is one of the simplest regression algorithms in machine learning, and it is easy to code and implement. Python offers libraries like scikit-learn that let you implement linear regression in a few lines of code. However, if we want to implement the algorithm from scratch, we need to be a bit smart. In this article, we will see how to implement linear regression from scratch in Python using NumPy only. You can download the code directly from here.

!!! Strongly Recommended: Introduction to the Linear Regression Algorithm

The full code is as follows:

```python
import numpy as np
import pandas as pd

dataset = pd.read_csv('E:/tutorials/linreg_data.csv')
#print(dataset)
X_train = dataset.iloc[:, :-1].values
Y_train = dataset.iloc[:, 1].values
print(X_train)
print(np.shape(Y_train))
Y_train = Y_train.reshape(-1, 1)
print(np.shape(Y_train))

def Loss_Function(target, Y_pred):
    return np.mean(pow((Y_pred - target), 2))

def pred(X_test):
    return np.dot(X_test, w) + b

# initializing weight and bias
w = .5
b = .5
for i in range(1000000):
    Y_pred = np.dot(X_train, w) + b
    loss = Loss_Function(Y_train, Y_pred)
    if i % 100 == 0:
        print("iteration", i, "loss---------------->>>>>", loss)
    grad_weight = np.dot((Y_pred - Y_train).T, X_train) / X_train.shape[0]
    grad_bias = np.mean(Y_pred - Y_train)
    w = w - .0001 * grad_weight
    b = b - .0001 * grad_bias

Y_out = pred(1)
print(Y_out)
```

Before going through the code, have a look at our dataset. The data points are generated such that they fit the equation y = 5x + 9.
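If you don't have the author's CSV file, a hypothetical stand-in can be generated from the same equation. This is a sketch under my own assumptions: the file name matches the one used later, but the column names and noise level are my own choices, not the author's.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the author's linreg_data.csv:
# sample 50 points from y = 5x + 9 with a little Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 5 * x + 9 + rng.normal(0, 1, size=50)
pd.DataFrame({"x": x, "y": y}).to_csv("linreg_data.csv", index=False)
```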

Now let's understand the code by breaking it into different parts.

Part 1: First of all, we import the libraries and the dataset required to build the model. We use the pandas library only for importing the data.

```python
import numpy as np
import pandas as pd

dataset = pd.read_csv('E:/tutorials/linreg_data.csv')  # use the location of your dataset

# slice the dataset up to (but not including) the last column as the
# feature vector and store it in the X_train matrix
X_train = dataset.iloc[:, :-1].values

# store the last column in Y_train; it is the target.
# NOTE: I have used 1 because there are only 2 columns in my dataset (0, 1).
# Be careful in your case.
Y_train = dataset.iloc[:, 1].values

# reshape Y_train as a column vector; conversely, reshape(1, -1) would
# reshape an array as a row vector. You can check the difference between
# using and not using reshape on your own.
Y_train = Y_train.reshape(-1, 1)
```

Part 2: Defining a loss function: This part of the code defines the loss function. We use the mean squared error (MSE) loss.

$$\displaystyle loss=\frac{1}{n}\sum\limits_{i=1}^{n}\left(Y\_pred_i - Y_i\right)^2$$

where n = total number of data points.

```python
def Loss_Function(target, Y_pred):
    return np.mean(pow((Y_pred - target), 2))
```
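A quick sanity check of the loss function on a toy example of my own: with prediction errors of 1 and 2, the MSE should be (1² + 2²)/2 = 2.5.

```python
import numpy as np

def Loss_Function(target, Y_pred):
    return np.mean(pow((Y_pred - target), 2))

# errors are 2-1=1 and 4-2=2, so MSE = (1 + 4) / 2 = 2.5
print(Loss_Function(np.array([1.0, 2.0]), np.array([2.0, 4.0])))  # → 2.5
```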

Part 3: This is the training (learning) part. Training means finding the optimum values of all the parameters, i.e. the values that give the least loss (this is done by minimizing the loss). In our case, the parameters are the weight matrix 'W' and the bias 'b'. For optimization, we use the gradient descent method. To read about gradient descent and its variants, you can visit here. However, I will also discuss all the necessary concepts of gradient descent, which will be more than sufficient.

Let's understand it.
The main goal of gradient descent is to minimize the cost (or loss) function using some basic concepts of calculus. The following steps are involved in optimization with the gradient descent algorithm:

• Calculate Y_pred from eqn. 1 using randomly initialized values of b and w.
• Calculate the loss using the MSE loss function.
• Find the derivative (gradient) of the loss with respect to b and w.

$$\displaystyle \begin{array}{l}
\text{We have,}\\
Y\_pred = w*X + b \quad \text{(eqn. 1)}\\
loss=\frac{1}{n}\sum\limits_{i=1}^{n}\left(Y\_pred_i - Y_i\right)^2\\
\text{Therefore,}\\
\frac{\partial loss}{\partial w}=\frac{\partial}{\partial w}\left(\frac{1}{n}\sum\limits_{i=1}^{n}\left(Y\_pred_i - Y_i\right)^2\right)\\
=\frac{\partial}{\partial w}\left(\frac{1}{n}\sum\limits_{i=1}^{n}\left((w*X_i+b) - Y_i\right)^2\right)\\
=\frac{2}{n}\sum\limits_{i=1}^{n}\left[\left((w*X_i+b) - Y_i\right)*X_i\right]\\
=\frac{2}{n}\sum\limits_{i=1}^{n}\left[\left(Y\_pred_i - Y_i\right)*X_i\right]\\
\text{and,}\\
\frac{\partial loss}{\partial b}=\frac{\partial}{\partial b}\left(\frac{1}{n}\sum\limits_{i=1}^{n}\left(Y\_pred_i - Y_i\right)^2\right)\\
=\frac{2}{n}\sum\limits_{i=1}^{n}\left(Y\_pred_i - Y_i\right)
\end{array}$$

• Update the previous b and w as follows,

$$\displaystyle \begin{array}{l}W = W - \eta\left(\frac{\partial loss}{\partial W}\right)\\
b = b - \eta\,\frac{\partial loss}{\partial b},\ \text{where }\eta\text{ is the learning rate.}\end{array}$$

• Iterate the above 4 steps until the loss is significantly reduced.
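The four steps above can be traced by hand on a tiny example. This is a minimal sketch under my own assumptions (two toy points drawn from y = 5x + 9; the learning rate 0.01 is chosen only for illustration), showing a single gradient-descent update:

```python
import numpy as np

X = np.array([[1.0], [2.0]])
Y = 5 * X + 9                    # targets: [[14.], [19.]]
w, b = 0.5, 0.5                  # randomly initialized parameters
lr = 0.01                        # learning rate (illustrative)

Y_pred = X * w + b                        # step 1: predictions from eqn. 1
loss_before = np.mean((Y_pred - Y) ** 2)  # step 2: MSE loss

err = Y_pred - Y
grad_w = (2 / len(X)) * np.sum(err * X)   # step 3: d(loss)/dw = -48.0 here
grad_b = (2 / len(X)) * np.sum(err)       # step 3: d(loss)/db = -30.5 here

w = w - lr * grad_w                       # step 4: update w -> 0.98
b = b - lr * grad_b                       # step 4: update b -> 0.805

loss_after = np.mean((X * w + b - Y) ** 2)
print(loss_before, loss_after)            # the loss decreases after one step
```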

In the following section of code,

$$\displaystyle \begin{array}{l}grad\_weight=\frac{\partial loss}{\partial w}\\grad\_bias=\frac{\partial loss}{\partial b}\end{array}$$

```python
# initializing weight and bias
w = .5
b = .5
for i in range(1000000):
    Y_pred = np.dot(X_train, w) + b
    loss = Loss_Function(Y_train, Y_pred)
    if i % 100 == 0:
        print("iteration", i, "loss---------------->>>>>", loss)
    grad_weight = np.dot((Y_pred - Y_train).T, X_train) / X_train.shape[0]
    grad_bias = np.mean(Y_pred - Y_train)
    w = w - .0001 * grad_weight
    b = b - .0001 * grad_bias
```

Note that the factor of 2 from the derivatives is a constant, so it is absorbed into the learning rate.

Part 4: In this part, we define a function to predict the output for test data.

```python
def pred(X_test):
    return np.dot(X_test, w) + b
```

Part 5: Testing: Let's test our model by giving it a test input, x = 1, and printing the output. Since we trained our model on data generated from the equation y = 5x + 9, the output for x = 1 must be equal or nearly equal to 14.

```python
Y_out = pred(1)
print(Y_out)
```

OUTPUT:

Displaying the last few iterations and the output,

```
iteration 998800 loss---------------->>>>> 9.608868352788592
iteration 998900 loss---------------->>>>> 9.608868352788587
iteration 999000 loss---------------->>>>> 9.608868352788582
iteration 999100 loss---------------->>>>> 9.608868352788578
iteration 999200 loss---------------->>>>> 9.608868352788587
iteration 999300 loss---------------->>>>> 9.608868352788583
iteration 999400 loss---------------->>>>> 9.608868352788578
iteration 999500 loss---------------->>>>> 9.608868352788573
iteration 999600 loss---------------->>>>> 9.60886835278859
iteration 999700 loss---------------->>>>> 9.608868352788601
iteration 999800 loss---------------->>>>> 9.60886835278859
iteration 999900 loss---------------->>>>> 9.608868352788594
```

[[14.16796364]], which is nearly equal to 14.

The error can further be decreased by increasing the number of iterations.
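As a sanity check (not part of the article's original code), the learned parameters can be compared against NumPy's built-in least-squares fit. On noise-free data generated from y = 5x + 9, `np.polyfit` should recover the slope and intercept almost exactly:

```python
import numpy as np

x = np.arange(0.0, 10.0)
y = 5 * x + 9

# np.polyfit(x, y, 1) returns the [slope, intercept] of the
# least-squares line through the points
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # ≈ 5.0, 9.0
```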
