Introduction to Linear Regression Algorithm

Linear regression is one of the simplest yet most effective regression algorithms in supervised machine learning. Regression is the process of modeling the relationship between an independent variable (say X) and a dependent variable (say Y), and linear regression is a linear approach to doing so. Linear regression may be simple or multiple, depending on the number of independent variables. This article deals with simple linear regression.

[Figure: Linear regression]

Suppose we have linearly distributed data, as shown in the figure above, in which the value of the independent variable is represented by ‘X’ and the value of the dependent variable by ‘Y’. Linear regression establishes a relationship between the dependent and independent variables so that the value of the dependent variable (Y) can be predicted from a given value of the independent variable (X).

In the figure above, X (input) is the number of working hours and Y (output) is the wage of a person. The main aim of the linear regression algorithm is to find the straight line that best fits the data. That best-fit line is also called the regression line.

Now let’s understand the algorithm.

We have the hypothesis function for linear regression:

\( \displaystyle Y\_pred=w*X+b \qquad \text{(eqn. 1)} \)

While training the model we are given:

X: input training data (univariate – one input variable (parameter))
Y: dependent variable values corresponding to X.

Training a linear regression model means finding the values of b and w that give the best-fit line for the training data. (The training data is the data set supplied to the model or algorithm for learning.)


b: bias (intercept)
w: weight (coefficient of X)

Once the best values are learned we can predict the output for any input provided to the model.
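Prediction with a trained model is simply an evaluation of eqn. 1. The short Python sketch below illustrates this; the function name `predict` and the values of `w` and `b` are hypothetical, chosen only for illustration and not learned from any real data.

```python
def predict(x, w, b):
    """Hypothesis function from eqn. 1: Y_pred = w * X + b."""
    return w * x + b

# Hypothetical "learned" parameters (assumed values, not from real data).
w, b = 12.5, 40.0

# Working hours for which we want a predicted wage.
hours = [10, 20, 35]
wages = [predict(h, w, b) for h in hours]
print(wages)
```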

 But this raises a question: how do we find the values of b and w that give the best-fit line? In other words, how does the learning process work?

 Let’s see.

[Figure: Gradient descent]

  • First, initialize b and w with random values.


  • Then calculate Y_pred from the hypothesis function (eqn. 1) using X and the initialized values of b and w.


  • Next, the prediction loss is calculated using a loss (cost) function. The cost function most commonly used with linear regression is the mean squared error (MSE):

\( \displaystyle loss=\frac{1}{n}\sum\limits_{i=1}^{n}\left( Y\_pred_{i}-Y_{i} \right)^{2} \)

where n is the total number of data points.

  • Then the loss is minimized using an optimization algorithm. The most common and widely used one is the gradient descent algorithm.


  • Finally, when the loss is sufficiently small, we obtain approximately the best values of b and w.
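The prediction and loss steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `mse_loss` and the toy data (generated from the line y = 2x + 1) are assumptions made for the example.

```python
def mse_loss(xs, ys, w, b):
    """Mean squared error of the line Y_pred = w * X + b on the data (xs, ys)."""
    n = len(xs)
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n

# Toy data lying exactly on y = 2x + 1 (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

loss_bad = mse_loss(xs, ys, w=0.0, b=0.0)   # a poor guess for w and b
loss_good = mse_loss(xs, ys, w=2.0, b=1.0)  # the line that generated the data
print(loss_bad, loss_good)
```

A poor choice of w and b gives a large loss, while the line that generated the data gives a loss of zero, which is why minimizing the loss leads to the best-fit line.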

 Gradient Descent Algorithm:

  • Calculate Y_pred from eqn. 1 using the randomly initialized values of b and w.
  • Calculate the loss using the MSE loss function.
  • Find the derivative (gradient) of the loss with respect to b and w.

\( \displaystyle \begin{array}{l}\text{We have,}\\Y\_pred=w*X+b\\loss=\frac{1}{n}\sum\limits_{i=1}^{n}\left( Y\_pred_{i}-Y_{i} \right)^{2}\\\text{Therefore,}\\\frac{\partial loss}{\partial w}=\frac{\partial }{\partial w}\left( \frac{1}{n}\sum\limits_{i=1}^{n}\left( Y\_pred_{i}-Y_{i} \right)^{2} \right)\\=\frac{\partial }{\partial w}\left( \frac{1}{n}\sum\limits_{i=1}^{n}\left( (w*X_{i}+b)-Y_{i} \right)^{2} \right)\\=\frac{2}{n}\sum\limits_{i=1}^{n}\left[ \left( (w*X_{i}+b)-Y_{i} \right)*X_{i} \right]\\=\frac{2}{n}\sum\limits_{i=1}^{n}\left[ \left( Y\_pred_{i}-Y_{i} \right)*X_{i} \right]\\\\\text{and,}\\\frac{\partial loss}{\partial b}=\frac{\partial }{\partial b}\left( \frac{1}{n}\sum\limits_{i=1}^{n}\left( Y\_pred_{i}-Y_{i} \right)^{2} \right)\\=\frac{2}{n}\sum\limits_{i=1}^{n}\left( Y\_pred_{i}-Y_{i} \right)\end{array} \)

  • Update b and w as follows:

\[ \displaystyle \begin{array}{l}w=w-\eta \frac{\partial loss}{\partial w}\\\\b=b-\eta \frac{\partial loss}{\partial b},\text{ where }\eta \text{ is the learning rate.}\end{array} \]

  • Repeat the four steps above until the loss is sufficiently reduced.
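The gradient descent steps above can be put together as a short Python sketch. It is only one possible way to write the loop: the function name `fit_line`, the learning rate, the iteration limit, the stopping tolerance, the random seed, and the toy data are all assumptions made for this example. The error is defined as prediction minus target, so the gradients are (2/n) Σ error·X for w and (2/n) Σ error for b.

```python
import random

def fit_line(xs, ys, lr=0.05, max_iters=5000, tol=1e-12, seed=0):
    """Fit Y_pred = w * X + b by batch gradient descent on the MSE loss."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    w = rng.uniform(-1.0, 1.0)         # random initial weight
    b = rng.uniform(-1.0, 1.0)         # random initial bias
    n = len(xs)
    for _ in range(max_iters):
        # Steps 1 and 2: predictions, errors (Y_pred - Y), and MSE loss.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < tol:                 # stop once the loss is small enough
            break
        # Step 3: gradients of the loss with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step 4: move w and b against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data lying exactly on y = 2x + 1 (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the learned values should approach w ≈ 2 and b ≈ 1. A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence slow, so in practice η usually needs some tuning.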

To read more on Gradient descent algorithm visit here.

