# Basics of Artificial Neural Network

## Artificial Neural Network

A **neural network** is a network of neurons or, in a contemporary context, an artificial neural network made up of artificial neurons or nodes. An **artificial neural network** is inspired by a biological neural network: just as a biological neural network is made up of real biological neurons, an artificial neural network is made up of artificial neurons called "**perceptrons**".

An **artificial neural network** is developed for solving artificial intelligence (AI) problems. The links between artificial neurons are termed weights. All inputs are modified by their weights and summed up; this activity is called a linear combination. Finally, the output is controlled by an activation function applied over that linear combination.

The process of computing the output from the inputs in this way is called "**Forward Propagation**". The difference between the predicted output and the actual output is called the "**error or loss**". We have to minimize that error. But the question arises: how do you reduce that error? The error is propagated back through the network and the weights and biases are updated accordingly; this process is called "**Backward Propagation**". The updates are computed using an optimization algorithm called "**Gradient Descent**", which helps to optimize the task quickly and efficiently.

## Multi-Layer Perceptron and its basics

Perceptrons are defined as the basic unit (you can also call them the building blocks) of an artificial neural network. A perceptron can be understood as anything (let's call it a machine) that takes multiple inputs and produces one output. The image below shows the typical structure of a perceptron.
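As a minimal sketch, such a machine can be written in a few lines of NumPy. The AND-gate weights below are illustrative, not learned:

```python
import numpy as np

def perceptron(x, w, b):
    """A single perceptron: weighted sum of inputs plus bias,
    passed through a step activation."""
    z = np.dot(w, x) + b          # linear combination of the inputs
    return 1 if z > 0 else 0      # step activation

# Example: hand-picked weights that make the perceptron act like an AND gate
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron(np.array([1, 1]), w, b))  # 1
print(perceptron(np.array([0, 1]), w, b))  # 0
```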

## What is an activation function?

An activation function is a function that takes the sum of the weighted inputs (w1\*x1 + w2\*x2 + w3\*x3 + 1\*b) as input and returns the output of the neuron.

a = f(w1\*x1 + w2\*x2 + w3\*x3 + b)

Here, **a** is the output obtained from the activation function *f*. Clearly, the argument to the activation function is the sum of the products of the weights and inputs, plus the bias.

The activation function is used to make a non-linear transformation, which allows us to fit non-linear hypotheses or to estimate complex functions. There are several activation functions, such as "Sigmoid", "Tanh", "ReLU", "Softmax" and many others. You can use any of them; Sigmoid, ReLU and Softmax are more commonly used than the rest.
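These common activations can be sketched in NumPy as follows (a minimal illustration, not the article's own code):

```python
import numpy as np

def sigmoid(z):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes any real input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0, z)

def softmax(z):
    # turns a vector of scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))   # shift by the max for numerical stability
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z))
print(relu(z))
print(softmax(z))
```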

## Forward Propagation, Back Propagation and Epochs

We have calculated the output so far, and this process is called "Forward Propagation." But what if the predicted output is far away from the real output (high error)? What we do in a neural network is update the weights and biases to minimize that error. This method of updating the weights and biases is called "Back Propagation."

The back-propagation algorithm works by determining the error (or loss) at the output and then propagating it back into the network. The weights and biases are updated to minimize the error contributed by each neuron. The initial step in minimizing the loss is to determine the gradient (derivatives) of the error with respect to the weights and biases at each layer.
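This gradient step can be sketched for a single sigmoid neuron with squared error (an illustrative example, including a finite-difference check of the chain-rule result; the inputs and weights are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron with squared error E = (y - a)^2 / 2.
# Chain rule: dE/dw = dE/da * da/dz * dz/dw
#   dE/da = -(y - a),  da/dz = a * (1 - a),  dz/dw = x
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, -0.3])
b = 0.05
y = 1.0

a = sigmoid(np.dot(w, x) + b)     # forward pass
delta = -(y - a) * a * (1 - a)    # error signal at the output
grad_w = delta * x                # gradient w.r.t. each weight
grad_b = delta                    # gradient w.r.t. the bias

# Numerical check of one gradient component
eps = 1e-6
w_plus = w.copy(); w_plus[0] += eps
a_plus = sigmoid(np.dot(w_plus, x) + b)
num_grad = ((y - a_plus)**2 / 2 - (y - a)**2 / 2) / eps
print(np.isclose(grad_w[0], num_grad, atol=1e-5))  # True
```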

## Multi-layer perceptron (MLP)

So far, we have seen just a single layer consisting of 3 input nodes, i.e. x1, x2 and x3, and an output layer consisting of a single neuron. But in practical applications, a single-layer neural network may not be sufficient for the task. An MLP contains additional layers between the input layer and the output layer, called hidden layers, as shown below. You can use as many hidden layers as you wish, but two or three hidden layers are sufficient in most situations. In addition, using a higher number of hidden layers is computationally expensive. A simple diagrammatic expression of an MLP is shown below.

## Full Batch Gradient Descent and Stochastic Gradient Descent

Full Batch Gradient Descent and Stochastic Gradient Descent are variants of Gradient Descent. Both perform the same job, i.e. updating the weights and biases of the MLP using the same update rule, but they differ in the number of training samples used in each iteration to update the weights and biases.
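The difference can be sketched on a toy one-parameter regression problem (the data, learning rate and epoch counts below are illustrative assumptions). Full batch computes one gradient per pass over all samples; stochastic updates after every single sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise, so the true weight is 2
X = rng.uniform(-1, 1, size=100)
y = 2 * X + rng.normal(0, 0.1, size=100)

lr = 0.5

# Full Batch Gradient Descent: ONE update per pass over ALL samples
w_batch = 0.0
for epoch in range(50):
    grad = np.mean(-(y - w_batch * X) * X)  # gradient of mean (y - wx)^2 / 2
    w_batch -= lr * grad

# Stochastic Gradient Descent: one update PER SAMPLE
w_sgd = 0.0
for epoch in range(50):
    for xi, yi in zip(X, y):
        grad = -(yi - w_sgd * xi) * xi
        w_sgd -= lr * grad

print(w_batch, w_sgd)  # both close to the true weight 2
```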

Let us define:

- **X**: the input matrix and **y**: the target output
- **wh**, **bh**: the weights and bias of the hidden layer
- **wout**, **bout**: the weights and bias of the output layer
- **f**: the activation function (say, the sigmoid)

Thus the hidden layer input and activations are obtained as:

hidden_input = X \* wh + bh
hidden_activations = f(hidden_input)

Finally we get the input for the output layer and the final output as follows:

output_input = hidden_activations \* wout + bout
output = f(output_input)

The error is then the squared difference between the target and the output, E = (y − output)^{2}/2.
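This forward computation can be sketched in NumPy (the toy data, layer sizes, variable names and the sigmoid activation are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 3 samples with 3 input features each, 1 target output per sample
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]], dtype=float)
y = np.array([[1], [0], [1]], dtype=float)

rng = np.random.default_rng(1)
wh = rng.normal(size=(3, 4))    # hidden-layer weights (3 inputs -> 4 hidden units)
bh = np.zeros((1, 4))           # hidden-layer bias
wout = rng.normal(size=(4, 1))  # output-layer weights (4 hidden -> 1 output)
bout = np.zeros((1, 1))         # output-layer bias

hidden_input = X @ wh + bh                  # X * wh + bh
hidden_activations = sigmoid(hidden_input)  # f(hidden_input)
output_input = hidden_activations @ wout + bout
output = sigmoid(output_input)              # final prediction

E = np.sum((y - output) ** 2) / 2           # squared error
print(output.shape, float(E))
```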

I have written this result directly; the derivation is simple. (**Hint** – take the natural log on both sides, and then a simple application of the chain rule will give you the result.)
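Assuming the sigmoid activation, the result being referred to is its derivative, f′(z) = f(z)(1 − f(z)). A quick numerical sketch confirms it:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The claimed derivative: f'(z) = f(z) * (1 - f(z))
z = 0.7
analytic = sigmoid(z) * (1 - sigmoid(z))

# Central finite-difference approximation of the same derivative
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.isclose(analytic, numeric))  # True
```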

**learning_rate**: The amount by which the weights are updated is controlled by a configuration parameter called the learning rate. Its value should be chosen wisely: if the learning rate is very small, learning is very slow (although the final accuracy may be high), and if it is very large, the updates may overshoot and we may never reach the minimum error.
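Both failure modes can be seen on a simple quadratic error (an illustrative sketch, not from the article; the error function and learning rates are assumptions):

```python
# Gradient descent on the quadratic error E(w) = (w - 3)^2,
# whose minimum is at w = 3.
def descend(lr, steps=20, w=0.0):
    for _ in range(steps):
        grad = 2 * (w - 3)   # dE/dw
        w -= lr * grad
    return w

print(descend(0.01))  # too small: after 20 steps, still far from 3
print(descend(0.4))   # reasonable: converges very close to 3
print(descend(1.1))   # too large: overshoots on every step and diverges
```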
