Z-Score Normalization (Standard Score Formula)

Normalization, or standardization, is the process of rescaling the original data without changing its behavior or nature. It is a technique often applied as part of data pre-processing in machine learning. The main aim of normalization is to bring the values in a dataset onto a common scale without distorting the differences in the ranges of values. We often define a new range (most commonly (0, 1) or (-1, 1)) and convert the data accordingly. This technique is useful for algorithms based on neural networks or on distances between points (e.g. KNN, K-means).

In z-score normalization, the values are normalized based on the mean and standard deviation of attribute A. For a value V_i of attribute A, the normalized value U_i is given as

$$U_i = \frac{V_i - \mathrm{Avg}(A)}{\mathrm{Std}(A)}$$

Z-score normalization (zero-mean normalization)

where Avg(A) and Std(A) represent the average and standard deviation, respectively, of the values of attribute A.
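As a minimal sketch of this formula, the snippet below normalizes one attribute using only the Python standard library. The function name `z_score_normalize` and the sample values are illustrative, not from the original text, and the population standard deviation (`pstdev`) is an assumption, since the text does not say which form of Std(A) is meant.

```python
from statistics import fmean, pstdev


def z_score_normalize(values):
    """Return the z-score of every value in one attribute (column)."""
    avg = fmean(values)   # Avg(A)
    std = pstdev(values)  # Std(A), population form (an assumption)
    if std == 0:
        # A constant attribute has no spread; map every value to 0
        return [0.0 for _ in values]
    return [(v - avg) / std for v in values]


# Hypothetical attribute values, chosen so that Avg(A) = 5 and Std(A) = 2
A = [2, 4, 4, 4, 5, 5, 7, 9]
U = z_score_normalize(A)
print(U)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After normalization the values have mean 0 and standard deviation 1, which is why the method is also called zero-mean normalization.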

Let's see an example. Suppose that the mean and standard deviation of the values for the attribute income are $54,000 and $16,000, respectively. With z-score normalization, a value of $73,600 for income is normalized to (73,600 - 54,000)/16,000 = 1.225.
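The same arithmetic can be sketched in a few lines of Python when the mean and standard deviation are already known. The mean and standard deviation below are the ones stated for income; the three income values and the helper name `z_score` are hypothetical, chosen only to show values below, at, and above the mean.

```python
AVG_INCOME = 54_000  # mean of the income attribute, from the example
STD_INCOME = 16_000  # standard deviation of income, from the example


def z_score(value, avg, std):
    """Apply the z-score formula to a single value."""
    return (value - avg) / std


# Hypothetical incomes: one below the mean, one equal to it, one above it
incomes = [38_000, 54_000, 86_000]
scores = [z_score(v, AVG_INCOME, STD_INCOME) for v in incomes]
print(scores)  # [-1.0, 0.0, 2.0]
```

A score of -1.0 means the income lies one standard deviation below the mean, 0.0 means it equals the mean, and 2.0 means it lies two standard deviations above.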

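In Python:

from sklearn.preprocessing import StandardScaler

# Two samples (rows) with seven features (columns)
X = [[101, 105, 222, 333, 225, 334, 556],
     [105, 105, 258, 354, 221, 334, 556]]

print("Before standardization X values are ", X)
sc_X = StandardScaler()
# Standardize each column using that column's mean and standard deviation
X = sc_X.fit_transform(X)

print("After standardization X values are ", X)

Output:

Before standardization X values are
[[101, 105, 222, 333, 225, 334, 556],
 [105, 105, 258, 354, 221, 334, 556]]
After standardization X values are
[[-1.  0. -1. -1.  1.  0.  0.]
 [ 1.  0.  1.  1. -1.  0.  0.]]

Columns whose two values are identical have zero standard deviation, so they are mapped to 0.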
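To see where these numbers come from, here is a rough sketch that applies the z-score formula column by column with the standard library alone. It is not scikit-learn's implementation; it assumes the population standard deviation and treats a zero standard deviation as 1, which is how StandardScaler is documented to behave. The function name `standardize_columns` is illustrative.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Z-score every column of a row-major table of numbers."""
    scaled_columns = []
    for col in zip(*rows):  # iterate over columns
        avg = fmean(col)
        std = pstdev(col)   # population standard deviation
        if std == 0:
            std = 1.0       # constant column: avoid dividing by zero
        scaled_columns.append([(v - avg) / std for v in col])
    # Transpose back so the result is row-major again
    return [list(row) for row in zip(*scaled_columns)]


X = [[101, 105, 222, 333, 225, 334, 556],
     [105, 105, 258, 354, 221, 334, 556]]
X_scaled = standardize_columns(X)
for row in X_scaled:
    print(row)
```

With only two samples, every column that varies ends up as -1 and 1, because each value sits exactly one standard deviation from the column mean.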
