In computing, data transformation is the process of converting data from one format or structure into another. It is a fundamental aspect of most data integration and data management tasks, such as data wrangling and data storage. In this process, we change the representation of the data using various strategies so that important information can be extracted from it.
Some of the techniques used for data transformation are:
i. Aggregation: In this technique, a summary or aggregation operation is applied over the data. E.g. daily sales data may be aggregated to compute monthly and annual totals.
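A minimal sketch of aggregation using pandas, with hypothetical daily sales figures invented for illustration:

```python
import pandas as pd

# Hypothetical daily sales records (dates and amounts are made up).
daily = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-10", "2023-02-25"]),
    "sales": [100, 150, 200, 50],
})

# Aggregate daily records into monthly totals.
monthly = daily.groupby(daily["date"].dt.to_period("M"))["sales"].sum()

# Aggregate the same records into an annual total.
annual = daily["sales"].sum()
```

Here `monthly` holds one summed value per month (250 for each month above), and `annual` holds the grand total.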
ii. Discretization: In this technique, raw values of a numeric attribute (e.g. age) are replaced by interval values (e.g. 0-10, 10-20, 20-40) or by conceptual labels (e.g. child, young, adult).
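A short sketch of discretization with `pandas.cut`, using hypothetical ages and the bin edges from the text:

```python
import pandas as pd

ages = pd.Series([4, 15, 25, 38])  # hypothetical raw age values

# Replace numeric ages with conceptual labels based on interval bins:
# (0, 10] -> child, (10, 20] -> young, (20, 40] -> adult.
labels = pd.cut(ages, bins=[0, 10, 20, 40], labels=["child", "young", "adult"])
```

The same call without the `labels` argument would keep the interval values themselves (e.g. `(0, 10]`) instead of conceptual labels.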
iii. Attribute construction/Feature engineering: Feature engineering is the process of constructing a new attribute (feature) by observing the available features and the relationships between them. This technique helps generate extra information from raw data, and it is especially useful when a dataset has few features but those features still contain hidden information worth extracting.
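A small sketch of attribute construction; the housing columns here are hypothetical, chosen only to show a new feature built from the relation between two existing ones:

```python
import pandas as pd

# Hypothetical raw features: width and length of each plot.
houses = pd.DataFrame({"width_m": [10, 8], "length_m": [20, 15]})

# Construct a new attribute, area, from the two existing features.
houses["area_m2"] = houses["width_m"] * houses["length_m"]
```

The constructed `area_m2` column may be more predictive for, say, a price model than either raw dimension alone.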
iv. Normalization/Standardization: Normalization or standardization is the process of rescaling the original data without changing its underlying behavior or nature. We define a new range (commonly 0 to 1) and map the data onto it. This technique is useful for classification algorithms involving neural networks or distance-based algorithms (e.g. KNN, k-means).
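A minimal sketch of min-max normalization, the common rescaling to the [0, 1] range described above, on hypothetical values:

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 50.0])  # hypothetical raw values

# Min-max normalization: x' = (x - min) / (max - min), mapping data into [0, 1].
normalized = (data - data.min()) / (data.max() - data.min())
```

After rescaling, the smallest value becomes 0 and the largest becomes 1, while the relative spacing between points is preserved.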