What is Data Preprocessing?
Data preprocessing is a step in the data mining and data analysis process that takes raw data and transforms it into a format that machine learning algorithms can read and analyze.
When we talk about data, we usually picture a large table of rows and columns. That is a common scenario, but it is not the only one: data comes in many different forms, such as structured tables, images, audio files, and videos.
Machines don’t understand text, image, or video data as it is; they work with numbers, ultimately 1s and 0s. Real-world data also contains noise, missing values, and other inconsistencies, so it cannot be fed directly to ML models.
Hence, data preprocessing is required to clean the data and make it suitable for an ML model, which in turn tends to improve the model’s accuracy and efficiency.
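To make these problems concrete, here is a minimal sketch using pandas. The table, its column names, and its values are all made up for illustration; the point is only to show how a missing value, a duplicate row, and a number stored as text might be detected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are invented for this example.
raw = pd.DataFrame({
    "age": [25, np.nan, 47, 25],
    "city": ["Paris", "Delhi", "Lagos", "Paris"],
    "income": ["52000", "61000", "unknown", "52000"],  # numbers stored as text
})

# Count missing values in each column.
missing_per_column = raw.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = int(raw.duplicated().sum())

# Convert text to numbers; entries that cannot be parsed become NaN.
income_numeric = pd.to_numeric(raw["income"], errors="coerce")

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(income_numeric)
```

Findings like these (a gap in `age`, a repeated row, an unparseable `income`) are exactly what data preprocessing sets out to fix.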
Data preprocessing typically involves the following steps:
- Getting the Dataset
- Importing Libraries
- Importing Dataset
- Data Quality Assessment:
 i) Finding and Processing Missing/Inconsistent/Duplicate Data
 ii) Mixed Data Values/Mismatched Data Types
 iii) Data Outliers
- Data Transformation:
 i) Data Aggregation
 ii) Data Normalization
 iii) Feature Selection/Sampling
- Feature Encoding:
 i) Label Encoding (Ordinal Data)
 ii) One-Hot Encoding (Nominal Data)
- Dimensionality Reduction: PCA/SVD
- Splitting Dataset into Training, Validation & Test set
- Feature Scaling:
 i) Standardization
 ii) Normalization
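Several of the steps above can be sketched with pandas and scikit-learn. This is only one possible arrangement, not a definitive recipe: the dataset, the column names (`age`, `salary`, `country`, `purchased`), and the choice of mean imputation are assumptions made for the example. It splits the data first, then fills missing values, one-hot encodes the nominal column, and standardizes the numeric columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset: names and values are invented for illustration.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 47, 51, 38, 29, 44],
    "salary": [40000, 52000, 61000, np.nan, 83000, 58000, 45000, 72000],
    "country": ["France", "Spain", "Germany", "Spain",
                "France", "Germany", "Spain", "France"],
    "purchased": [0, 1, 0, 1, 1, 0, 0, 1],
})

X = df.drop(columns="purchased")
y = df["purchased"]

# Split before fitting any transformer so test rows do not influence
# the imputation values or the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Numeric columns: fill missing values with the column mean, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Nominal column: one-hot encode; categories unseen in training are ignored.
categorical_steps = OneHotEncoder(handle_unknown="ignore")

preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, ["age", "salary"]),
        ("cat", categorical_steps, ["country"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Learn the statistics on the training set only, then reuse them on the test set.
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)

print(X_train_ready.shape, X_test_ready.shape)
```

A validation set could be carved out the same way with a second `train_test_split` call on the training portion. For ordinal data, an ordinal encoder would take the place of the one-hot encoder, and `MinMaxScaler` could be swapped in for `StandardScaler` if normalization to a fixed range is preferred.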
I hope you find this helpful.
THANKS FOR READING :)

 
 
 