What is Data Preprocessing?

- October 23, 2024

Data preprocessing is the step in a data mining or data analysis workflow that transforms raw data into a format that computers and machine learning algorithms can work with.

When we talk about data, we usually picture a large dataset with rows and columns. That is a common scenario, but not the only one: data comes in many forms, such as structured tables, images, audio files, and videos.

Machines don’t understand text, image, or video data as it is; they work with numbers, ultimately 1s and 0s. Real-world data also contains noise, missing values, and inconsistencies, so it usually cannot be fed directly into ML models.
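To make this concrete, here is a small sketch in plain Python. It shows how a word ends up as 1s and 0s, and how missing and duplicate values can be counted in raw records. The records, city names, and variable names are made up for illustration.

```python
# Text is stored as bytes, and each byte is eight bits (1s and 0s).
word = "data"
encoded = word.encode("utf-8")
bits = [format(byte, "08b") for byte in encoded]
print(bits)

# Hypothetical raw records showing problems that real-world data often has.
raw_rows = [
    {"age": "25", "city": "Delhi"},
    {"age": None, "city": "Mumbai"},   # missing value
    {"age": "25", "city": "Delhi"},    # duplicate of the first record
    {"age": "250", "city": "Pune"},    # implausible value, a possible outlier
]

# Count the records where age is missing.
missing_count = sum(1 for row in raw_rows if row["age"] is None)

# Count duplicates by comparing the total with the number of distinct records.
unique_rows = {tuple(sorted(row.items())) for row in raw_rows}
duplicate_count = len(raw_rows) - len(unique_rows)

print("missing:", missing_count, "duplicates:", duplicate_count)
```

Note that the ages are stored as text here, which is another common issue (mismatched data types) that has to be fixed before modeling.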

Hence, data preprocessing is required to clean the data and make it suitable for an ML model, which typically improves the model's accuracy and efficiency.

It involves the following steps:

1) Getting the Dataset
2) Importing Libraries
3) Importing Dataset
4) Data Quality Assessment:
   i) Finding and Processing Missing/Inconsistent/Duplicate Data
   ii) Mixed Data Values/Mismatched Data Types
   iii) Data Outliers
5) Data Transformation:
   i) Data Aggregation
   ii) Data Normalization
   iii) Feature Selection/Sampling
6) Feature Encoding:
   i) Label Encoding (Ordinal Data)
   ii) One-Hot Encoding (Nominal Data)
7) Dimensionality Reduction: PCA/SVD
8) Splitting Dataset into Training, Validation & Test set
9) Feature Scaling:
   i) Standardization
   ii) Normalization
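Here is a minimal sketch of a few of these steps, using only the Python standard library so it runs anywhere. It covers duplicate removal, missing-value imputation, one-hot encoding, a train/test split, and standardization. It skips aggregation, outlier handling, dimensionality reduction, and the validation set. The dataset, column meanings, and helper names (`fill_age`, `one_hot`, `to_features`) are assumptions made for illustration. In practice, libraries such as pandas and scikit-learn are commonly used for this.

```python
import random
import statistics

# Hypothetical raw records: (age, city, purchased). None marks a missing value.
raw = [
    (25, "Delhi", 0),
    (None, "Mumbai", 1),
    (35, "Pune", 0),
    (45, "Delhi", 1),
    (25, "Delhi", 0),   # exact duplicate of the first record
    (55, "Mumbai", 1),
]

# Data quality: drop exact duplicates while keeping the original order.
rows = list(dict.fromkeys(raw))

# Split first, so the statistics used below come from the training rows only.
random.Random(0).shuffle(rows)
cut = int(len(rows) * 0.8)
train, test = rows[:cut], rows[cut:]

# Data quality: fill missing ages with the mean age of the training rows.
mean_age = statistics.mean(age for age, _, _ in train if age is not None)

def fill_age(row):
    age, city, label = row
    return (mean_age if age is None else age, city, label)

train = [fill_age(r) for r in train]
test = [fill_age(r) for r in test]

# Feature encoding: one-hot encode city (nominal data).
# A city that appears only in the test rows becomes an all-zero vector.
cities = sorted({city for _, city, _ in train})

def one_hot(city):
    return [1.0 if city == c else 0.0 for c in cities]

# Feature scaling: standardization, z = (x - mean) / std, with training statistics.
train_ages = [age for age, _, _ in train]
mu = statistics.mean(train_ages)
sigma = statistics.pstdev(train_ages)

def to_features(row):
    age, city, _ = row
    return [(age - mu) / sigma] + one_hot(city)

X_train = [to_features(r) for r in train]
y_train = [label for _, _, label in train]
X_test = [to_features(r) for r in test]
y_test = [label for _, _, label in test]

print(X_train)
print(X_test)
```

The split happens before imputation and scaling so that no information from the test rows leaks into the statistics. Min-max normalization would follow the same pattern, using (x - min) / (max - min) with the training minimum and maximum.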

I hope you find this helpful.

THANKS FOR READING :)
