What is Data Preprocessing?

 


Data preprocessing is the step in data mining and data analysis that takes raw data and transforms it into a format that computers and machine learning algorithms can understand and analyze.

When we talk about data, we usually picture a large dataset with rows and columns. That is a common scenario, but it is not always the case: data comes in many different forms, such as structured tables, images, audio files, and videos.

Machines don’t understand text, image, or video data as it is; they work with 1s and 0s. Real-world data also contains noise, missing values, and other flaws, so it cannot be fed directly into ML models.
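
To make both points concrete, here is a minimal Python sketch that uses only the standard library. The word "data" and the `readings` list are made-up sample values, chosen just for illustration.

```python
import math

# Text is stored as numbers: each character maps to one or more bytes.
text = "data"
as_bytes = list(text.encode("utf-8"))            # [100, 97, 116, 97]
as_bits = [format(b, "08b") for b in as_bytes]   # the same bytes as 1s and 0s

# A hypothetical column of readings with one missing value (NaN).
readings = [4.0, float("nan"), 6.0]

# Arithmetic on a missing value simply propagates it, so the mean is unusable.
raw_mean = sum(readings) / len(readings)         # nan

# Dropping the missing entry first gives a usable number.
clean = [r for r in readings if not math.isnan(r)]
clean_mean = sum(clean) / len(clean)             # 5.0
```

Dropping the value is only one option; filling it in with a mean or median is another common choice.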

Hence, data preprocessing is required to clean the data and make it suitable for an ML model, which in turn can improve the model's accuracy and efficiency.

It involves the following steps:

  1. Getting the Dataset
  2. Importing Libraries
  3. Importing Dataset
  4. Data Quality Assessment:
    i) Finding and Processing Missing/Inconsistent/Duplicate Data
    ii) Mixed Data Values/Mismatched Data Types
    iii) Data Outliers
  5. Data Transformation:
    i) Data Aggregation
    ii) Data Normalization
    iii) Feature Selection/Sampling
  6. Feature Encoding:
    i) Label Encoding (Ordinal Data)
    ii) One-Hot Encoding (Nominal Data)
  7. Dimensionality Reduction: PCA/SVD
  8. Splitting Dataset into Training, Validation & Test sets
  9. Feature Scaling:
    i) Standardization
    ii) Normalization
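
A few of the steps above (duplicates and missing values from step 4, one-hot encoding from step 6, splitting from step 8, and standardization from step 9) can be sketched in plain Python. This is only an illustration under stated assumptions: the records, column meanings, and helper names such as `impute_age` and `one_hot` are hypothetical, and in practice you would more likely reach for libraries such as pandas and scikit-learn.

```python
import random
import statistics

# Hypothetical raw records: (age, city, purchased). None marks a missing value.
rows = [
    (25, "Paris", "yes"),
    (None, "Tokyo", "no"),
    (35, "Paris", "yes"),
    (35, "Paris", "yes"),   # exact duplicate of the previous row
    (45, "Lima", "no"),
    (30, "Tokyo", "yes"),
]

def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen, unique = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

def impute_age(records):
    """Replace a missing age with the median of the known ages."""
    median_age = statistics.median(r[0] for r in records if r[0] is not None)
    return [(median_age if age is None else age, city, label)
            for age, city, label in records]

def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector, one slot per category."""
    return [1 if value == c else 0 for c in categories]

def train_test_split(records, test_ratio=0.4, seed=0):
    """Shuffle a copy of the records and cut off a test portion."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def standardize(values, mean, std):
    """Rescale values to zero mean and unit variance using the given stats."""
    return [(v - mean) / std for v in values]

# Step 4: data quality (duplicates, then missing values).
clean = impute_age(drop_duplicates(rows))

# Step 6: one-hot encode the nominal "city" column; map the yes/no target to 1/0.
cities = sorted({city for _, city, _ in clean})
encoded = [[age] + one_hot(city, cities) + [1 if label == "yes" else 0]
           for age, city, label in clean]

# Step 8: split before scaling, so test data does not leak into the statistics.
train, test = train_test_split(encoded)

# Step 9: standardize the age column using training-set statistics only.
train_ages = [r[0] for r in train]
mean, std = statistics.mean(train_ages), statistics.pstdev(train_ages)
train_scaled = standardize(train_ages, mean, std)
test_scaled = standardize([r[0] for r in test], mean, std)
```

Note the order: the scaling statistics are computed from the training rows alone and then reused for the test rows, which is why feature scaling comes after the split in the list above.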

I hope you find this helpful.

THANKS FOR READING :)

