Posts

Scenario-based interview questions for Data Science!

Q1. You are given a training data set with 1,000 columns and 1 million rows. The data set is for a classification problem. Your manager has asked you to reduce the dimensionality of this data so that model computation time can be reduced. Your machine has memory constraints. What would you do? (You are free to make practical assumptions.)

Answer: Processing high-dimensional data on a memory-limited machine is a strenuous task, and your interviewer will be fully aware of that. The following methods can help in such a situation:

Since we have limited RAM, we should close all other applications on the machine, including the web browser, so that most of the memory can be put to use.

We can randomly sample the data set. This means we can create a smaller data set with, say, 1,000 variables and 300,000 rows, and do the computations on it.

To reduce dimensionality, we can separate the numerical and categorical variables and remove the correlated variables. For numerical variables, w
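The sampling and correlation steps in the answer above could be sketched roughly as follows. This is only an illustration under assumptions: the excerpt does not say how to sample or which correlation cutoff to use, so the one-pass reservoir sampling, the 0.9 threshold, and the function names `reservoir_sample`, `pearson`, and `drop_correlated` are all hypothetical choices, not the post's method.

```python
import math
import random


def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k rows from any iterable in one pass.

    Only k rows are held in memory at a time, which suits a machine
    with limited RAM (assumed approach, not taken from the post).
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                sample[j] = row         # replace with probability k / (i + 1)
    return sample


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0                      # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a column only if it is not strongly correlated
    (|r| >= threshold) with any column already kept.

    columns: dict mapping column name -> list of numeric values.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, `reservoir_sample(reader, 300000)` over a row iterator would give a sample of the size mentioned in the answer, and `drop_correlated` could then be run on the numerical columns of that sample. Which column of a correlated pair survives depends on column order here; in practice that choice would be made with more care.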

What is Data Preprocessing?

Data preprocessing is a step in the data mining and data analysis process that takes raw data and transforms it into a format that computers and machine learning models can understand and analyze.

When we talk about data, we usually think of large datasets with rows and columns. While that is a likely scenario, it is not always the case. Data can come in many different forms: structured tables, images, audio files, videos, etc.

Machines don’t understand text, image, or video data as it is; they understand 1s and 0s. Real-world data also contains noise, missing values, etc., and cannot be used directly in ML models. Hence, data preprocessing is required to clean the data and make it suitable for an ML model, which can improve the accuracy and efficiency of the model. It involves the following steps:

Getting the Dataset
Importing Libraries
Importing Dataset
Data Quality Assessment:
i) Finding and Processing Missing/Inconsistent/Duplicate Data
ii) Mixed Data
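As a rough illustration of the "Missing/Inconsistent/Duplicate Data" step listed above, here is a minimal sketch in plain Python. The excerpt does not say how missing values should be handled, so mean imputation is an assumption, and the helper names `drop_duplicates` and `impute_mean` as well as the sample records are hypothetical.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicate records removed, keeping first occurrences.

    rows: list of dicts whose values are hashable (numbers, strings, None).
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))   # order-independent record fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_mean(rows, column):
    """Replace None in a numeric column with the mean of the observed values.

    Mean imputation is only one common option (an assumption here);
    median imputation or dropping the row are alternatives.
    """
    present = [r[column] for r in rows if r[column] is not None]
    if not present:
        return [dict(r) for r in rows]     # nothing to estimate the mean from
    mean = sum(present) / len(present)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]


# Hypothetical raw records with one duplicate and one missing value.
raw = [
    {"age": 30, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 30, "city": "Pune"},
    {"age": 40, "city": "Agra"},
]
clean = impute_mean(drop_duplicates(raw), "age")
```

Duplicates are removed before imputing so that repeated records do not skew the column mean.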

Commonly used Machine Learning Algorithms!

List of Common Machine Learning Algorithms

Here is a list of commonly used machine learning algorithms. These algorithms can be applied to almost any data problem:

Linear Regression
Logistic Regression
Decision Tree
SVM
Naive Bayes
kNN
K-Means
Random Forest
Dimensionality Reduction Algorithms
Gradient Boosting algorithms: GBM, XGBoost, LightGBM, CatBoost

1. Linear Regression

It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s). Here, we establish a relationship between the independent and dependent variables by fitting a best-fit line. This best-fit line is known as the regression line and is represented by the linear equation Y = a*X + b.

https://www.analyticsvidhya.com/wp-content/uploads/2015/08/Linear_Regression.png

2. Logistic Regression

Don’t be confused by its name! It is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independen
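To make the regression line Y = a*X + b from the Linear Regression item above concrete, here is a minimal sketch that fits a and b by ordinary least squares for a single independent variable. The excerpt does not name a fitting method, so least squares is an assumption, and `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit Y = a*X + b by ordinary least squares for one independent variable.

    Returns (a, b), where a is the slope and b is the intercept.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x      # the fitted line passes through the means
    return a, b


# Hypothetical data that lies exactly on Y = 2*X + 1.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = a * 5 + b           # estimate Y for a new X = 5
```

With noisy real-world data the points will not lie exactly on the line; the fitted a and b are then the values that minimize the sum of squared vertical distances to it.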

Top 10 Data Science Projects!

1. Sentiment Analyzer of Social Media

This is one of the more interesting and innovative machine learning projects. Social media platforms like Facebook, Twitter, and YouTube are an ocean of big data, so mining this data can help in a number of ways to understand user sentiments and opinions. This project can be effective for digital marketing and branding, where it helps to understand customers' opinions of or reactions to a product or service.

2. Music Recommendation System

Are you a music lover who always enjoys listening to your favorites? Then you will be glad to know about this machine learning project idea. The goal of this project is to recommend music based on a user's listening history.

3. Credit Card Fraud Detection Project

Companies that handle a lot of card transactions need to find anomalies in the system. The project aims to build a fraud detection model for credit cards. We will use the transaction and th

10 MOST IMPORTANT tips for Data Science Interviews!

Here are the 10 MOST IMPORTANT tips for making yourself a compelling value proposition and creating an impact. 1. From your PORTFOLIO OF PROJECTS, which is actually your LIVING RESUME, choose ONE project that you are SURE you can talk about. Your story SHOULD cover the following at a minimum, with demonstrable evidence

Interesting Interview Story!

Interviewer: You have only 10 minutes to impress me.
Me: Okay sir. Can I do anything in this time to impress you?
Interviewer: Yes, but don’t cross your limits.
Me: Sure, sir. Let’s

Advice to Newbies in Data Science.

1. Pick the brain of an expert. There are myriad ways to learn data science. You can read articles, watch videos, enroll in online courses, turn up at meetups, etc. But one thing that you cannot “learn” is experience. That you have to gain over years of working in the field. There is much to learn from data science experts: their experience in managing end-to-end machine learning and deep learning projects, their philosophy when constructing a data science team from scratch,