Before Becoming a Data Scientist, Learn from These Mistakes!

Data Science is a very attractive career path today. A great many people set out on it, and the majority of them fall into the same tempting traps. I was one of them. That is why I am writing this post: so that you can learn from my mistakes. As the saying goes, “Intelligent is the one who learns from others’ mistakes.”
  1. Learning theory first and implementing later: Like many others, I started my journey by taking a course to learn Data Science. Most of these courses give you lots and lots of theory. Many have quizzes and assignments, but the practical side is still missing because you don’t implement things yourself as the course goes along. Theory is definitely important, but it is of little use if you never implement it and see how it is actually applied.
  2. Relying only on courses: It is easy to get stuck in a shell of taking course after course to gain knowledge. Most courses show you a perfect path, but in the real world you never get such a path. When I started working on real problems, it was heart-breaking. The data was complex, handling it was cumbersome, the statistical analyses I ran gave me no insights, and there were many more problems besides. To learn how real-world problems behave, you need to break out of this shell and explore blogs, hackathons, question sets and so on. You can join platforms and communities such as Analytics Vidhya, Kaggle, Reddit, KDnuggets and R Bloggers to get that experience.
  3. Not exploring the data: Once you start working on a problem, give yourself ample time to explore the data. Use all the statistical knowledge you gained in your courses and try to draw meaningful insights from it; as you already know, the data is the most important ingredient for good results. I come from an engineering background and my statistics was very weak, so at first I simply skipped exploring the data. I soon realized that certain algorithms only apply when specific statistical conditions are satisfied. That forced me to go back to the earlier steps and cost me time. So I suggest you explore every feature properly. It will give you a better understanding of the data and save you time in the long run.
  4. Working without domain knowledge: Nobody knows everything about every field, so try to understand the domain before doing anything with the data. Only then will you understand the data well and be able to derive important features from it. For my first project I was simply handed the data and told to predict sales. I had no idea which external factors could affect those sales, or which of them I should collect in addition to the internal data I was given. At first my predictions were not very good. With time I gained domain knowledge and started adding such external features to my data, and the accuracy rose from 48.8% to 87.4%.
  5. Running algorithms without understanding them: Everyone I consulted about my project gave me big names of different algorithms. Like a perfect disciple, I tried to implement all of them. But I did not have much time to understand the logic behind each one, so I simply copy-pasted code I found through Google. As expected, this did not turn out well. When an algorithm overfitted, I did not know what to do or which parameters to adjust. With time I learned what each parameter of an algorithm does, and I came to regret the moments when I had just run whatever I found.
  6. Looking for 100% accuracy: Before Machine Learning, most of us worked on problems where the result was expected to be 100% correct. Expecting the same here is a big mistake and a sure source of frustration. In Machine Learning, even 90% accuracy is often considered better than expected. You need to accept this yourself and be ready to explain it to your users. You will come to terms with it soon enough, but making your clients understand it is another tough mountain to climb.
  7. Rushing towards results: We all know the steps of working on any problem: Business Understanding, Data Mining, Data Cleaning, Data Exploration, Feature Engineering, Predictive Modeling and Model Deployment. Yet we race as fast as we can to the Predictive Modeling phase to get results to showcase. In doing so we forget a basic lesson from our courses: spend 60-70% of your time on the steps before Predictive Modeling. Believe me, this advice really holds. If you want to be efficient, follow it.
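
To make the advice about exploring the data (mistake 3) a little more concrete, here is a minimal sketch of a first pass over a dataset. It is not the code from my project: the toy `sales` table, its column names and the helper functions `summarize_column` and `summarize_table` are all made up for illustration, and it uses only Python's standard library. In practice you would probably reach for a dedicated data analysis library, but the idea is the same: count the missing values and look at the basic statistics of every feature before modeling anything.

```python
import statistics


def summarize_column(values):
    """Return basic statistics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(present),
        "missing": len(values) - len(present),
    }
    if present:
        summary["mean"] = statistics.mean(present)
        summary["median"] = statistics.median(present)
        summary["min"] = min(present)
        summary["max"] = max(present)
        # The sample standard deviation needs at least two observations.
        summary["stdev"] = statistics.stdev(present) if len(present) > 1 else 0.0
    return summary


def summarize_table(table):
    """Summarize every column of a table given as {column_name: list_of_values}."""
    return {name: summarize_column(column) for name, column in table.items()}


# Hypothetical toy data, invented for this example only.
sales = {
    "units_sold": [10, 12, None, 15, 13],
    "price": [2.5, 2.5, 3.0, None, 3.0],
}

report = summarize_table(sales)
for name, stats in report.items():
    print(name, stats)
```

Even a summary this small tells you where values are missing and how each feature is spread, which is the kind of check I skipped at first and later had to go back for.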
