Let's understand what outliers are!

An outlier is an observation that is unlike the other observations.
It is rare, or distinct, or does not fit in some way.
Outliers can have many causes, such as:
  • Measurement or input error.
  • Data corruption.
  • A true outlier observation (e.g., the value 10 in the image).
There is no precise way to define and identify outliers in general because of the specifics of each dataset. Instead, you, or a domain expert, must interpret the raw observations and decide whether a value is an outlier or not.
Nevertheless, we can use statistical methods to identify observations that appear to be rare or unlikely given the available data.
This does not mean that the values identified are outliers and should be removed. A good tip is to plot the identified outlier values, perhaps alongside the non-outlier values, to see whether the outliers show any systematic relationships or patterns. If they do, perhaps they are not outliers and can be explained, or perhaps the outliers themselves can be identified more systematically. Two of the most common statistical methods for flagging candidate outliers are the Standard Deviation Method and the Interquartile Range (IQR) Method.
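Here is a minimal sketch of both methods using only Python's standard library. It assumes some common conventions that this post does not specify: the standard deviation method flags values more than k = 3 population standard deviations from the mean, and the IQR method uses Tukey's fences with k = 1.5. The function names `std_outliers` and `iqr_outliers` and the sample data are hypothetical, chosen only for illustration.

```python
import statistics
from typing import List, Sequence


def std_outliers(values: Sequence[float], k: float = 3.0) -> List[float]:
    """Standard Deviation Method: flag values more than k population
    standard deviations away from the mean (k = 3 is an assumed convention)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    lower, upper = mean - k * sd, mean + k * sd
    return [v for v in values if v < lower or v > upper]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """IQR Method: flag values outside [Q1 - k*IQR, Q3 + k*IQR].
    The quartile interpolation scheme ("inclusive") is an assumption;
    other quartile definitions give slightly different fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample data with one obviously unusual value.
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 30]
print(std_outliers(data))  # [30]
print(iqr_outliers(data))  # [30]
```

Both methods flag 30 here. Keep in mind that a single extreme value also inflates the standard deviation, so on very small samples the standard deviation method can miss the very outlier it is looking for. Quartiles are far less affected by extreme values, which is one reason the IQR method is often preferred for skewed or small datasets. Either way, the flagged values are only candidates to be inspected, not values to delete automatically.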
