TDSM 9.11

From The Data Science Design Manual Wikia

An outlier distorts the slope of the best-fit line, because the line tries to fit all of the given points at once. In that situation the fit may be inaccurate for most of the values because of a single extreme point. By repeatedly deleting the outlier (the point with the largest residual) and refitting, we improve the estimate of the best-fit line; that is, the slope becomes more stable. However, we must take care not to delete so many points that the model looks very accurate only because the points that disagreed with it are gone. That would be an overfit model. Hence, only a few extreme outlier points should be deleted.
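
The procedure described above can be sketched in Python. This is a minimal illustration, not a reference solution: it assumes NumPy is available, uses `np.polyfit` for the least-squares line, and the function name `trimmed_line_fit`, the parameter `n_remove`, and the sample data are all hypothetical choices made for this example.

```python
import numpy as np

def trimmed_line_fit(x, y, n_remove=1):
    """Fit a line, then n_remove times drop the point with the
    largest absolute residual and refit on the remaining points.

    Hypothetical helper for illustration; n_remove should stay small
    so that only a few extreme points are discarded.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(x), dtype=bool)  # True = point is still in the fit

    slope, intercept = np.polyfit(x, y, 1)  # initial fit on all points
    for _ in range(n_remove):
        residuals = np.abs(y - (slope * x + intercept))
        residuals[~keep] = -np.inf          # ignore points already removed
        worst = int(np.argmax(residuals))   # index of the largest residual
        keep[worst] = False
        slope, intercept = np.polyfit(x[keep], y[keep], 1)  # refit
    return slope, intercept, keep

# Made-up data: points on y = 2x + 1 with one outlier at index 7.
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[7] = 60.0

slope_all, intercept_all = np.polyfit(x, y, 1)
slope_trim, intercept_trim, keep = trimmed_line_fit(x, y, n_remove=1)

print("slope with outlier:   ", slope_all)
print("slope after trimming: ", slope_trim)
```

With this made-up data, the slope of the full fit is pulled well away from 2 by the single outlier, and removing one point is enough to recover it. Keeping `n_remove` small reflects the caution above: each extra deletion makes the remaining points fit better by construction, so a shrinking residual alone does not show that the model has improved.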