TDSM 7.9

From The Data Science Design Manual Wikia
Revision as of 17:50, 26 November 2017 by Jgarg

A model is said to be overfit when it learns the noise in the training data instead of the underlying signal. A common symptom is that the model performs very well on the training data but poorly on the test data. An overfit model is also sensitive to small fluctuations in the training dataset. To prevent overfitting, we can do the following:

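  1. Cross-validation: The data is split into training, validation, and test sets. The validation set is held out and never shown to the model during training; it is then used to evaluate the model. In k-fold cross-validation, the training data is split into k folds and each fold serves as the validation set in turn.
  2. Removing some features or reducing dimensionality (e.g., PCA): When the model is trained only on the important features, it is less likely to learn noise.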
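
The train/test gap and the held-out validation idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the synthetic data (y = 2x plus Gaussian noise), the choice of a k-nearest-neighbour regressor, and all function names (knn_predict, mse, k_fold_indices, cross_val_mse) are hypothetical and not part of the original answer. A 1-nearest-neighbour model memorizes the training set, so it has zero training error but a larger error on unseen data, which is the overfitting symptom described above.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, data, k):
    """Mean squared error of a k-NN model built on `train`, scored on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def k_fold_indices(n, n_folds):
    """Split indices 0..n-1 into n_folds disjoint validation folds."""
    return [list(range(i, n, n_folds)) for i in range(n_folds)]


def cross_val_mse(data, k, n_folds=5):
    """Average validation MSE; each fold is held out exactly once."""
    scores = []
    for fold in k_fold_indices(len(data), n_folds):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        valid = [data[i] for i in fold]
        scores.append(mse(train, valid, k))
    return sum(scores) / len(scores)


def make_point():
    """One synthetic sample (assumed): signal y = 2x plus Gaussian noise."""
    x = random.uniform(0, 10)
    return x, 2 * x + random.gauss(0, 1)


random.seed(0)
points = [make_point() for _ in range(200)]
train_set, test_set = points[:150], points[150:]

# Overfitting check: compare error on training data with error on unseen data.
train_mse_1nn = mse(train_set, train_set, k=1)
test_mse_1nn = mse(train_set, test_set, k=1)
print("1-NN train MSE:", train_mse_1nn)
print("1-NN test MSE:", test_mse_1nn)

# Cross-validation: pick k using only the training data, never the test set.
candidates = [1, 5, 15]
best_k = min(candidates, key=lambda k: cross_val_mse(train_set, k))
print("k chosen by 5-fold cross-validation:", best_k)
print("test MSE with chosen k:", mse(train_set, test_set, best_k))
```

The test set is touched only at the end, after the model setting has been chosen on the validation folds, so the final error estimate is not influenced by the selection step.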