TDSM 7.9

From The Data Science Design Manual Wikia

A model that learns the noise in its training data instead of the underlying signal is said to be overfit. The usual symptom is a model that performs very well on the training data but poorly on the test data. An overfit model is also sensitive to small fluctuations in the training set, i.e. it has high variance. To prevent overfitting, we can do the following:

  1. Cross-validation: The training data is split into k folds. Each fold in turn is held out as a validation set that the model never sees during training, and the model is then scored on it. A separate test set is kept aside for the final evaluation.
  2. Remove some features or reduce dimensionality (e.g. feature selection or PCA): A model trained only on the most informative features has less opportunity to learn noise.
  3. Reduce the complexity of your model. For example, add a regularization penalty term to linear regression, prune a decision tree, or, for a neural network, use fewer layers, a smaller network, or dropout.
  4. Tune your hyperparameters, such as the learning rate.
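
As a rough illustration of the train/test gap and of points 1, 3 and 4, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for the example: the synthetic data (a linear signal plus Gaussian noise), the choice of k-nearest-neighbor regression as the model, and the helper names `knn_predict`, `mse` and `cross_val_mse`. With k = 1 the model memorizes the training set, so its training error is zero while its test error is not; cross-validation is then used to choose a larger k, which gives a smoother and simpler model.

```python
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, points, k):
    """Mean squared error of a k-NN model built on `train`, scored on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


def cross_val_mse(data, k, n_folds=5):
    """Average held-out MSE: each fold is used once as the validation set."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        fit_on = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(fit_on, held_out, k))
    return sum(scores) / n_folds


# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
train, test = data[:300], data[300:]

# Symptom of overfitting: 1-NN reproduces every training label exactly,
# because each training point is its own nearest neighbor.
train_err_1 = mse(train, train, 1)
test_err_1 = mse(train, test, 1)
print("k=1  train MSE:", train_err_1, " test MSE:", test_err_1)

# Hyperparameter tuning with cross-validation on the training data only.
# A larger k averages over more neighbors, i.e. a less complex model.
cv_scores = {k: cross_val_mse(train, k) for k in (1, 5, 10, 20)}
best_k = min(cv_scores, key=cv_scores.get)
print("cross-validated MSE per k:", cv_scores)
print("chosen k:", best_k, " test MSE:", mse(train, test, best_k))
```

Note that the test set is only touched at the end. The value of k is chosen from the cross-validation scores alone, so the final test error remains an honest estimate of how the model behaves on unseen data.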