TDSM 3.17

From The Data Science Design Manual Wikia
Revision as of 21:07, 23 November 2017 by Deegupta

General Steps for treating Missing Data:

  • Identify the patterns/reasons for missing values correctly.
  • Understand how the missing values are distributed: do they follow a particular distribution?
  • Decide on the best method of analysis and treat the values.
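A minimal sketch of the first two steps, assuming the data is held as a list of dictionaries in which `None` marks a missing value. The function names `missing_summary` and `missing_patterns` and the sample rows are hypothetical, introduced only for illustration.

```python
from collections import Counter


def missing_summary(rows, columns):
    """Return {column: (missing_count, missing_fraction)} for a list of dict rows."""
    n = len(rows)
    summary = {}
    for col in columns:
        count = sum(1 for r in rows if r.get(col) is None)
        summary[col] = (count, count / n if n else 0.0)
    return summary


def missing_patterns(rows, columns):
    """Count how often each combination of missing columns occurs."""
    return Counter(
        tuple(col for col in columns if r.get(col) is None) for r in rows
    )


# Illustrative data only; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 41, "income": None},
    {"age": None, "income": 58000},
]

summary = missing_summary(rows, ["age", "income"])
patterns = missing_patterns(rows, ["age", "income"])
```

The per-column fractions show how much data is affected, and the pattern counts show whether values tend to go missing together, which can inform the choice of treatment.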

Here are some techniques to treat missing data:

Deletion: If the values are missing completely at random and enough data remains afterwards, we can simply delete the data points that have missing values.
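Deletion can be sketched as follows, again assuming rows are dictionaries with `None` for missing values. The helper name `drop_incomplete` and the sample data are hypothetical.

```python
def drop_incomplete(rows, columns):
    """Keep only the rows in which every listed column has a value."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


# Illustrative data only; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 41, "income": None},
    {"age": 33, "income": 58000},
]

complete = drop_incomplete(rows, ["age", "income"])
```

Comparing `len(complete)` with `len(rows)` before committing to deletion is a quick check on whether enough data remains.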

Imputation:

1. Popular Averaging Techniques: Mean, median and mode are the most popular averaging techniques. Any of them can be computed from the observed values and used to replace the missing ones.

2. Predictive Techniques: Imputing missing values with a predictive model assumes that the values are not missing completely at random and that the variables chosen as predictors are related to the variable being imputed; otherwise the estimates can be imprecise. Many regression techniques can be used for this.
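Both imputation approaches can be sketched with the Python standard library. This is a minimal illustration, not a definitive implementation: the function names `impute_average`, `fit_line` and `regression_impute` are hypothetical, `None` is assumed to mark a missing value, and the predictive example uses simple one-variable least-squares regression as a stand-in for "many regression techniques", assuming the predictor is fully observed.

```python
import statistics


def impute_average(values, strategy="mean"):
    """Replace None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_impute(xs, ys):
    """Fill None entries of ys with predictions from a line fitted on the complete pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return [slope * x + intercept if y is None else y for x, y in zip(xs, ys)]


# Illustrative data only.
by_mean = impute_average([1, None, 3, 8], "mean")
by_median = impute_average([1, None, 3, 8], "median")
by_mode = impute_average(["a", None, "b", "a"], "mode")
by_regression = regression_impute([1, 2, 3, 4, 5], [2, 4, None, 8, 10])
```

Averaging ignores the other variables entirely, while the regression version only helps when the predictor really is related to the variable being filled in, which is the assumption stated in item 2.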