Difference between revisions of "TDSM 3.17"
From The Data Science Design Manual Wikia
Revision as of 21:10, 23 November 2017
General Steps for treating Missing Data
- Identify the patterns/reasons for missing values correctly.
- Understand the distribution of the missing data: does it follow a particular distribution?
- Decide on the best method of analysis and treat the values.
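As a rough illustration of the first two steps, the sketch below counts how much of each column is missing and which columns tend to be missing together. It assumes records are stored as a list of dictionaries with `None` marking a missing value; the records and the helper names `missing_fraction` and `missing_patterns` are hypothetical, invented for this example.

```python
# Hypothetical toy records; None marks a missing value (an assumption of this sketch).
rows = [
    {"age": 25, "income": 40000, "city": "Austin"},
    {"age": None, "income": 52000, "city": "Boston"},
    {"age": 31, "income": None, "city": None},
    {"age": 47, "income": None, "city": "Austin"},
]

def missing_fraction(rows):
    """Return the fraction of missing (None) entries for each column."""
    columns = rows[0].keys()
    return {
        col: sum(r[col] is None for r in rows) / len(rows)
        for col in columns
    }

def missing_patterns(rows):
    """Count how often each combination of missing columns occurs."""
    counts = {}
    for r in rows:
        # The pattern is the sorted tuple of column names missing in this record.
        pattern = tuple(sorted(col for col, v in r.items() if v is None))
        counts[pattern] = counts.get(pattern, 0) + 1
    return counts

fractions = missing_fraction(rows)
patterns = missing_patterns(rows)
```

Columns that are frequently missing together, or whose missingness tracks the value of another column, suggest that the values are not missing completely at random.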
Here are some techniques for treating missing data:
Deletion:
- If the values are missing completely at random and enough data remains, we can simply delete the data points that have missing values.
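A minimal sketch of deletion, again assuming records are dictionaries with `None` for missing values. The helper name `drop_incomplete` and the sample data are hypothetical. Reporting the retained fraction is one simple way to check whether "enough data" is left.

```python
def drop_incomplete(rows):
    """Listwise deletion: keep only records with no missing (None) fields."""
    return [r for r in rows if all(v is not None for v in r.values())]

# Hypothetical sample data; None marks a missing value.
rows = [
    {"height": 170, "weight": 65},
    {"height": None, "weight": 80},
    {"height": 182, "weight": None},
    {"height": 165, "weight": 58},
]

complete = drop_incomplete(rows)
retained = len(complete) / len(rows)  # fraction of records that survive deletion
```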
Imputation
- Popular Averaging Techniques: The mean, median and mode are the most popular averaging techniques. Each can be computed from the observed values and used to replace the missing ones.
- Predictive Techniques: Imputing missing values with a predictive model assumes that the values are not missing completely at random and that the variables chosen to impute them are related to the variable being imputed; otherwise the estimates can be imprecise. Many regression techniques can be used for this.
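Both kinds of imputation can be sketched in a few lines. The example below is illustrative only: it assumes missing values are marked with `None`, uses simple one-variable least-squares regression as a stand-in for "regression techniques", and assumes the predictor is fully observed. The function names `impute_constant` and `impute_regression` and the data are hypothetical.

```python
from statistics import mean, median, mode

def impute_constant(values, strategy="mean"):
    """Replace None with the mean, median or mode of the observed values."""
    stat = {"mean": mean, "median": median, "mode": mode}[strategy]
    observed = [v for v in values if v is not None]
    fill = stat(observed)
    return [fill if v is None else v for v in values]

def impute_regression(x, y):
    """Fill missing y values from a least-squares line fitted to observed (x, y) pairs.

    Assumes every x is observed and at least two distinct x values have an observed y.
    """
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    x_bar = mean(a for a, _ in pairs)
    y_bar = mean(b for _, b in pairs)
    sxx = sum((a - x_bar) ** 2 for a, _ in pairs)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Keep observed values; predict only where y is missing.
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]

# Hypothetical data: salary (in thousands) is missing for two people.
years = [1, 2, 3, 4, 5]
salary = [30, None, 50, 60, None]

mean_filled = impute_constant(salary, "mean")
reg_filled = impute_regression(years, salary)
```

Here the mean fills both gaps with the same value, while the regression uses the relationship with `years` to give each gap its own estimate. That only helps if such a relationship actually exists.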