TDSM 6.13

From The Data Science Design Manual Wikia

6-13. How would you determine whether the statistics published in an article are either wrong or presented to support a biased view?


There are several techniques that can be used to present statistics incorrectly or to support a biased view. One thing to look out for is a plot type that hides declines or even makes them look like growth. Plotting a cumulative distribution function (cdf) rather than a probability density function (pdf), or more generally a running total rather than per-period values, is a good example. A pdf or per-period plot can show decreases, but a cdf never decreases (and neither does a running total of non-negative values), so even a shrinking quantity appears to keep climbing. To avoid this trap, familiarize yourself with many plot types and be aware of the characteristics of the plot you are viewing. Examine the data and statistics associated with the plot if they are available.
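The cumulative-plot effect can be checked numerically. The sketch below uses made-up quarterly sales figures (the numbers and the helper name `is_non_decreasing` are assumptions for illustration): the per-period values fall every quarter, yet the running total only rises. Differencing the cumulative series recovers the per-period values, which is a useful step whenever an article shows only a cumulative chart.

```python
from itertools import accumulate

# Hypothetical quarterly sales figures, invented for illustration.
quarterly_sales = [120, 110, 95, 80, 60]

# Running total, which is what a cumulative plot would display.
cumulative_sales = list(accumulate(quarterly_sales))


def is_non_decreasing(values):
    """Return True if no value is smaller than the one before it."""
    return all(b >= a for a, b in zip(values, values[1:]))


# Recover the per-period values by differencing the cumulative series.
recovered = [cumulative_sales[0]] + [
    b - a for a, b in zip(cumulative_sales, cumulative_sales[1:])
]

print(cumulative_sales)                     # [120, 230, 325, 405, 465]
print(is_non_decreasing(quarterly_sales))   # False: sales are falling
print(is_non_decreasing(cumulative_sales))  # True: the total still climbs
print(recovered == quarterly_sales)         # True
```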

Another thing to look out for is an improperly scaled or labeled plot. In a line plot, for example, changing the axis range or tick intervals can drastically alter the apparent slope of the line, making it look flatter or steeper in chosen regions or across the entire plot. To spot this, read the axis labels and compare them with the underlying values rather than judging by the slope of a line or the size of a shape.
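One rough way to quantify the effect of axis scaling is to ask what fraction of the vertical axis a change occupies. The sketch below is only an illustration with invented values; `visual_fraction` is a hypothetical helper, not part of any plotting library. The same 4% rise fills 2% of an axis that starts at zero, but half of an axis truncated to the range 49 to 53.

```python
def visual_fraction(values, y_min, y_max):
    """Fraction of the y-axis height spanned by the data's range."""
    return (max(values) - min(values)) / (y_max - y_min)


# Hypothetical measurements that rise from 50 to 52 (a 4% increase).
values = [50.0, 50.5, 51.0, 52.0]

full_axis = visual_fraction(values, 0, 100)      # axis starts at zero
truncated_axis = visual_fraction(values, 49, 53) # axis hugs the data

print(full_axis)       # 0.02
print(truncated_axis)  # 0.5
```

Comparing the relative change in the data (here 4%) with how dramatic the line looks is a quick sanity check when the axis does not start at zero.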

Beware of incomplete or misleading measures of centrality. A mean should be accompanied by a measure of spread such as the variance. Ask yourself whether the mean is even an appropriate measure for the data, or whether the median would be more appropriate. Are there outliers, not shown, that skew what is being presented? Understand what these values mean and how and why they are being presented.
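A small example shows how far a single outlier can pull the mean away from the median. The salary figures below are invented for illustration; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical salaries in thousands; the last value is an outlier.
salaries = [40, 42, 45, 47, 50, 400]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
spread = statistics.pstdev(salaries)  # population standard deviation

print(mean_salary)    # 104
print(median_salary)  # 46.0

# The mean exceeds every salary except the outlier itself.
print(sum(s < mean_salary for s in salaries))  # 5

# A spread larger than the mean is a hint that the mean alone misleads.
print(spread > mean_salary)  # True
```

An article that reports only the mean of 104 would suggest a typical salary more than twice what five of the six people actually earn.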

When given a correlation matrix or a similar plot, ask whether the correlation actually implies any causation. A cleverly worded article or misleading plot can obscure the fact that the two variables being compared may have only a one-way relationship or none at all. Examine closely the logic used to describe the relationship between the variables. The following website gives an entertaining view of this concept: http://www.tylervigen.com/spurious-correlations
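A common source of spurious correlation is a shared trend over time: two series that both happen to grow will correlate strongly even if nothing links them. The sketch below uses two invented yearly series and a hand-written `pearson` helper (both are assumptions for illustration). Correlating the year-to-year changes instead of the raw levels is one simple check, though it is not a test of causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def changes(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


# Two hypothetical yearly series that both trend upward.
series_a = [10, 12, 11, 14, 15, 17, 16, 19]
series_b = [5, 6, 8, 8, 10, 10, 12, 13]

r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))

print(round(r_levels, 2))   # strongly positive: driven by the shared trend
print(round(r_changes, 2))  # negative: the yearly movements do not agree
```

Here the raw levels look tightly linked, while the year-to-year changes move in opposite directions, which suggests the headline correlation reflects little more than both series growing over the same period.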

Finally, be aware of the motivation behind the article. If the author belongs to a company that would gain shareholder support by reporting quarterly increases in sales, be critical and thoroughly analyze the design of any visualizations and statistics published. Authors are less likely to present anything that would reflect badly on themselves or their organization, so understand the relationship between the author and their audience.