TDSM 6.11

From The Data Science Design Manual Wikia

6-11. Describe some good practices in data visualization.


  • Maximize Data-Ink Ratio
Focus on showing the data itself by reducing the amount of other graphics occupying your visualization. This can include removing effects such as shadows, flattening 3D graphs to 2D, and reducing or removing tick marks and grids.
  • Minimize the Lie Factor
In many cases, it is possible to manipulate visualizations and statistics to create a misleading representation of data. While the "lie" portion of the phrase implies deceptive intent, techniques that increase the lie factor can still sneak their way into an honest scientist's visualizations. A few techniques you can employ to reduce the lie factor are to always present the variance along with a mean, include the actual data when presenting interpolations, try to scale your visualization to the golden ratio (width approximately 1.6 times the height), include tick labels on numerical axes, and display the origin on your plot.
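The lie factor can be put into numbers. Tufte defines it as the size of the effect shown in the graphic divided by the size of the effect in the data, where "size of effect" is the relative change between two values. The sketch below assumes that definition; the function names and the bar-chart numbers are hypothetical and only illustrate why hiding the origin inflates the factor.

```python
def relative_change(first, second):
    """Relative change from first to second, e.g. 100 -> 150 gives 0.5."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data.

    A value near 1 means the drawing is proportional to the data.
    """
    return (relative_change(drawn_first, drawn_second)
            / relative_change(data_first, data_second))


# Hypothetical data: a quantity grows from 100 to 150 (a 50% increase).
# With the y-axis starting at 90, the bars are drawn 10 and 60 units tall.
truncated = lie_factor(100, 150, drawn_first=10, drawn_second=60)

# With the origin displayed, the bars are drawn 100 and 150 units tall.
honest = lie_factor(100, 150, drawn_first=100, drawn_second=150)

print(truncated)  # 10.0
print(honest)     # 1.0
```

In this made-up case the truncated axis makes a 50% increase look like a 500% increase, which is one reason displaying the origin is on the list above.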
  • Minimize Chartjunk
Minimizing chartjunk is closely associated with maximizing the data-ink ratio. Chartjunk includes any visual elements that don't aid the viewer in receiving the message you are trying to communicate with your plot. This can include things like heavy or unnecessary grids, colored backgrounds, or the boundary lines of your plot. Essentially, you want to focus on keeping your plot concise, so that the data is the main attraction.
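As a rough illustration of the data-ink and chartjunk advice, here is a minimal sketch that assumes matplotlib is the plotting library (the text does not name one). The data and labels are hypothetical. It strips the grid, the colored background, and the top and right boundary lines, and keeps only what is needed to read the values.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, used only to have something to draw.
years = [2015, 2016, 2017, 2018, 2019]
sales = [12, 15, 14, 19, 23]

fig, ax = plt.subplots(figsize=(6.4, 4.0))
ax.plot(years, sales, color="black", linewidth=1.5)

# Remove non-data ink: no grid, plain background, no top/right boundary lines.
ax.grid(False)
ax.set_facecolor("white")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Keep short ticks and plain labels so the values can still be read.
ax.tick_params(length=3)
ax.set_xticks(years)
ax.set_xlabel("Year")
ax.set_ylabel("Sales (units)")
```

Calling `fig.savefig("sales.png")` afterwards would write the result to a file. Whether to keep the left and bottom lines is a judgment call; they are retained here because they anchor the tick labels.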
  • Proper Scaling and Labeling
Use labels that emphasize the appropriate magnitude of numbers and a scale that displays the data proportionately to the size of the plot. Generally, minimize white space in your plot and design your labels so that they are easy to read and easy to match with the data.
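One way to make labels emphasize magnitude is to abbreviate large numbers instead of printing every zero. The helper below is a hypothetical sketch in plain Python; the name `format_magnitude` and the K/M/B suffixes are assumptions chosen for illustration, not a convention from the text.

```python
def format_magnitude(value):
    """Return a short label such as '1.5M' for 1500000."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            # The 'g' format drops trailing zeros, so 2000 becomes '2K'.
            return f"{value / threshold:g}{suffix}"
    return f"{value:g}"


print(format_magnitude(1500000))  # 1.5M
print(format_magnitude(2000))     # 2K
print(format_magnitude(42))       # 42
```

If you plot with matplotlib, a function like this could be wrapped in `matplotlib.ticker.FuncFormatter` (for example `FuncFormatter(lambda x, pos: format_magnitude(x))`) and passed to an axis with `set_major_formatter`.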
  • Effective Use of Color and Shading
When dealing with plots that include multiple classes, select colors that are commonly associated with each class. Good examples of this are red for a negative class and green or blue for a positive class, yellow for bananas and orange for oranges, or the colors of a nation's flag. When representing a numerical scale, it is often best to stick with the predefined color scales from your plotting library. As with multiple classes, choose a scale that is naturally associated with your data, such as blue for low temperatures gradually shifting to red for high temperatures.
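Both kinds of color choice can be sketched in a few lines, again assuming matplotlib as the plotting library. The class names and temperature values are hypothetical. `coolwarm` is one of matplotlib's predefined color scales and runs from blue at the low end to red at the high end, which matches the temperature example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Categorical data: map each class to a color readers already associate with it.
class_colors = {"negative": "red", "positive": "green"}

# Numerical data: use a predefined color scale rather than hand-picked colors.
cmap = plt.get_cmap("coolwarm")
norm = Normalize(vmin=-10, vmax=40)  # hypothetical temperature range in Celsius

temperatures_c = [-10, 5, 20, 40]  # hypothetical readings
colors = [cmap(norm(t)) for t in temperatures_c]  # one RGBA tuple per reading

coldest, hottest = colors[0], colors[-1]
```

The resulting list can be passed as the `color` argument of calls such as `ax.bar` or `ax.scatter`. Red and green are hard to tell apart for some color-blind readers, so blue for the positive class may be the safer of the two options the text mentions.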