Visualizing-data-TDSM

From The Data Science Design Manual Wikia

Visualizing Data

Exploratory Data Analysis


6-1. Provide answers to the questions associated with the following data sets, available at http://www.data-manual.com/data:

  1. Analyze the movie dataset. What is the range of movie gross in the U.S.? Which types of movies are most likely to succeed in the market? Comedy? PG-13? Drama?
  2. Analyze the Manhattan rolling sales dataset. Where in Manhattan is the most/least expensive real estate located? What is the relationship between sales price and gross square feet?
  3. Analyze the 2012 Olympic dataset. What can you say about the relationship between a country's population and the number of medals it wins? What can you say about the relationship between the ratio of female to male counts and the GDP of that country?
  4. Analyze the GDP per capita dataset. How do countries from Europe, Asia, and Africa compare in the rates of growth in GDP? When have countries faced substantial changes in GDP, and what historical events were likely most responsible for them?

(Solution 6.1)
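
As a starting point for questions such as "what is the range" or "what is the relationship between sales price and gross square feet", the sketch below computes a value range and a Pearson correlation coefficient using only the standard library. The function names `value_range` and `pearson_r` are hypothetical helpers, and the numbers are made up for illustration; they are not taken from the data sets at data-manual.com, whose file and column names are not assumed here.

```python
import math


def value_range(values):
    """Return (min, max) of a non-empty sequence of numbers."""
    return min(values), max(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT values from the real data set.
square_feet = [800, 1200, 1500, 2000, 3200]
sale_price = [650_000, 900_000, 1_300_000, 1_500_000, 2_900_000]

print("price range:", value_range(sale_price))
print("correlation:", round(pearson_r(square_feet, sale_price), 3))
```

A correlation coefficient summarizes only linear association, so it is best read alongside a scatter plot of the same two columns rather than instead of one.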


Interpreting Visualizations


6-3. Search your favorite news websites until you find ten interesting charts/plots, ideally half good and half bad. Critique each one along the following dimensions, using the vocabulary we have developed in this chapter:

  1. Does it do a good job or a bad job presenting the data?
  2. Does the presentation appear to be biased, either deliberately or accidentally?
  3. Is there chartjunk in the figure?
  4. Are the axes labeled in a clear and informative way?
  5. Is the color used effectively?
  6. How can we make the graphic better?

(Solution 6.3)


Creating Visualizations


6-5. Construct a revealing visualization of some aspect of your favorite data set, using:

  1. A well-designed table.
  2. A dot and/or line plot.
  3. A scatter plot.
  4. A heatmap.
  5. A bar plot or pie chart.
  6. A histogram.
  7. A data map.

(Solution 6.5)


6-7. Construct scatter plots for sets of 10, 100, 1000, and 10,000 points. Experiment with the point size to find the most revealing value for each data set.

(Solution 6.7)
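
One possible way to set up the experiment in exercise 6-7 is sketched below. It assumes matplotlib is installed and uses synthetic Gaussian points rather than any particular data set. The helpers `make_points`, `suggested_marker_area`, and `plot_panels` are hypothetical names, and the shrink-with-the-square-root rule is only an assumed starting guess to tune by eye, not a recommendation from the book.

```python
import math
import random


def make_points(n, seed=0):
    """Generate n synthetic (x, y) points from a standard 2-D Gaussian."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def suggested_marker_area(n, base=60.0):
    """Assumed heuristic: shrink marker area as 1/sqrt(n), never below 1."""
    return max(1.0, base / math.sqrt(n))


def plot_panels(sizes=(10, 100, 1000, 10000)):
    """Draw one scatter plot per sample size, side by side."""
    # Imported lazily so the helpers above work without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(sizes), figsize=(4 * len(sizes), 4))
    for ax, n in zip(axes, sizes):
        xs, ys = zip(*make_points(n))
        # `s` is the marker area in points squared; alpha reduces overplotting.
        ax.scatter(xs, ys, s=suggested_marker_area(n), alpha=0.5)
        ax.set_title(f"n = {n}")
    fig.tight_layout()
    plt.show()
```

Calling `plot_panels()` draws the four panels. Changing `base`, or replacing the heuristic altogether, is the actual experiment: the goal is to find the size at which individual points remain visible for small sets while dense regions do not merge into a blob for large ones.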


Implementation Projects


6-9. Build an interactive exploration widget for your favorite dataset, using appropriate libraries and tools. Start simple, but be as creative as you want to be.

(Solution 6.9)


Interview Questions


6-11. Describe some good practices in data visualization.

(Solution 6.11)


6-13. How would you determine whether the statistics published in an article are either wrong or presented to support a biased view?

(Solution 6.13)


Kaggle Challenges


6-15. Predict whether West Nile virus is present at a given time and place. https://www.kaggle.com/c/predict-west-nile-virus

(Solution 6.15)