Statistical-analysis-TDSM

From The Data Science Design Manual Wikia

Statistical Analysis

Statistical Distributions


5-1. Explain which distribution seems most appropriate for each of the following phenomena: binomial, normal, Poisson, or power law?

  1. The number of leaves on a fully grown oak tree.
  2. The age at which people's hair turns grey.
  3. The number of hairs on the heads of 20-year-olds.
  4. The number of people who have been hit by lightning x times.
  5. The number of miles driven before your car needs a new transmission.
  6. The number of runs a batter will get, per cricket over.
  7. The number of leopard spots per square foot of leopard skin.
  8. The number of people with exactly x pennies sitting in drawers.
  9. The number of apps on people's cell phones.
  10. The daily attendance in Skiena's data science course.

(Solution 5.1)


5-3. Assuming that the relevant distribution is normal, estimate the probability of the following events:

  1. There will be 70 or more heads in the next 100 flips of a fair coin.
  2. A randomly selected person will weigh over 300 lbs.

(Solution 5.3)
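
The coin-flip part of 5-3 can be checked numerically. The sketch below is only one way to set it up: it assumes the usual normal approximation to the binomial (mean np, standard deviation sqrt(np(1-p))) with a continuity correction, and compares it against the exact binomial tail. The helper name normal_upper_tail is illustrative. The weight question is left out because it depends on a population mean and standard deviation that the exercise does not supply.

```python
from math import comb, erfc, sqrt

# Parameters of the coin-flip question: 100 flips, fair coin, at least 70 heads.
n, p, k = 100, 0.5, 70

# Normal approximation to Binomial(n, p).
mu = n * p                        # mean = 50
sigma = sqrt(n * p * (1 - p))     # standard deviation = 5


def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for X ~ Normal(mu, sigma), via the complementary error function."""
    return 0.5 * erfc((x - mu) / (sigma * sqrt(2)))


# Continuity correction (an assumption of this sketch): P(X >= 70) ~ P(Y >= 69.5).
p_normal = normal_upper_tail(k - 0.5, mu, sigma)

# Exact binomial tail for comparison; every outcome has probability 1 / 2**n.
p_exact = sum(comb(n, i) for i in range(k, n + 1)) / 2**n

print(f"normal approximation: {p_normal:.2e}")
print(f"exact binomial tail:  {p_exact:.2e}")
```

Both values come out well below one in ten thousand, so the approximation is adequate for an order-of-magnitude estimate even this far into the tail.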


5-5. Facebook data shows that 50% of Facebook users have 100 or more friends. Further, the average user's friend count is 190. What do these findings say about the shape of the distribution of number of friends of Facebook users?

(Solution 5.5)


Significance Testing


5-7. The 2010 American Community Survey estimates that 47.1% of women ages 15 years and over are married.

  1. Randomly select three women in this age group. What is the probability that the third woman selected is the only one who is married?
  2. What is the probability that all three women are married?
  3. On average, how many women would you expect to sample before selecting a married woman? What is the standard deviation?
  4. If the proportion of married women was actually 30%, how many women would you expect to sample before selecting a married woman? What is the standard deviation?
  5. Based on your answers to parts 3 and 4, how does decreasing the probability of an event affect the mean and standard deviation of the wait time until success?

(Solution 5.7)
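
The arithmetic behind 5-7 can be sketched as follows. This assumes the women are sampled independently and that the wait time follows a geometric distribution counting trials up to and including the first success, so the mean is 1/p and the standard deviation is sqrt(1-p)/p. If "before" is read as excluding the successful draw, subtract one from the mean. The function name geometric_mean_sd is illustrative.

```python
from math import sqrt


def geometric_mean_sd(p):
    """Mean and standard deviation of the number of trials up to and
    including the first success, with success probability p per trial."""
    return 1 / p, sqrt(1 - p) / p


p_married = 0.471  # proportion given in the exercise

# Only the third woman is married: two failures followed by one success.
only_third = (1 - p_married) ** 2 * p_married

# All three women are married.
all_three = p_married ** 3

# Expected wait and its spread for the two proportions in the exercise.
mean_471, sd_471 = geometric_mean_sd(0.471)
mean_30, sd_30 = geometric_mean_sd(0.30)

print(f"P(only third married) = {only_third:.4f}")
print(f"P(all three married)  = {all_three:.4f}")
print(f"p = 0.471: mean = {mean_471:.3f}, sd = {sd_471:.3f}")
print(f"p = 0.300: mean = {mean_30:.3f}, sd = {sd_30:.3f}")
```

Lowering p increases both the mean and the standard deviation of the wait time, which is the pattern the last part of the exercise asks about.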


Permutation Tests and P-values


5-9. Obtain data on the heights of m men and w women.

  1. Use a t-test to establish the significance of whether the men are on average taller than the women.
  2. Perform a permutation test to establish the same thing: whether the men are on average taller than the women.

(Solution 5.9)
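
A permutation test for 5-9 might look like the sketch below. It is a minimal one-sided version under stated assumptions: the test statistic is the difference in sample means, group labels are shuffled at random rather than enumerated, and an add-one correction keeps the estimated p-value above zero. The function name permutation_p_value and the height lists are hypothetical placeholders, not real measurements.

```python
import random


def permutation_p_value(men, women, n_permutations=10_000, seed=0):
    """One-sided permutation test of whether men are taller on average.

    Returns the estimated probability, under random relabeling of the pooled
    heights, of a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    m = len(men)
    observed = sum(men) / m - sum(women) / len(women)

    pooled = list(men) + list(women)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of the group labels
        fake_men, fake_women = pooled[:m], pooled[m:]
        diff = sum(fake_men) / len(fake_men) - sum(fake_women) / len(fake_women)
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction: the observed labeling counts as one permutation.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Placeholder heights in cm, made up purely to show the call.
example_men = [180, 178, 175, 182, 177, 181, 179, 176]
example_women = [165, 162, 168, 160, 166, 163, 161, 167]
print(permutation_p_value(example_men, example_women, n_permutations=2000))
```

Unlike the t-test, this procedure makes no normality assumption; it only assumes that the labels are exchangeable under the null hypothesis.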


Implementation Projects


5-11. February 2 is Groundhog Day in the United States, when it is said that six more weeks of winter follow if the groundhog sees its shadow. Taking whether it is sunny on 2/2 as a proxy for the groundhog's input, is there any predictive power to this tradition? Do a study based on weather records, and report the accuracy of the beast's forecasts along with its statistical significance.

(Solution 5.11)


Interview Questions


5-13. What is Bayes’ Theorem? And why is it useful in practice?

(Solution 5.13)


5-15. A coin is tossed 10 times and the results are 2 tails and 8 heads. How can you tell whether the coin is fair? What is the p-value for this result?

(Solution 5.15)
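
One common way to approach 5-15 is an exact binomial test under the null hypothesis that the coin is fair. The sketch below computes the one-sided p-value (8 or more heads) and the two-sided p-value (a result at least as lopsided in either direction). Which one to report depends on the alternative hypothesis, which the question leaves open. The helper name binomial_upper_tail is illustrative.

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tosses, n_heads = 10, 8

# One-sided: probability of 8, 9, or 10 heads with a fair coin.
one_sided = binomial_upper_tail(n_tosses, n_heads)

# Two-sided: a fair coin is symmetric, so 2 or fewer heads is equally likely.
two_sided = 2 * one_sided

print(one_sided)   # 56/1024  = 0.0546875
print(two_sided)   # 112/1024 = 0.109375
```

Neither value falls below the conventional 0.05 threshold, so 8 heads in 10 tosses is weak evidence against fairness on its own.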


5-17. An ant is placed on an infinitely long twig. The ant can move one step backward or one step forward with the same probability, during discrete time steps. What is the probability that the ant will return to its starting point after 2n steps?

(Solution 5.17)
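
For 5-17, one reading is that the ant must be back at its starting point at step 2n, though not necessarily for the first time. Under that assumption exactly n of the 2n steps must go forward, which gives C(2n, n) / 2^(2n). The sketch below computes that formula and checks it against a brute-force enumeration of all paths for small n. Both function names are illustrative.

```python
from itertools import product
from math import comb


def return_probability(n):
    """Probability of being at the origin after 2n fair +/-1 steps."""
    return comb(2 * n, n) / 4**n


def return_probability_brute_force(n):
    """Same quantity, by enumerating all 2**(2n) equally likely paths."""
    paths = product((-1, 1), repeat=2 * n)
    hits = sum(1 for path in paths if sum(path) == 0)
    return hits / 2 ** (2 * n)


for n in range(1, 5):
    print(n, return_probability(n), return_probability_brute_force(n))
```

The brute-force check is only practical for small n, since the number of paths grows as 4^n; the closed form has no such limit.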


Kaggle Challenges


5-19. Decide whether a car bought at an auction is a bad buy. https://www.kaggle.com/c/DontGetKicked

(Solution 5.19)


5-21. How much rain will we get in the next hour? https://www.kaggle.com/c/how-much-did-it-rain

(Solution 5.21)