Difference between revisions of "Mathematical-preliminaries-TDSM"

From The Data Science Design Manual Wikia

Revision as of 20:45, 31 March 2017

Mathematical Preliminaries

Probability


2-1. Suppose 80% of people like peanut butter, 89% like jelly, and 78% like both. Given that a randomly sampled person likes peanut butter, what is the probability that she also likes jelly?

(Solution 2.1)
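A quick way to sanity-check an answer to 2-1 is to apply the definition of conditional probability, P(jelly | peanut butter) = P(both) / P(peanut butter), to the stated percentages. The sketch below uses exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

# Percentages taken from the exercise statement.
p_peanut_butter = Fraction(80, 100)
p_both = Fraction(78, 100)

# Definition of conditional probability: P(J | PB) = P(J and PB) / P(PB).
p_jelly_given_pb = p_both / p_peanut_butter

print(p_jelly_given_pb, float(p_jelly_given_pb))
```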


2-3. Consider a game where your score is the maximum value from two dice. Compute the probability of each event from [math]\{1, \ldots, 6\}[/math].

(Solution 2.3)
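One way to check a hand calculation for 2-3 is brute-force enumeration of all 36 equally likely outcomes. This is a minimal sketch assuming two fair six-sided dice; the function name `max_of_two_dice_distribution` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def max_of_two_dice_distribution(sides=6):
    """Return {score: probability} for the maximum of two fair dice."""
    counts = {k: 0 for k in range(1, sides + 1)}
    # Enumerate every ordered pair of faces and tally the larger one.
    for a, b in product(range(1, sides + 1), repeat=2):
        counts[max(a, b)] += 1
    total = sides * sides
    return {k: Fraction(c, total) for k, c in counts.items()}

dist = max_of_two_dice_distribution()
for score, prob in dist.items():
    print(score, prob)
```

The counts follow the pattern (2k - 1) / 36 for score k, since the maximum equals k exactly when both dice are at most k but not both at most k - 1.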


2-5. If two binary random variables X and Y are independent, are [math]\bar{X}[/math] (the complement of X) and Y also independent? Give a proof or a counterexample.

(Solution 2.5)


Statistics


2-7. Construct a probability distribution where none of the mass lies within one [math]\sigma[/math] of the mean.

(Solution 2.7)


2-9. Show that the arithmetic mean equals the geometric mean when all terms are the same.

(Solution 2.9)


Correlation Analysis


2-11. What would be the correlation coefficient between the annual salaries of college and high school graduates at a given company, if for each possible job title the college graduates always made:

  1. 5,000 dollars more than high school grads?
  2. 25% more than high school grads?
  3. 15% less than high school grads?

(Solution 2.11)
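The three scenarios in 2-11 can be explored numerically. The sketch below computes the Pearson correlation coefficient from its definition; the high school salaries are made-up numbers used only for illustration, and `pearson` is a helper written for this example rather than a library call.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical high school graduate salaries, one per job title (made-up data).
high_school = [30000, 42000, 55000, 61000, 78000]

scenarios = {
    "5,000 dollars more": [s + 5000 for s in high_school],
    "25% more": [s * 1.25 for s in high_school],
    "15% less": [s * 0.85 for s in high_school],
}

results = {label: pearson(high_school, college) for label, college in scenarios.items()}
for label, r in results.items():
    print(label, round(r, 6))
```

Each scenario is a linear transformation of the high school salaries with a positive slope, which is why the coefficient does not depend on the particular salaries chosen.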


2-13. Use data or literature found in a Google search to estimate/measure the strength of the correlation between:

  1. Hits and walks scored for hitters in baseball.
  2. Hits and walks allowed by pitchers in baseball.

(Solution 2.13)


Logarithms


2-15. Show that the logarithm of any number less than 1 is negative.

(Solution 2.15)


2-17. Prove that [math]x \cdot y = b^{(\log_b x + \log_b y)}[/math]

(Solution 2.17)


Implementation Projects


2-19. Find some interesting data sets, and compare how similar their means and medians are. What are the distributions where the mean and median differ the most?

(Solution 2.19)


Interview Questions


2-21. What is the probability of getting exactly k heads on n tosses, where the coin has a probability p of coming up heads on each toss? What about k or more heads?

(Solution 2.21)
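As a reference point for 2-21, and assuming the tosses are independent (which the exercise does not state explicitly), the number of heads X follows a binomial distribution:

[math]P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}[/math]

[math]P(X \ge k) = \sum_{i=k}^{n} \binom{n}{i} p^i (1-p)^{n-i}[/math]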


2-23. At halftime of a basketball game you are offered two possible challenges:

  1. Take three shots, and make at least two of them.
  2. Take eight shots, and make at least five of them.

Which challenge should you pick to have a better chance of winning?

(Solution 2.23)
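The comparison in 2-23 depends on how good a shooter you are. The sketch below assumes each shot succeeds independently with the same probability p (an assumption not stated in the exercise) and evaluates both challenges for a few values of p; `prob_at_least` and `compare_challenges` are names invented for this example.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent shots."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def compare_challenges(p):
    """Return (P(at least 2 of 3), P(at least 5 of 8)) for shot probability p."""
    return prob_at_least(2, 3, p), prob_at_least(5, 8, p)

for p in (0.3, 0.5, 0.7, 0.9):
    three, eight = compare_challenges(p)
    print(f"p={p:.1f}  three-shot: {three:.3f}  eight-shot: {eight:.3f}")
```

At p = 0.5 the three-shot challenge has the higher success probability, while at p = 0.9 the eight-shot challenge does, so the better choice changes with shooting skill.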


2-25. Given a stream of n numbers, show how to select one uniformly at random using only constant storage. What if you don't know n in advance?

(Solution 2.25)
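One standard approach to 2-25 when n is unknown is reservoir sampling with a reservoir of size one: keep a single candidate and replace it with the i-th item with probability 1/i. The sketch below is one possible rendering of that idea, not necessarily the intended solution; `pick_uniform` is a hypothetical name.

```python
import random

def pick_uniform(stream, rng=random):
    """Return one item chosen uniformly at random from an iterable.

    Uses constant extra storage: the current candidate and a counter.
    Returns None for an empty stream.
    """
    chosen = None
    for count, item in enumerate(stream, start=1):
        # Replace the candidate with probability 1/count.
        if rng.randrange(count) == 0:
            chosen = item
    return chosen

# Usage: works on any iterable, including generators of unknown length.
sample = pick_uniform(x * x for x in range(1, 11))
print(sample)
```

The i-th item survives to the end with probability (1/i) · (i/(i+1)) · ... · ((n-1)/n) = 1/n, so every item is equally likely regardless of n.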


2-27. A person randomly types an 8-digit number into a pocket calculator. What is the probability that the number looks the same even if the calculator is turned upside down?

(Solution 2.27)


2-29. What is A/B testing and how does it work?

(Solution 2.29)


2-31. We often say that correlation does not imply causation. What does this mean?

(Solution 2.31)


Kaggle Challenges


2-33. Cause-effect pairs: correlation vs causation. https://www.kaggle.com/c/cause-effect-pairs

(Solution 2.33)


2-35. Predict the fate of animals at a pet shelter. https://www.kaggle.com/c/shelter-animal-outcomes

(Solution 2.35)