TDSM 2.11

From The Data Science Design Manual Wikia

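Let [math]X_i[/math] be the annual salary of high school graduates for job title [math]i[/math],

[math]Y_i[/math] be the annual salary of college graduates for the same job title,

[math]n[/math] be the number of job titles. Assume the salaries [math]X_i[/math] are not all equal, so that the denominators below are non-zero.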

a) For each possible job title, the college graduates always made 5,000 dollars more than high school grads

[math]\Rightarrow \forall i (1 \leq i \leq n): Y_i = X_i + 5000 [/math], and averaging over all [math]i[/math] gives [math]\bar{Y} = \bar{X} + 5000[/math]

Correlation coefficient of [math]X[/math] and [math]Y[/math]:

[math] \tau = \frac{\sum_{i = 1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \sqrt{\sum_{i = 1}^{n}(Y_i - \bar{Y})^2}} = \frac{\sum_{i = 1}^{n}(X_i - \bar{X})(X_i + 5000 - (\bar{X} + 5000))}{\sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \sqrt{\sum_{i = 1}^{n}(X_i + 5000 - (\bar{X} + 5000))^2}} = \frac{\sum_{i = 1}^{n}(X_i - \bar{X})^2} {\left( \sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \right)^2} = 1 [/math]
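The result in part a) can be checked numerically. The following is a minimal sketch, assuming a handful of made-up salaries; the function name `pearson` and the sample values are illustrative and not part of the exercise. It evaluates the formula above term by term and confirms that a constant shift of 5,000 leaves the coefficient at 1, up to floating-point rounding.

```python
import math

def pearson(xs, ys):
    """Correlation coefficient computed directly from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))

# Hypothetical high school salaries, one per job title (made-up numbers).
high_school = [28000, 35000, 42000, 51000, 60000]
# Part a): college graduates always earn 5,000 dollars more.
college = [x + 5000 for x in high_school]

r_shift = pearson(high_school, college)
print(math.isclose(r_shift, 1.0))
```

The comparison uses `math.isclose` rather than `==` because the two square roots in the denominator may introduce a rounding error in the last digit.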

b) For each possible job title, the college graduates always made 25% more than high school grads

[math]\Rightarrow \forall i (1 \leq i \leq n): Y_i = 1.25 X_i [/math], and averaging over all [math]i[/math] gives [math]\bar{Y} = 1.25 \bar{X}[/math]

Correlation coefficient of [math]X[/math] and [math]Y[/math]:

[math] \tau = \frac{\sum_{i = 1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \sqrt{\sum_{i = 1}^{n}(Y_i - \bar{Y})^2}} = \frac{\sum_{i = 1}^{n}(X_i - \bar{X}) \cdot 1.25 (X_i - \bar{X})}{\sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \sqrt{ \sum_{i = 1}^{n}1.25^2(X_i - \bar{X})^2}} = \frac{1.25 \sum_{i = 1}^{n}(X_i - \bar{X})^2} {1.25 \left( \sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \right)^2} = 1 [/math]

c) For each possible job title, the college graduates always made 15% less than high school grads


[math]\Rightarrow \forall i (1 \leq i \leq n): Y_i = 0.85 X_i [/math], and averaging over all [math]i[/math] gives [math]\bar{Y} = 0.85 \bar{X}[/math]

Correlation coefficient of [math]X[/math] and [math]Y[/math]:

[math] \tau = \frac{\sum_{i = 1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \sqrt{\sum_{i = 1}^{n}(Y_i - \bar{Y})^2}} = \frac{\sum_{i = 1}^{n}(X_i - \bar{X}) \cdot 0.85 (X_i - \bar{X})}{\sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \sqrt{ \sum_{i = 1}^{n}0.85^2(X_i - \bar{X})^2}} = \frac{0.85 \sum_{i = 1}^{n}(X_i - \bar{X})^2} {0.85 \left( \sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \right)^2} = 1 [/math]
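All three parts follow one pattern, which can be written once in general form. As a sketch, assume [math]Y_i = a X_i + b[/math] for constants [math]a \neq 0[/math] and [math]b[/math]; these two symbols are introduced here for illustration and do not appear in the exercise. Then [math]Y_i - \bar{Y} = a (X_i - \bar{X})[/math], and provided the [math]X_i[/math] are not all equal:

[math] \tau = \frac{a \sum_{i = 1}^{n}(X_i - \bar{X})^2}{\sqrt{\sum_{i = 1}^{n}(X_i - \bar{X})^2} \sqrt{a^2 \sum_{i = 1}^{n}(X_i - \bar{X})^2}} = \frac{a}{|a|} [/math]

Part a) corresponds to [math]a = 1, b = 5000[/math], part b) to [math]a = 1.25, b = 0[/math], and part c) to [math]a = 0.85, b = 0[/math]. In each case [math]a > 0[/math], so [math]\tau = 1[/math]. Earning 15% less still gives a perfect positive correlation, because the multiplier 0.85 is positive; the coefficient would be [math]-1[/math] only for a negative [math]a[/math].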