Difference between revisions of "Scores-rankings-TDSM"

From The Data Science Design Manual Wikia

Revision as of 20:59, 31 March 2017

Scores and Rankings



4-1. Let X represent a random variable drawn from the normal distribution defined by [math]\mu = 2[/math] and [math]\sigma = 3[/math]. Suppose we observe [math]X = 5.08[/math]. Find the Z-score of this observation, and determine how many standard deviations away from the mean it lies.

(Solution 4.1)
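
As a quick check on the arithmetic, the Z-score is [math]Z = (x - \mu) / \sigma[/math]. The sketch below applies that formula to the values in the exercise; the helper name `z_score` is only illustrative and does not come from the book.

```python
def z_score(x, mu, sigma):
    """Signed number of standard deviations between x and the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Values taken from exercise 4-1.
z = z_score(5.08, mu=2, sigma=3)
print(round(z, 4))  # about 1.0267, i.e. roughly one standard deviation above the mean
```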


4-3. Amanda took the Graduate Record Examination (GRE), and scored 160 in verbal reasoning and 157 in quantitative reasoning. The mean score for verbal reasoning was 151 with a standard deviation of 7, compared with a mean of [math]\mu = 153[/math] and a standard deviation of [math]\sigma = 7.67[/math] for quantitative reasoning. Assume that both distributions are normal.

(Solution 4.3)
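
One way to put the two scores on a common scale is to convert each to a Z-score and a percentile under its own normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (the `zscore` method needs Python 3.9 or later). The variable names are illustrative, and which comparisons the exercise expects is an assumption, since no sub-questions appear in the text above.

```python
from statistics import NormalDist

# Distributions as stated in exercise 4-3.
verbal = NormalDist(mu=151, sigma=7)
quant = NormalDist(mu=153, sigma=7.67)

# Z-scores: standard deviations above each section's mean.
verbal_z = verbal.zscore(160)
quant_z = quant.zscore(157)

# Percentiles: fraction of test takers expected to score below Amanda.
verbal_pct = verbal.cdf(160)
quant_pct = quant.cdf(157)

print(f"verbal: z = {verbal_z:.3f}, percentile = {verbal_pct:.1%}")
print(f"quant:  z = {quant_z:.3f}, percentile = {quant_pct:.1%}")
```

Under the normality assumption, the section with the larger Z-score is the one where she did better relative to other test takers.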


4-5. Find a dataset on properties of one of the following classes of things:

  1. The countries of the world.
  2. Movies and movie stars.
  3. Sports stars.
  4. Universities.

Construct a sensible ranking function reflecting quality or popularity. How well is this correlated with some external measure aiming at a similar result?

(Solution 4.5)
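
To measure agreement with an external ranking, one reasonable choice is Spearman's rank correlation, which compares the orderings rather than the raw scores. The sketch below is a minimal version that assumes no tied scores. The function names and the toy numbers are made up for illustration and do not come from any real dataset.

```python
def ranks(scores):
    """Rank positions (1 = highest score). Assumes no tied scores."""
    if len(set(scores)) != len(scores):
        raise ValueError("tied scores are not handled in this sketch")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman(a, b):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical scores for four items: ours versus an external measure.
our_scores = [0.9, 0.4, 0.7, 0.1]
external_scores = [88, 60, 40, 12]
rho = spearman(our_scores, external_scores)
print(rho)
```

A value near 1 means the two rankings order the items almost identically, a value near 0 means little relationship, and a value near -1 means they are nearly reversed.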


4-7. The scoring systems used by professional sports leagues to select the most valuable player award winner typically involve assigning positional weights to permutations specified by voters. What systems do they use in professional baseball, basketball, and football? Are they similar? Do you think they are sensible?

(Solution 4.7)


Implementation Projects


4-9. Evaluate the robustness of Borda's method by applying k random swaps to each of m distinct copies of the permutation [math]p = \{1,2,\ldots,n\}[/math]. What is the threshold where Borda's method fails to reconstruct p, as a function of n, k, and m?

(Solution 4.9)
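
A possible starting point for this experiment is sketched below. It makes several assumptions that the exercise leaves open: a "random swap" exchanges two positions chosen uniformly at random, Borda's method uses linear weights (an item's cost is the sum of its positions across all votes), and ties in total cost are broken by item value, which favors recovering p. All function names are illustrative.

```python
import random


def perturb(p, k, rng):
    """Return a copy of p after k swaps of two randomly chosen positions."""
    q = list(p)
    for _ in range(k):
        i, j = rng.sample(range(len(q)), 2)
        q[i], q[j] = q[j], q[i]
    return q


def borda(rankings):
    """Order items by total position across rankings (lower total ranks first)."""
    points = {}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            points[item] = points.get(item, 0) + position
    # Ties are broken by item value (an assumption of this sketch).
    return sorted(points, key=lambda item: (points[item], item))


def recovery_rate(n, k, m, trials=200, seed=0):
    """Fraction of trials in which Borda's method reconstructs p exactly (n >= 2)."""
    rng = random.Random(seed)
    p = list(range(1, n + 1))
    hits = 0
    for _ in range(trials):
        votes = [perturb(p, k, rng) for _ in range(m)]
        hits += borda(votes) == p
    return hits / trials


# Sweep k for fixed n and m to see where exact recovery starts to fail.
for k in range(0, 11, 2):
    print(k, recovery_rate(n=10, k=k, m=20))
```

Repeating the sweep over a grid of n, k, and m values gives an empirical picture of the failure threshold. Restricting the swaps to adjacent positions is another plausible reading of the exercise and would likely shift that threshold.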


Interview Questions


4-11. How can you test whether a new credit risk scoring model works?

(Solution 4.11)


Kaggle Challenges


4-13. Rating chess players from game positions. https://www.kaggle.com/c/chess

(Solution 4.13)


4-15. Predict the salary of a job from its ad. https://www.kaggle.com/c/job-salary-prediction

(Solution 4.15)