Difference between revisions of "Models-TDSM"

From The Data Science Design Manual Wikia

Revision as of 21:36, 31 March 2017

Mathematical Models

Properties of Models


7-1. Quantum physics is much more complicated than Newtonian physics. Which model passes the Occam's Razor test, and why?

(Solution 7.1)


7-3. Give examples of first-principle and data-driven models used in practice.

(Solution 7.3)


7-5. For one or more of the following "The Quant Shop" challenges, partition the full problem into subproblems that can be independently modeled:

  • Miss Universe?
  • Movie gross?
  • Baby weight?
  • Art auction price?
  • Snow on Christmas?
  • Super Bowl / College Champion?
  • Ghoul Pool?
  • Future Gold / Oil Price?

(Solution 7.5)


Evaluation Environments


7-7. Explain what precision and recall are. How do they relate to the ROC curve?

(Solution 7.7)
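
As a starting point for this exercise, the sketch below computes precision and recall from binary labels, together with the single ROC point (false positive rate, true positive rate) that the same predictions define. It assumes labels are encoded as 0 and 1; the function names are illustrative and do not come from the book. Note that recall and the true positive rate are the same quantity, which is one link between the two views.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_point(y_true, y_pred):
    """Return (false positive rate, true positive rate) for one threshold."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
    return fpr, tpr


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
fpr, tpr = roc_point(y_true, y_pred)
```

Sweeping a score threshold and recomputing `roc_point` at each setting would trace out the full ROC curve.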


7-9. Explain what overfitting is, and how you would control for it.

(Solution 7.9)


7-11. What is cross-validation? How might we pick the right value of k for k-fold cross-validation?

(Solution 7.11)
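
To make the mechanics of k-fold cross-validation concrete, here is a minimal sketch that partitions n example indices into k contiguous folds and yields a (train, test) index pair for each. It assumes the data has already been shuffled; `k_fold_indices` is a hypothetical helper name, not a library function.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds over range(n).

    Fold sizes differ by at most one; every index is tested exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first (n % k) folds receive one extra element.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Larger k leaves more data for training in each round but requires more model fits, which is the tradeoff the question asks about.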


7-13. Explain why we have training, test, and validation data sets, and how they are used effectively.

(Solution 7.13)
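
One possible way to produce the three sets is sketched below: shuffle once with a fixed seed, then carve off test and validation portions before any model is fit. The 60/20/20 proportions and the name `three_way_split` are assumptions chosen for illustration only.

```python
import random


def three_way_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and split them into (train, validation, test) lists."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = three_way_split(range(10))
```

In this arrangement the validation set is used for tuning choices, while the test set is held back for a single final assessment.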


7-15. Propose baseline models for one or more of the following "The Quant Shop" challenges:

  • Miss Universe?
  • Movie gross?
  • Baby weight?
  • Art auction price?
  • Snow on Christmas?
  • Super Bowl / College Champion?
  • Ghoul Pool?
  • Future Gold / Oil Price?

(Solution 7.15)


Implementation Projects


7-17. Build a general model evaluation system in your favorite programming language, and set it up with the right data to assess models for a particular problem. Your environment should report performance statistics, error distributions and/or confusion matrices as appropriate.

(Solution 7.17)
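
A very small core for such an evaluation system might look like the following sketch, which builds a confusion matrix and an accuracy score for a classifier's predictions. It is only one possible starting point; the function names and the toy labels are illustrative assumptions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts items whose true
    label is labels[i] and whose predicted label is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]
labels, matrix = confusion_matrix(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

A fuller system would add error distributions for regression problems and per-class statistics, as the exercise describes.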


Interview Questions


7-19. What do we mean when we talk about the bias-variance tradeoff?

(Solution 7.19)


7-21. Which is better: having good data or good models? And how do you define "good"?

(Solution 7.21)


7-23. How would you define and measure the predictive power of a metric?

(Solution 7.23)


Kaggle Challenges


7-25. Who will win the NCAA basketball tournament? https://www.kaggle.com/c/march-machine-learning-mania-2016

(Solution 7.25)