Difference between revisions of "Models-TDSM"
Revision as of 21:36, 31 March 2017
Mathematical Models
Properties of Models
7-1.
Quantum physics is much more complicated than Newtonian physics. Which model passes the Occam's Razor test, and why?
7-3.
Give examples of first-principle and data-driven models used in practice.
7-5.
For one or more of the following "The Quant Shop" challenges, partition the full problem into subproblems that can be independently modeled:
- Miss Universe?
- Movie gross?
- Baby weight?
- Art auction price?
- Snow on Christmas?
- Super Bowl / College Champion?
- Ghoul Pool?
- Future Gold / Oil Price?
Evaluation Environments
7-7.
Explain what precision and recall are. How do they relate to the ROC curve?
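As a reference point for this exercise, here is a minimal sketch that computes both quantities from binary labels. The function name and the sample labels are illustrative assumptions, not part of the exercise. Recall is the same quantity as the true positive rate plotted on the vertical axis of an ROC curve.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels, chosen only to exercise the function.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
result = precision_recall(y_true, y_pred)
print(result)
```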
7-9.
Explain what overfitting is, and how you would control for it.
7-11.
What is cross-validation? How might we pick the right value of k for k-fold cross-validation?
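To make the mechanics of k-fold cross-validation concrete, the following sketch partitions n item indices into k folds and yields each fold once as the held-out set. The name k_fold_indices is hypothetical, and the sketch assumes the data has already been shuffled; it is one possible construction, not a prescribed one.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs that partition range(n) into k folds."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the held-out fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Larger values of k leave more data for training in each round but require fitting the model more times, which is the tradeoff the question asks about.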
7-13.
Explain why we have training, test, and validation data sets, and how they are used effectively.
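A three-way partition of this kind might be sketched as follows. The function name, the 60/20/20 fractions, and the fixed seed are assumptions chosen for illustration; the exercise does not prescribe any of them.

```python
import random


def three_way_split(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle items and split them into disjoint training, validation, and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains is held out for final testing
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

The training set is for fitting the model, the validation set is for tuning choices such as hyperparameters, and the test set is touched only once, for the final performance estimate.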
7-15.
Propose baseline models for one or more of the following "The Quant Shop" challenges:
- Miss Universe?
- Movie gross?
- Baby weight?
- Art auction price?
- Snow on Christmas?
- Super Bowl / College Champion?
- Ghoul Pool?
- Future Gold / Oil Price?
Implementation Projects
7-17.
Build a general model evaluation system in your favorite programming language, and set it up with the right data to assess models for a particular problem. Your environment should report performance statistics, error distributions and/or confusion matrices as appropriate.
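One possible starting point for such an environment is sketched below, using only the Python standard library. The function names and the sample data are hypothetical; a full solution would add data loading, further statistics, and plots of the error distribution.

```python
from collections import Counter
from statistics import mean


def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) label pairs for a classifier."""
    return Counter(zip(y_true, y_pred))


def error_summary(y_true, y_pred):
    """Summarize the absolute errors of a regression model."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return {"mean_abs_error": mean(errors), "max_abs_error": max(errors)}


# Hypothetical classifier output.
cm = confusion_matrix(["a", "b", "a", "b"], ["a", "a", "a", "b"])
# Hypothetical regression output.
summary = error_summary([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(dict(cm))
print(summary)
```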
Interview Questions
7-19.
What do we mean when we talk about the bias-variance tradeoff?
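For squared-error loss, the tradeoff is usually stated through the standard decomposition below. The notation is an assumption introduced here: y = f(x) + ε with zero-mean noise of variance σ², the fitted model is written as \hat{f}, and expectations are taken over training sets and noise.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

More flexible models tend to lower the bias term while raising the variance term, and simpler models tend to do the reverse.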
7-21.
Which is better: having good data or good models? And how do you define "good"?
7-23.
How would you define and measure the predictive power of a metric?
Kaggle Challenges
7-25.
Who will win the NCAA basketball tournament?
https://www.kaggle.com/c/march-machine-learning-mania-2016