Models-TDSM

From The Data Science Design Manual Wikia

Mathematical Models

Properties of Models


7-1. Quantum physics is much more complicated than Newtonian physics. Which model passes the Occam's Razor test, and why?

(Solution 7.1)


7-3. Give examples of first-principle and data-driven models used in practice.

(Solution 7.3)


7-5. For one or more of the following "The Quant Shop" challenges, partition the full problem into subproblems that can be independently modeled:

  • Miss Universe?
  • Movie gross?
  • Baby weight?
  • Art auction price?
  • Snow on Christmas?
  • Super Bowl / College Champion?
  • Ghoul Pool?
  • Future Gold / Oil Price?

(Solution 7.5)


Evaluation Environments


7-7. Explain what precision and recall are. How do they relate to the ROC curve?

(Solution 7.7)
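
A minimal Python sketch of the quantities involved (illustrative names only, not the book's solution): precision and recall come from the counts of true/false positives and negatives, and the ROC curve is traced by sweeping a threshold over classifier scores, with recall reappearing as the true-positive rate on the vertical axis.

  # Sketch: precision, recall, and ROC points from labels/scores.
  # Assumes both classes are present in y_true.
  def precision_recall(y_true, y_pred):
      tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
      fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
      fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
      precision = tp / (tp + fp) if tp + fp else 0.0
      recall = tp / (tp + fn) if tp + fn else 0.0
      return precision, recall

  def roc_points(y_true, scores):
      # Each threshold on the scores yields one (false-positive rate, true-positive rate) point.
      points = []
      for thresh in sorted(set(scores), reverse=True):
          y_pred = [1 if s >= thresh else 0 for s in scores]
          tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
          fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
          fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
          tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
          points.append((fp / (fp + tn), tp / (tp + fn)))
      return points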


7-9. Explain what overfitting is, and how you would control for it.

(Solution 7.9)
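
A toy illustration of the symptom and one control (synthetic data assumed here, not taken from the text): as model complexity grows, training error keeps falling while held-out error eventually rises; holding data out and choosing the complexity that minimizes held-out error is the basic control.

  # Overfitting shows up as a widening gap between training and held-out error.
  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(0, 1, 40)
  y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.size)   # noisy toy data
  x_tr, y_tr = x[::2], y[::2]      # training half
  x_te, y_te = x[1::2], y[1::2]    # held-out half

  for degree in (1, 3, 9):
      coeffs = np.polyfit(x_tr, y_tr, degree)
      err_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
      err_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
      print(f"degree {degree}: train MSE {err_tr:.3f}, test MSE {err_te:.3f}")
  # Other controls: regularization, more training data, simpler model classes.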


7-11. What is cross-validation? How might we pick the right value of k for k-fold cross-validation?

(Solution 7.11)
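
For concreteness, a bare-bones k-fold loop (the fit/error interfaces are assumptions for illustration): the data are split into k folds, each fold serves once as the held-out set, and the k error estimates are averaged.

  # Minimal k-fold cross-validation sketch; leftover points after the last
  # full fold simply remain in the training portion.
  def k_fold_cv(data, k, fit, error):
      fold_size = len(data) // k
      scores = []
      for i in range(k):
          test = data[i * fold_size:(i + 1) * fold_size]
          train = data[:i * fold_size] + data[(i + 1) * fold_size:]
          model = fit(train)
          scores.append(error(model, test))
      return sum(scores) / k   # average held-out error over the k folds

On the choice of k: larger k trains on more of the data per fold but costs k model fits and yields noisier per-fold estimates; k = 5 or k = 10 is the usual compromise, with k = n giving leave-one-out.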


7-13. Explain why we have training, test, and validation data sets, and how each is used effectively.

(Solution 7.13)
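
A sketch of the three-way split this question asks about (the 60/20/20 proportions are a common convention, not a requirement): fit on the training set, compare and tune models on the validation set, and touch the test set only once, for the final reported score.

  import random

  def three_way_split(data, seed=0, frac_train=0.6, frac_val=0.2):
      data = data[:]                       # copy so the caller's list is untouched
      random.Random(seed).shuffle(data)
      n_train = int(frac_train * len(data))
      n_val = int(frac_val * len(data))
      train = data[:n_train]
      val = data[n_train:n_train + n_val]
      test = data[n_train + n_val:]
      return train, val, test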


7-15. Propose baseline models for one or more of the following "The Quant Shop" challenges:

  • Miss Universe?
  • Movie gross?
  • Baby weight?
  • Art auction price?
  • Snow on Christmas?
  • Super Bowl / College Champion?
  • Ghoul Pool?
  • Future Gold / Oil Price?

(Solution 7.15)
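
To make "baseline" concrete, here is one hypothetical example for the baby-weight challenge: always predict the training-set mean (or the mean within a coarse group such as the baby's sex). Any proposed model has to beat this to justify its complexity; the other challenges admit analogous constant or historical-frequency baselines.

  # Hypothetical mean-value baseline for baby weight.
  def mean_baseline(train_weights):
      mean = sum(train_weights) / len(train_weights)
      return lambda _features: mean        # ignores the input features entirely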


Implementation Projects


7-17. Build a general model evaluation system in your favorite programming language, and set it up with the right data to assess models for a particular problem. Your environment should report performance statistics, error distributions and/or confusion matrices as appropriate.

(Solution 7.17)
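
One possible shape for such a system, for the classification case (the interface is an assumption, not a prescribed design): feed it true labels and predictions, and have it print accuracy plus a confusion matrix; a regression version would report the error distribution instead.

  from collections import Counter

  def evaluate_classifier(y_true, y_pred, labels):
      confusion = Counter(zip(y_true, y_pred))   # (actual, predicted) -> count
      accuracy = sum(confusion[(c, c)] for c in labels) / len(y_true)
      print(f"accuracy: {accuracy:.3f}")
      print("confusion matrix (rows = actual, columns = predicted):")
      print(" " * 10 + "".join(f"{str(c):>10}" for c in labels))
      for actual in labels:
          counts = "".join(f"{confusion[(actual, pred)]:>10}" for pred in labels)
          print(f"{str(actual):>10}" + counts)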


Interview Questions


7-19. What do we mean when we talk about the bias-variance tradeoff?

(Solution 7.19)
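
For reference, the standard decomposition behind the phrase (notation added here: f is the true function, \hat{f} the fitted model, \sigma^2 the irreducible noise):

  E[(y - \hat{f}(x))^2] = (E[\hat{f}(x)] - f(x))^2 + E[(\hat{f}(x) - E[\hat{f}(x)])^2] + \sigma^2
                        =          bias^2          +             variance             +  noise

Simpler models tend to have high bias and low variance; more flexible models the reverse, which is why neither extreme generalizes best.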


7-21. Which is better: having good data or good models? And how do you define "good"?

(Solution 7.21)


7-23. How would you define and measure the predictive power of a metric?

(Solution 7.23)
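
One simple operationalization (a sketch, not the only reasonable definition): measure the metric's out-of-sample association with the target, e.g. its Pearson correlation, and check whether adding it as a feature reduces the held-out error of a model that already uses everything else.

  # Pearson correlation between a candidate metric and the target values.
  def pearson_r(xs, ys):
      n = len(xs)
      mx, my = sum(xs) / n, sum(ys) / n
      cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      vx = sum((x - mx) ** 2 for x in xs)
      vy = sum((y - my) ** 2 for y in ys)
      return cov / (vx * vy) ** 0.5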


Kaggle Challenges


7-25. Who will win the NCAA basketball tournament? https://www.kaggle.com/c/march-machine-learning-mania-2016

(Solution 7.25)