Regression-TDSM

From The Data Science Design Manual Wikia

Linear and Logistic Regression

Linear Regression


9-1. Construct an example on [math]n \geq 6[/math] points where the optimal regression line is [math]y=x[/math], even though none of the input points lie directly on this line.

(Solution 9.1)


9-3. Suppose we want to find the best fitting function [math]y=f(x)[/math] where [math]y=w^2 x + w x[/math]. How can we use linear regression to find the best value of w?

(Solution 9.3)


9-5. Explain what a long-tailed distribution is, and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?

(Solution 9.5)


9-7. Determine how different values of the constant [math]c[/math] in the logit function affect the classification probability for points that are 0.01, 1, 2, and 10 units from the boundary.

(Solution 9.7)
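
One way to start exploring 9-7 is to tabulate the probabilities numerically. The sketch below assumes the logit function has the form [math]f(x) = 1/(1+e^{-cx})[/math], where [math]x[/math] is the signed distance from the boundary. The function name and the particular values of [math]c[/math] are illustrative choices, not part of the exercise.

```python
import math

def logit_probability(distance, c):
    """Probability of the positive class at a signed distance from the boundary.

    Assumes the form f(x) = 1 / (1 + exp(-c * x)).
    """
    return 1.0 / (1.0 + math.exp(-c * distance))

distances = [0.01, 1, 2, 10]   # distances named in the exercise
constants = [0.1, 1, 10]       # illustrative values of c (an assumption)

# One row of probabilities per value of c.
table = {c: [logit_probability(d, c) for d in distances] for c in constants}

for c, probs in table.items():
    print(f"c={c:>4}: " + "  ".join(f"{p:.4f}" for p in probs))
```

Reading across a row shows how quickly confidence grows with distance. Reading down a column shows how a larger [math]c[/math] sharpens the transition at the boundary.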


Experiments with Linear Regression


9-9. Experiment with the effects of feature scaling in linear regression. For a given dataset with at least two features (dimensions), multiply all the values of one feature by [math]10^k[/math], for [math]-10 \leq k \leq 10[/math]. Does this operation cause a loss of numerical accuracy in fitting?

(Solution 9.9)
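
A possible setup for the experiment in 9-9 is sketched below, using only the Python standard library. It assumes synthetic, noise-free data with hypothetical true weights, fits by solving the normal equations with a small Gaussian elimination routine, and reports how far the recovered coefficients drift from the truth for each [math]k[/math]. All function names are illustrative. A different solver (for example, one based on QR factorization) may behave differently, so the numbers describe this solver only.

```python
import random

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_least_squares(rows, ys):
    """Fit y = w0 + w1*x1 + w2*x2 via the normal equations (X^T X) w = X^T y."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def scaling_errors(ks=range(-10, 11), n=50, seed=0):
    """Return {k: max coefficient error} after scaling feature 1 by 10**k."""
    rng = random.Random(seed)
    true_w = [3.0, 2.0, -1.5]  # hypothetical intercept and weights
    rows = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n)]
    ys = [true_w[0] + true_w[1] * x1 + true_w[2] * x2 for x1, x2 in rows]
    errors = {}
    for k in ks:
        scale = 10.0 ** k
        scaled = [(x1 * scale, x2) for x1, x2 in rows]
        try:
            w0, w1, w2 = fit_least_squares(scaled, ys)
        except ZeroDivisionError:
            # The system became singular in floating point.
            errors[k] = float("inf")
            continue
        # Undo the scaling so the weight is comparable to the true value.
        recovered = [w0, w1 * scale, w2]
        errors[k] = max(abs(r - t) for r, t in zip(recovered, true_w))
    return errors

errors = scaling_errors()
for k in sorted(errors):
    print(f"k={k:>3}  max coefficient error={errors[k]:.3e}")
```

Because the data are noise-free, any error printed here comes from floating-point arithmetic rather than from the data, which isolates the question the exercise asks.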


9-11. Experiment with the effects of outliers on linear regression. For a given [math](x,y)[/math] dataset, construct the best fitting line. Repeatedly delete the point with the largest residual, and refit. Is the sequence of predicted slopes relatively stable for much of this process?

(Solution 9.11)
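
The delete-and-refit loop in 9-11 can be sketched as follows. This is a minimal version under stated assumptions: the dataset is synthetic (a line with Gaussian noise plus two planted outliers), the fit is ordinary least squares in one variable, and the "largest residual" is taken to mean the largest absolute vertical residual. The function names are hypothetical.

```python
import random

def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def slopes_under_deletion(points, min_points=3):
    """Refit after repeatedly deleting the point with the largest residual."""
    remaining = list(points)
    slopes = []
    while len(remaining) >= min_points:
        slope, intercept = fit_line(remaining)
        slopes.append(slope)
        # Point with the largest absolute vertical residual under this fit.
        worst = max(remaining, key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
        remaining.remove(worst)
    return slopes

# Synthetic data (an assumption): y = 2x + 1 plus noise, with two outliers.
rng = random.Random(1)
points = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in range(30)]
points[5] = (5, 60.0)
points[20] = (20, -30.0)

slopes = slopes_under_deletion(points)
for step, slope in enumerate(slopes):
    print(f"after deleting {step:>2} points: slope={slope:.4f}")
```

Plotting or printing the slope sequence makes it easy to see where it settles and where, once few points remain, it starts to wander.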


Implementation Projects


9-13. Use linear/logistic regression to build a model for one of the challenges from "The Quant Shop":

  • Miss Universe?
  • Movie gross?
  • Baby weight?
  • Art auction price?
  • Snow on Christmas?
  • Super Bowl / College Champion?
  • Ghoul Pool?
  • Future Gold / Oil Price?

(Solution 9.13)


Interview Questions


9-15. Suppose we are training a model using stochastic gradient descent. How do we know if we are converging to a solution?

(Solution 9.15)


9-17. What assumptions are required for linear regression? What if some of these assumptions are violated?

(Solution 9.17)


Kaggle Challenges


9-19. Identify what is being cooked, given the list of ingredients. https://www.kaggle.com/c/whats-cooking

(Solution 9.19)


9-21. What does a worker need access to in order to do their job? https://www.kaggle.com/c/amazon-employee-access-challenge

(Solution 9.21)