Regression-TDSM
Linear and Logistic Regression
Linear Regression
9-1.
Construct an example on [math]n \geq 6[/math] points where the optimal regression line is [math]y=x[/math], even though none of the input points lie directly on this line.
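One construction that works is to place the points in pairs, one unit above and one unit below the line at the same x. At every x the residuals are +1 and -1. They therefore sum to zero and are uncorrelated with x, which are exactly the least-squares normal equations for slope 1 and intercept 0. The sketch below checks this with a hand-written ordinary least squares fit. The particular six points are an arbitrary choice, and any number of such pairs works the same way.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Six points: for each x in 1..3, one point at y = x + 1 and one at y = x - 1.
xs = [1, 1, 2, 2, 3, 3]
ys = [2, 0, 3, 1, 4, 2]

assert all(y != x for x, y in zip(xs, ys))  # no point lies on y = x
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 1.0 0.0
```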
9-3.
Suppose we want to find the best-fitting function [math]y=f(x)[/math] where [math]y=w^2 x + w x[/math]. How can we use linear regression to find the best value of [math]w[/math]?
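Since [math]w^2 x + w x = (w^2 + w)x[/math], the model is linear in x with a single slope [math]b = w^2 + w[/math] and no intercept. A minimal sketch is to fit b by least squares through the origin and then solve [math]w^2 + w - b = 0[/math]. The function [math]w^2 + w[/math] never drops below [math]-1/4[/math] (its minimum is at [math]w = -1/2[/math]), and the squared error is a convex parabola in b. So a fitted b below [math]-1/4[/math] is clamped to [math]-1/4[/math]. The function name `fit_w` is illustrative.

```python
import math

def fit_w(xs, ys):
    """Return both roots w of w^2 + w = b, where b is the least-squares
    slope of y = b * x (a line through the origin)."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    # w^2 + w has minimum -1/4 at w = -1/2; the squared error is a convex
    # parabola in b, so the best achievable b is the fit clamped to -1/4.
    b = max(b, -0.25)
    root = math.sqrt(1 + 4 * b)
    return (-1 + root) / 2, (-1 - root) / 2

# Noise-free data generated with w = 2, so b = 2^2 + 2 = 6.
xs = [1.0, 2.0, 3.0]
ys = [6.0 * x for x in xs]
print(fit_w(xs, ys))  # (2.0, -3.0)
```

Both roots give the same slope and hence identical fits, so the data alone cannot distinguish them.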
9-5.
Explain what a long-tailed distribution is, and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
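Comparing a long-tailed sample with a light-tailed one can make the idea concrete. The sketch below draws from a Pareto distribution (shape 1.5, an arbitrary choice) and from a half-normal distribution. For each sample it reports the mean, the median, and the share of the total contributed by the largest 1% of values. The helper `top_share` and the seed are hypothetical choices.

```python
import random
import statistics

def top_share(values, frac=0.01):
    """Fraction of the total sum contributed by the largest `frac` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * frac))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(42)
n = 100_000
pareto = [rng.paretovariate(1.5) for _ in range(n)]        # long-tailed
halfnormal = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]  # light-tailed

for name, sample in (("pareto", pareto), ("half-normal", halfnormal)):
    print(f"{name:>11}: mean={statistics.fmean(sample):.3f} "
          f"median={statistics.median(sample):.3f} "
          f"top 1% share={top_share(sample):.3f}")
```

In theory, the Pareto sample's mean sits well above its median, and its top 1% of draws carries a disproportionate share of the total. For squared-error regression, this means a handful of extreme targets can dominate the fit.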
9-7.
Establish the effect of different values of the constant [math]c[/math] in the logit function on the classification probability of points lying 0.01, 1, 2, and 10 units from the boundary.
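The probabilities are quick to tabulate. This sketch assumes the logit takes the form [math]f(x) = 1/(1+e^{-cx})[/math], where x is the signed distance from the boundary. The values of c below are arbitrary illustrative choices.

```python
import math

def logit(d, c):
    """Logistic probability for a point at signed distance d from the
    boundary, assuming the form f(d) = 1 / (1 + exp(-c * d))."""
    return 1.0 / (1.0 + math.exp(-c * d))

distances = (0.01, 1, 2, 10)
for c in (0.1, 0.5, 1, 2, 10):   # illustrative values of c
    probs = ", ".join(f"{logit(d, c):.4f}" for d in distances)
    print(f"c={c:<4} -> {probs}")
```

Every probability at a positive distance exceeds 1/2. Larger c makes the rise toward 1 steeper, so c effectively sets how confident the classifier is at a given distance. With small c, even points 10 units away can stay far from certainty.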
Experiments with Linear Regression
9-9.
Experiment with the effects of feature scaling in linear regression. For a given dataset with at least two features (dimensions), multiply all the values of one feature by [math]10^k[/math], for [math]-10 \leq k \leq 10[/math]. Does this operation cause a loss of numerical accuracy in fitting?
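Here is a minimal plain-Python sketch of this experiment. It assumes a small synthetic two-feature dataset and fits by the normal equations with Gaussian elimination. This solver is deliberately simple; library least-squares routines typically use QR or SVD, which behave better on badly scaled data. In exact arithmetic, scaling a feature by [math]10^k[/math] only divides its coefficient by [math]10^k[/math] and leaves the predictions unchanged. Any change in the predictions is therefore numerical error. The dataset and all function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit(rows, ys):
    """Least squares via the normal equations (X^T X) w = X^T y, intercept first."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(XtX, Xty)

def predict(w, rows):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in rows]

# Hypothetical data: y = 3 + 2*x1 - x2 plus a small deterministic wobble.
rows = [(float(i), float((i * i) % 7)) for i in range(20)]
ys = [3 + 2 * a - b + 0.1 * ((i * 13) % 5 - 2) for i, (a, b) in enumerate(rows)]
base = predict(fit(rows, ys), rows)

def max_pred_diff(k):
    """Largest change in fitted values after scaling feature x2 by 10**k."""
    s = 10.0 ** k
    scaled = [(a, b * s) for a, b in rows]
    preds = predict(fit(scaled, ys), scaled)
    return max(abs(p - q) for p, q in zip(preds, base))

for k in range(-10, 11):
    print(f"k={k:>3}  max |change in prediction| = {max_pred_diff(k):.3e}")
```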
9-11.
Experiment with the effects of outliers on linear regression. For a given [math](x,y)[/math] dataset, construct the best-fitting line. Repeatedly delete the point with the largest residual, and refit. Is the sequence of predicted slopes relatively stable for much of this process?
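The delete-and-refit loop is short enough to sketch in plain Python. The function `deletion_slopes` and its `min_points` cutoff are hypothetical choices. The loop stops before the fit becomes degenerate, but it still assumes the remaining x values are never all equal.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)  # must be nonzero: x values not all equal
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def deletion_slopes(xs, ys, min_points=3):
    """Fit, record the slope, drop the point with the largest absolute
    residual, and repeat until fewer than min_points points remain."""
    xs, ys = list(xs), list(ys)
    slopes = []
    while len(xs) >= min_points:
        slope, intercept = fit_line(xs, ys)
        slopes.append(slope)
        worst = max(range(len(xs)),
                    key=lambda i: abs(ys[i] - (slope * xs[i] + intercept)))
        del xs[worst], ys[worst]
    return slopes

# Hypothetical data: y = 2x + 1 on x = 0..9, except one gross outlier at x = 9.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[9] = 100
print(deletion_slopes(xs, ys))
```

In this toy case the first deletion removes the outlier, and every later slope is exactly 2. On real data, the interesting question is how long the slopes stay flat before the deletions start eating into legitimate points.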
Implementation Projects
9-13.
Use linear/logistic regression to build a model for one of "The Quant Shop" challenges:
Interview Questions
9-15.
Suppose we are training a model using stochastic gradient descent. How do we know if we are converging to a solution?
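One common heuristic is sketched below under illustrative assumptions. After each pass over the data, compute the full training loss. Declare convergence once its relative change stays below a tolerance for several consecutive epochs. Monitoring a held-out validation loss or the gradient norm are common alternatives. The function `sgd_linear` and its learning rate, tolerance, and patience values are hypothetical choices for a one-feature least-squares model.

```python
import random

def sgd_linear(xs, ys, lr=0.01, tol=1e-4, patience=3, max_epochs=500, seed=0):
    """Fit y ~ w*x + b by per-example SGD on squared error. Stop when the
    full training loss changes by less than `tol` (relative) for `patience`
    consecutive epochs, or after `max_epochs`."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    history, stalled = [], 0
    for _ in range(max_epochs):
        rng.shuffle(order)
        for i in order:
            err = w * xs[i] + b - ys[i]
            w -= lr * 2 * err * xs[i]   # gradient of err^2 with respect to w
            b -= lr * 2 * err           # gradient of err^2 with respect to b
        loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if history and abs(history[-1] - loss) < tol * history[-1]:
            stalled += 1
        else:
            stalled = 0
        history.append(loss)
        if stalled >= patience:
            break
    return w, b, history

# Hypothetical data: y = 3x + 0.5 plus Gaussian noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
ys = [3 * x + 0.5 + data_rng.gauss(0, 0.1) for x in xs]
w, b, history = sgd_linear(xs, ys)
print(f"stopped after {len(history)} epochs: w={w:.3f}, b={b:.3f}, loss={history[-1]:.5f}")
```

A flattened loss curve is evidence of convergence, not proof. With a constant learning rate, SGD keeps jittering around the minimum, which is one reason to decay the learning rate or track a held-out loss.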
9-17.
What assumptions are required for linear regression? What if some of these assumptions are violated?
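Several of these assumptions can be probed through the residuals of a fitted model. The sketch below shows one crude check, for constant error variance (homoscedasticity). It fits a line, splits the residuals at the median x, and compares the variances of the two halves. The `variance_ratio` helper is a hypothetical illustration in the spirit of the Goldfeld-Quandt test, not a formal implementation of it.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def variance_ratio(xs, ys):
    """Residual variance for the upper half of x divided by that for the
    lower half. Values far from 1 suggest the error variance changes with x."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    resid = [y - (slope * x + intercept) for x, y in pairs]
    return statistics.pvariance(resid[half:]) / statistics.pvariance(resid[:half])

# Hypothetical data whose error grows with x: y = 2x +/- 0.5x.
xs = list(range(1, 21))
ys = [2 * x + 0.5 * x * (-1) ** x for x in xs]
print(variance_ratio(xs, ys))
```

When such a check fails, common responses include transforming the target (for example, taking logs) or using weighted least squares.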
Kaggle Challenges
9-19.
Identify what is being cooked, given the list of ingredients.
https://www.kaggle.com/c/whats-cooking
9-21.
What does a worker need access to in order to do their job?
https://www.kaggle.com/c/amazon-employee-access-challenge