Revision as of 22:12, 31 March 2017
Machine Learning
Classification
11-1.
Using the Naive Bayes classifier of Figure 1.2, decide whether (Cloudy, High, Normal) and (Sunny, Low, High) are beach days.
11-3.
What is regularization, and what kind of problems with machine learning does it solve?
Decision Trees
11-5.
Suppose we are given an \\(n \times d\\) labeled classification data matrix, where each item is labeled as either class A or class B. For each of the questions below, give a proof or a counterexample:
- Does there always exist a decision tree classifier which perfectly separates A from B?
- Does there always exist a decision tree classifier which perfectly separates A from B if the n feature vectors are all distinct?
- Does there always exist a logistic regression classifier which perfectly separates A from B?
- Does there always exist a logistic regression classifier which perfectly separates A from B if the n feature vectors are all distinct?
Support Vector Machines
11-7.
Give a linear-time algorithm to find the maximum-width separating line in one dimension.
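One possible approach, sketched here under the assumption that the two classes are given as separate lists of numbers: in one dimension the widest separating gap lies between the largest point of one class and the smallest point of the other, so a single min/max pass over each class suffices. The function name and return convention below are hypothetical, chosen only for illustration.

```python
def max_margin_separator_1d(a_points, b_points):
    """Return (threshold, width) for the widest gap separating two 1-D classes.

    Returns None when the classes interleave and no separating point exists.
    Runs in O(n) time: min() and max() each make one pass over a list.
    """
    a_min, a_max = min(a_points), max(a_points)
    b_min, b_max = min(b_points), max(b_points)

    if a_max < b_min:        # class A lies entirely to the left of class B
        lo, hi = a_max, b_min
    elif b_max < a_min:      # class B lies entirely to the left of class A
        lo, hi = b_max, a_min
    else:                    # the ranges overlap or touch: not separable
        return None

    # The midpoint of the gap maximizes the distance to both boundary points.
    return ((lo + hi) / 2, hi - lo)
```

For example, `max_margin_separator_1d([1, 2, 3], [7, 9])` places the threshold at 5.0, midway between the boundary points 3 and 7, which act as the support vectors.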
11-9.
Suppose we use support vector machines to find a perfect separating line between a given set of n red and blue points. Now suppose we delete all the points which are not support vectors, and use SVM to find the best separator of what remains. Might this separating line be different from the one before?
Neural Networks
11-11.
Specify the network structure and node activation functions to enable a neural network model to implement logistic regression.
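As a hint toward one possible answer, the sketch below assumes a network with no hidden layer: the inputs feed directly into a single output node with a sigmoid (logistic) activation. The function names, weights, and bias are hypothetical placeholders, not values taken from the text.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def single_node_network(x, weights, bias):
    """Forward pass of a network with d inputs and one sigmoid output node.

    The output equals the logistic regression model
    P(y = 1 | x) = sigmoid(bias + sum_i weights[i] * x[i]).
    """
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)
```

Training such a network with a cross-entropy loss would then correspond to fitting logistic regression by maximum likelihood.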
Implementation Projects
11-13.
Experiment with different discounting methods for estimating the frequency of words in English. In particular, evaluate the degree to which frequencies on short text files (1000 words, 10,000 words, 100,000 words, and 1,000,000 words) reflect the frequencies over a large text corpus of, say, 10,000,000 words.
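A possible starting point is add-k (Laplace) discounting, one of several methods that could be compared. The sketch below assumes the text has already been tokenized into a list of words and that a fixed vocabulary containing every word in the sample is supplied; the function name and parameters are hypothetical.

```python
from collections import Counter


def add_k_frequencies(words, vocabulary, k=1.0):
    """Estimate word probabilities with add-k discounting.

    Every vocabulary word receives k pseudo-counts, so words that never
    appear in a short sample still get a nonzero probability.
    Assumes every word in `words` also appears in `vocabulary`.
    """
    counts = Counter(words)
    total = len(words) + k * len(vocabulary)
    return {w: (counts[w] + k) / total for w in vocabulary}
```

The estimates from each short sample could then be compared against the relative frequencies in the large corpus, for instance by summing the absolute differences over the vocabulary.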
Interview Questions
11-15.
What is deep learning? What are some of the characteristics that distinguish it from traditional machine learning?
11-17.
How would you come up with a program to identify plagiarism in documents?
Kaggle Challenges
11-19.
Did a movie reviewer like or dislike the film?
https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews