HW #1:
Due: Monday, April 12, 2010 at 5:00pm in EEE dropbox
Reading:
- Please read all the details on the course web page, including the project ideas page,
and begin thinking about the project ideas that interest you.
- Please read a review of basic concepts in probability.
- Please go through the Matlab mini-tutorial and at least one other introductory tutorial online.
- Go through Lecture Slides 1 and Slides 2 from the past week. We covered most of the topics
in class but may have skipped a few slides; these are self-explanatory.
- Read two student project papers of your choice from the Stanford Machine
Learning course. In your report below, briefly summarize each paper and answer the following questions: What is the task addressed
in the paper, and why is it important to use machine learning to solve this task? What machine learning algorithms were used?
Did the results seem convincing to you? In what ways can the student improve the project?
Matlab coding:
- Using any data set of your choice from the UCI Machine Learning Repository,
perform exploratory data analysis (EDA) and produce 5 interesting plots. For instance, you may use boxplots, histograms,
scatter plots, and other types of visualizations. In your report, please describe why each visualization is interesting.
- Code classifiers based on k-Nearest-Neighbors and logistic regression, and apply these classifiers to the Iris data.
Please use this Matlab code skeleton -- the code files give more information about what to implement.
- For 10% extra credit, apply the classifiers to the Pima Indians
Diabetes data set. You may have to revise your classifiers to deal with missing data (see the code skeleton above for more information).
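As a rough illustration of the k-Nearest-Neighbors idea mentioned above, here is a minimal sketch on made-up toy data. It is not the code skeleton's interface: the variable names (Xtrain, ytrain, Xtest, ypred), the toy values, and the choice of Euclidean distance with a simple majority vote are all assumptions made for this example. Your "knn.m" should follow whatever inputs and outputs the skeleton specifies.

```matlab
% Toy data (made up for illustration): two well-separated 2-D classes.
Xtrain = [1.0 1.1; 1.2 0.9; 0.8 1.0; 3.0 3.2; 3.1 2.9; 2.9 3.0];
ytrain = [1; 1; 1; 2; 2; 2];
Xtest  = [1.1 1.0; 3.0 3.0];
k = 3;  % number of neighbors (assumed value for this sketch)

% Basic error checking on the inputs.
assert(size(Xtrain, 2) == size(Xtest, 2), 'Feature dimensions must match.');
assert(size(Xtrain, 1) == numel(ytrain), 'One label per training row is required.');
assert(k >= 1 && k <= size(Xtrain, 1), 'k must be between 1 and the number of training points.');

nTest = size(Xtest, 1);
ypred = zeros(nTest, 1);
for i = 1:nTest
    % Squared Euclidean distance from test point i to every training point.
    diffs = Xtrain - repmat(Xtest(i, :), size(Xtrain, 1), 1);
    d2 = sum(diffs .^ 2, 2);

    % Labels of the k closest training points.
    [~, order] = sort(d2);
    nearest = ytrain(order(1:k));

    % Majority vote; mode returns the smallest label when there is a tie.
    ypred(i) = mode(nearest);
end

disp(ypred');
```

Because the vote uses mode, ties between classes are broken in favor of the smaller label. That is a convenient default for a sketch, but you may want a different tie-breaking rule (or an odd k) in your own implementation.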
What to turn in (zip these files up into a file "HW1.zip" and submit the zip file to EEE):
- Report ("Report.pdf"): This report should include the summary of the two student papers above, the 5 EDA plots (with a brief discussion per
plot), and the results of the k-NN and logistic regression classifiers (see the code skeleton above for a list of plots to report).
- Matlab scripts ("knn.m", "logistic_regression.m", "evaluate.m", and "scriptIRIS.m"): You will need to fill in the code
for each of these scripts. See the code skeleton above for more information.
This assignment is to be done individually. Please note that the HW
will be graded not only on the correctness of the solution, but also
on other factors, such as whether you provided sufficient comments
in the code and whether you performed error-checking in the code.
We will also look at the thoroughness of your written report.
For any questions or clarifications, please feel free to use the EEE course
message board.
Update: HW1 solutions and rubric