CompSci 273P: Machine Learning and Data Mining, Spring 2018

Course Outline

- When: Tuesday & Thursday, 5:00 - 6:20p
- Where: SSL 228 (UCI campus map)
- Course Code: 35515
- Lab/Discussion section: Tue 7:00 - 7:50p, SSL 270
- Instructor: Kalev Kask
- Email: kkask@uci.edu; when sending email, put CS273P in the subject line
- Office hours: TBD
- TA: Filjor Broka
- Email: fbroka@uci.edu
- Reader: Ananthakrishnan Pushpendran
- Email: apushpen@uci.edu
Course Overview:
How can a machine learn from experience and become better at a given task? How can we automatically extract knowledge or make sense of massive quantities of data? These are the fundamental questions of machine learning. Machine learning and data mining algorithms use techniques from statistics, optimization, and computer science to create automated systems that can sift through large volumes of data at high speed to make predictions or decisions without human intervention. This class will familiarize you with a broad cross-section of basic and popular models and algorithms for machine learning, and prepare you to apply machine learning techniques in industry.
Background:
We will assume basic familiarity with the concepts of probability, statistics, calculus, and linear algebra. Some programming will be required; we will primarily use Python with the libraries "numpy" and "matplotlib", as well as course code.
Assignments:
There will be a few homework assignments (on average, one every two weeks), two projects, and a final.
Course Grade:
- Homeworks 20%
- 2 projects, 20% each
- Final 40%
Projects:
You will be required to finish 2 projects:
- Project 1 is regression; due approx. week 9
- Project 2 is classification; due approx. week 11
- Each project is done by a team of 3 students working together
- Each team will submit its results to a Kaggle competition
- Further details TBA
Textbook and Reading:
There is no required textbook for the class. However, useful books on the subject for supplementary reading include:
- Duda, Hart, Stork, "Pattern Classification"
- Daumé, "A Course in Machine Learning"
- Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning"
- Murphy, "Machine Learning: A Probabilistic Perspective"
- Bishop, "Pattern Recognition and Machine Learning"
- Sutton, "Reinforcement Learning"
Python:
While you can use any environment, language, or platform for the programming assignments, we recommend and support Python. I strongly suggest the "full SciPy stack", which includes NumPy, Matplotlib, SciPy, and the IPython notebook for interactive work and visualization; see here for installation information. Here is a simple introduction to numpy and plotting for the course; you can also find complete documentation for these libraries, as well as many more tutorial guides, online. While Python 2.7 is still widely used, try to program in a Python 3 compatible way; if you find that parts of the course code do not work with more recent versions of Python, please let us know the issue and we will try to fix it.
Lab and Discussion:
There is a lab/discussion section on Tuesdays at 7:00pm, shortly after class, in SSL 270. This is where you can discuss course material, get help with programming (Python), and discuss project-related issues and questions. We will use a course Piazza page for questions and discussion. Please post your questions there; you can post privately if you prefer, or if (for example) your question needs to reveal your solution to a homework problem. I prefer to use Piazza for all class contact, since it allows the TA, fellow students (if the post is public), or me to respond, which should get you answers more quickly. Note: when posting privately, please post to "Instructors" (which includes the instructor and TAs).
Syllabus:
Subject to change
Online Notes: