# MATH 574M - Statistical Machine Learning and Data Mining

Announcements
• First class on 01/10.

Course Information
Lectures: Tue. and Thu. 9:30-10:45am, Bio. Sci. West 210 | Syllabus
Office Hours: Tuesday 11am-12pm, ENR2 S323, or by appointment.
Textbook: The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman (2001).
Reference Books:
• Principles and Theory for Data Mining and Machine Learning by Clarke, Fokoué, and Zhang (2009)
• Pattern Recognition and Neural Networks by B. Ripley (1996)
• Learning with Kernels by Schölkopf and Smola (2000)
• The Nature of Statistical Learning Theory by Vapnik (1998)
• An overview of statistical learning theory, Vapnik (1999)

• Kernel Machines
• Hastie's Software and Data

R Resources:
• Introduction to R | R for Beginners | R reference card

Statistics Prerequisites:
• Basic Topics | Joe Watkins' 363 Notes | Joe Watkins' MATH 464 Notes

Course Activities

Week 1 (Jan 9-13)
• Read Chapter 1: Overview of Data Mining
• Lecture 1: Introduction
• Get Familiar with Software: Introduction to R
• R Brief Intro | R Guide for Beginners
• Supplementary Reading: Data mining and statistics: what is the connection? Friedman (1997)
• Homework 1. Assigned on Jan 15, due on Jan 29.

Week 2-3 (Jan 14-27)
• Read Chapter 2: Theory of Supervised Learning
• Lecture 2: Statistical Decision Theory (I)
• Lecture 3: Statistical Decision Theory (II)

Week 4 (Jan 28 - Feb 3)
• Read Chapter 4.2-4.4: Linear Classification Methods for Binary Problems
• Lecture 4: Binary Classification (I): Basics
• Homework 2. Assigned on Jan 29, due on Feb 12.

Week 5 (Feb 4 - Feb 10)
• Supplementary Reading: Choosing Between Logistic Regression and Discriminant Analysis, Press, S. and Wilson, S. (1978)
• Lecture 5: Binary Classification (II): Logistic Regression and Discriminant Analysis
• Curse of Dimensionality; Linear Binary Classification for High Dimensional Problems
• Lecture 6: Binary Classification (III): Extension to High Dimensional Classification Problems

Week 6 (Feb 11 - Feb 17)
• Read Chapter 4.1: Nonlinear Classification Methods
• Lecture 7: K-nearest neighbor (KNN) methods
• Topic: Introduction to Multiclass Classification
• Lecture 8: Multiclass Classification
• Homework 3. Assigned on Feb 12, due on Feb 26.

Week 7 (Feb 18 - Feb 24)
• Topic: Nonlinear Discriminant Analysis
• Lecture 9: QDA and RDA
• Supplementary Reading: LDA for improved large vocabulary continuous speech recognition
• Lecture 10: PCA

Week 8 (Feb 25 - March 3)
• Topic: Linear Regression Models
• Lecture 11: Linear Regression
• Read Chapter 3: Linear Regression
• Supplementary Reading: Linear Model Theory

Auditing
• Auditors are expected to attend class regularly and submit homework on the same schedule as the other students.